Transformer 3

Understanding multi GPU Parallelism paradigms
Jul 6, 2025

Attention and Transformer Imagined
Jun 14, 2025

Transformer showdown MHA vs MLA vs nGPT vs Differential Transformer
Jan 22, 2025