Transformer (4)

Exploring the Mixture of Experts, Feb 24, 2026
Understanding multi GPU Parallelism paradigms, Jul 6, 2025
Attention and Transformer Imagined, Jun 14, 2025
Transformer showdown MHA vs MLA vs nGPT vs Differential Transformer, Jan 22, 2025