Archives

2026
  23 Mar  The lore behind LoRA
  24 Feb  Exploring the Mixture of Experts

2025
  06 Jul  Understanding multi GPU Parallelism paradigms
  14 Jun  Attention and Transformer Imagined
  22 Jan  Transformer showdown: MHA vs MLA vs nGPT vs Differential Transformer

2024
  07 Jun  Rethink LoRA initialisations for faster convergence