Archives

2026
21 May  Reinforcement Learning for LLMs
12 Apr  The MathemaTricks behind FlashAttention
23 Mar  The lore behind LoRA
24 Feb  Exploring the Mixture of Experts

2025
06 Jul  Understanding multi-GPU Parallelism paradigms
14 Jun  Attention and Transformer Imagined
22 Jan  Transformer showdown MHA vs MLA vs nGPT vs Differential Transformer

2024
07 Jun  Rethink LoRA initializations for faster convergence