Training 5

Systems for LLM RL (May 30, 2026)
The lore behind LoRA (Mar 23, 2026)
Understanding multi-GPU Parallelism paradigms (Jul 6, 2025)
Transformer showdown MHA vs MLA vs nGPT vs Differential Transformer (Jan 22, 2025)
Rethink LoRA initializations for faster convergence (Jun 7, 2024)