Archives

2025
22 Jan  Transformer showdown MHA vs MLA vs nGPT vs Differential Transformer

2024
07 Jun  Rethink LoRA initialisations for faster convergence