Transformer showdown MHA vs MLA vs nGPT vs Differential Transformer
Comparing various transformer architectures like MHA, GQA, Multi Latent Attention, nGPT, Differential Transformer.
Jan 22, 2025
Transformer, Architectures

Rethink LoRA initialisations for faster convergence
A better initialisation for LoRA to make convergence faster.
Jun 7, 2024
LoRA, Fine tuning, LLM