Tags: activations (1), Differential Transformer (1), Fine tuning (1), GQA (1), kv cache (1), LLM (1), LoRA (1), memory (1), MHA (1), MLA (1), Multi Latent Attention (1), nanoformer (1), nGPT (1), training (1)