First-principles notes on LLMs, GPU kernels, and model training

Archives

2026

30 May Systems for LLM RL
21 May Reinforcement Learning for LLMs
12 Apr The MathemaTricks behind FlashAttention
23 Mar The lore behind LoRA
24 Feb Exploring the Mixture of Experts

2025

06 Jul Understanding multi-GPU Parallelism paradigms
14 Jun Attention and Transformer Imagined
22 Jan Transformer showdown MHA vs MLA vs nGPT vs Differential Transformer

2024

07 Jun Rethink LoRA initializations for faster convergence

Trending Tags

LLM Math Training Transformer Attention GPU Inference Systems Architecture FFN

© 2026 Datta Nimmaturi. Some rights reserved.

