A place where I share my thoughts, findings and learnings

Archives

2026

23 Mar The lore behind LoRA
24 Feb Exploring the Mixture of Experts

2025

06 Jul Understanding multi GPU Parallelism paradigms
14 Jun Attention and Transformer Imagined
22 Jan Transformer showdown MHA vs MLA vs nGPT vs Differential Transformer

2024

07 Jun Rethink LoRA initialisations for faster convergence

Trending Tags

Transformer, FFNN, Math, Attention, LoRA, activations, Data Parallelism, Differential Transformer, Fine tuning, Finetuning

© 2026 Datta Nimmaturi. Some rights reserved.

Using the Chirpy theme for Jekyll.
