Datta's Blog

https://datta0.github.io/
A minimal, responsive and feature-rich Jekyll Chirpy theme for technical writing.
Updated: 2026-02-25T19:26:37+05:30
Author: Datta Nimmaturi
© 2026 Datta Nimmaturi

Exploring the Mixture of Experts
Published: 2026-02-24T14:30:00+05:30
Updated: 2026-02-25T16:28:36+05:30
https://datta0.github.io/posts/exploring-the-moe/
Author: datta0

An intuitive build-up to Mixture of Experts

Understanding multi GPU Parallelism paradigms
Published: 2025-07-06T16:33:31+05:30
Updated: 2026-02-24T21:37:53+05:30
https://datta0.github.io/posts/understanding-multi-gpu-parallelism-paradigms/
Author: datta0

We’ve been talking about Transformers all this while. But how do we get the most out of our hardware? There are two paradigms to consider. In the first, your model fits comfortably on one GPU, but you have many GPUs at your disposal and want to save time by distributing the workload across them. In the second, your workload doesn’t even fit entirely on ...

Attention and Transformer Imagined
Published: 2025-06-14T14:30:00+05:30
Updated: 2025-06-15T20:19:27+05:30
https://datta0.github.io/posts/transformer-imagined/
Author: datta0

An intuitive build-up to Attention and Transformer

Transformer showdown MHA vs MLA vs nGPT vs Differential Transformer
Published: 2025-01-22T20:27:00+05:30
Updated: 2025-06-14T18:32:02+05:30
https://datta0.github.io/posts/transformer-showdown/
Author: Datta Nimmaturi

Comparing transformer architectures such as MHA, GQA, Multi Latent Attention, nGPT, and Differential Transformer.

Rethink LoRA initialisations for faster convergence
Published: 2024-06-07T08:36:01+05:30
Updated: 2025-06-14T18:32:02+05:30
https://datta0.github.io/posts/rethink-lora-init/
Author: datta0

A better initialisation for LoRA to make convergence faster