Tags: activations (1), Attention (2), Data Parallelism (1), Differential Transformer (1), FFNN (3), Fine-tuning (2), GPU (1), GQA (1), KV cache (1), LLM (1), LoRA (2), Math (3), memory (1), MHA (1), Mixture of Experts (1), MLA (1), Multi Latent Attention (1), nanoformer (1), nGPT (1), Parallelism (1), Pipeline Parallelism (1), Tensor Parallelism (1), Training (2), Transformer (4), vLLM (1)