Category: LLM (1 post)
Rethink LoRA initialisations for faster convergence
Jun 7, 2024