About
Hi, I’m Datta Nimmaturi
I work at Unsloth and write about the parts of deep learning that become clearer when you build them from first principles: attention, fine-tuning, GPU kernels, model parallelism, and the math behind LLM systems.
My goal with this blog is to make technical ideas feel inspectable. Most posts start with an intuition, turn it into math or code, and then connect it back to practical systems behavior.
Good starting points:
- "Attention and Transformer Imagined" for the foundation.
- "The MathemaTricks behind FlashAttention" for GPU-aware attention.
- "Understanding multi-GPU Parallelism paradigms" for distributed inference and training.
I studied Mathematics and Computer Science at IIT Hyderabad. Outside work, I spend time on chess, cricket, operating-system tinkering, and deep reinforcement learning.
Reach out via Discord, X/Twitter, LinkedIn, or email.
I also write short ML research-paper summaries on Substack.
