Oct 09, 2025 · 10 min read
Flop Calculations for Multihead Attention Layer
Tags: LLMs, Transformers, DeepLearning (+4 more)

Sep 24, 2025 · 8 min read
The Batch Size Story (Episode 1): Exploring the GPU Memory Hierarchy
Tags: gpu, memory, optimization (+2 more)

Sep 18, 2025 · 8 min read
PagedAttention Deep Dive - Nagging Questions - Vol. 1
Tags: LLM, Inference, vLLM (+3 more)

Mar 22, 2025 · 12 min read
An H100 GPU Datacenter — A Simple Guide to Topology and Bandwidth
Plain-language explanations and realistic numbers for 8×H100 nodes: NVLink/NVSwitch inside the box, InfiniBand between boxes, plus diagrams and quick checks.
Tags: H100, DGX, NVLink (+3 more)