Oct 09, 2025 · 10 min read
Flop Calculations for Multihead Attention Layer
Tags: LLMs, Transformers, DeepLearning (+4 more)

Sep 24, 2025 · 8 min read
The Batch Size Story (Episode 1): Exploring the GPU Memory Hierarchy
Tags: gpu, memory, optimization (+2 more)

Sep 18, 2025 · 8 min read
PagedAttention Deep Dive - Nagging Questions - Vol. 1
Tags: LLM, Inference, vLLM (+3 more)

Mar 22, 2025 · 12 min read
An H100 GPU Datacenter — A Simple Guide to Topology and Bandwidth
Plain-language explanations and realistic numbers for 8×H100 nodes: NVLink/NVSwitch inside the box, InfiniBand between boxes, plus diagrams and quick checks.
Tags: H100, DGX, NVLink (+3 more)