Infrastructure Sizing · DGov AI Platform

GPU Server Capacity Planner

Bandwidth-aware sizing for LLM serving. Decode speed is limited by memory bandwidth rather than TFLOPS, so the panel sizes VRAM, throughput, and GPU count live from that bound.
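As a rough illustration of the memory-bound reasoning, the sketch below estimates a decode-rate ceiling by dividing memory bandwidth by the bytes of weights streamed per token, then takes the larger of a memory-driven and a throughput-driven GPU count. The function names, parameters, and the `efficiency` factor are hypothetical and are not taken from the planner itself. The model also ignores KV-cache reads, batching, and interconnect overhead, so treat it as a simplified upper bound rather than the panel's actual formula.

```python
import math


def decode_tokens_per_sec(bandwidth_gb_s, active_params_b, bytes_per_param, efficiency=1.0):
    """Estimate a single-sequence decode ceiling under a memory-bound assumption.

    Assumes every generated token streams the active weights from VRAM once.
    bandwidth_gb_s  : memory bandwidth in GB/s
    active_params_b : active parameters per token, in billions
    bytes_per_param : 2 for FP16/BF16, 1 for 8-bit weights, and so on
    efficiency      : assumed fraction of peak bandwidth actually achieved
    """
    bytes_per_token = active_params_b * 1e9 * bytes_per_param
    return efficiency * bandwidth_gb_s * 1e9 / bytes_per_token


def gpus_needed(weights_gb, kv_cache_gb, vram_per_gpu_gb, target_tok_s, per_gpu_tok_s):
    """Return the larger of the GPU counts implied by memory and by throughput."""
    by_memory = math.ceil((weights_gb + kv_cache_gb) / vram_per_gpu_gb)
    by_throughput = math.ceil(target_tok_s / per_gpu_tok_s)
    return max(by_memory, by_throughput)


if __name__ == "__main__":
    # Illustrative inputs only, not measurements of any specific GPU or model.
    rate = decode_tokens_per_sec(bandwidth_gb_s=1000, active_params_b=10, bytes_per_param=2)
    print(f"decode ceiling: {rate:.1f} tokens/s")
    print("GPUs:", gpus_needed(60, 20, 48, target_tok_s=100, per_gpu_tok_s=rate))
```

Note that compute throughput does not appear anywhere in the estimate; under this assumption, doubling TFLOPS leaves the decode ceiling unchanged, while doubling bandwidth doubles it.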

ENGINE  vLLM · memory-bound roofline
GPU  
MODEL  calib · RTX 6000 / Qwen3-A3B