Infrastructure Sizing · DGov AI Platform

GPU Server Capacity Planner

Bandwidth-aware sizing for LLM serving. Decode speed tracks memory bandwidth, not peak TFLOPS, because each generated token has to stream the active model weights from VRAM. The panel sizes VRAM, throughput, and GPU count live.
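As a rough illustration of the memory-bound reasoning, the sketch below estimates a single-stream decode ceiling as bandwidth divided by the bytes read per token, then derives a weight footprint and a GPU count. It is not the planner's actual code: the function names, the `efficiency` factor, and every number in the usage lines are hypothetical placeholders rather than measured or calibrated values. It also ignores KV-cache reads and batching, both of which shift the real figure.

```python
import math


def decode_tokens_per_s(bandwidth_gb_s, active_params_b, bytes_per_param, efficiency=1.0):
    """Memory-bound ceiling on single-stream decode rate.

    Assumes every generated token reads all active weights once from VRAM.
    bandwidth_gb_s:  memory bandwidth in GB/s (1 GB = 1e9 bytes)
    active_params_b: parameters touched per token, in billions
    bytes_per_param: 2 for FP16/BF16, 1 for 8-bit, 0.5 for 4-bit
    efficiency:      hypothetical fraction of peak bandwidth actually achieved
    """
    bytes_per_token = active_params_b * 1e9 * bytes_per_param
    return efficiency * bandwidth_gb_s * 1e9 / bytes_per_token


def weight_vram_gb(total_params_b, bytes_per_param):
    """VRAM for weights only; KV cache and activations come on top."""
    return total_params_b * bytes_per_param


def gpus_needed(target_tokens_per_s, per_gpu_tokens_per_s):
    """Smallest GPU count whose combined rate meets the target."""
    return math.ceil(target_tokens_per_s / per_gpu_tokens_per_s)


# Placeholder inputs, not specifications of any real GPU or model.
rate = decode_tokens_per_s(bandwidth_gb_s=1000, active_params_b=3, bytes_per_param=2)
print(f"decode ceiling: {rate:.1f} tok/s")
print(f"weights: {weight_vram_gb(total_params_b=30, bytes_per_param=2)} GB")
print(f"GPUs for 400 tok/s: {gpus_needed(400, rate)}")
```

Note that for a mixture-of-experts model the throughput bound uses the active parameter count while the VRAM footprint uses the total, which is why the two functions take separate inputs.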

ENGINE vLLM · memory-bound roofline

GPU —

MODEL calib · RTX 6000 / Qwen3-A3B