Welcome to LLMServingSim

LLMServingSim is your sandbox for heterogeneous and disaggregated LLM serving infrastructure. Want to model a brand-new GPU? Sweep parallelism strategies? Throw exotic memory tiers (CXL, PIM) at a workload? Try a 32-GPU cluster you don't have? It's all within reach.

Setup is genuinely quick: clone, launch a container, compile, run. About 10 minutes in total. Once you're in, the simulator gets out of your way.
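A minimal sketch of those four steps, assuming a Docker-based workflow. The repository URL, image name, and build/run steps below are placeholders rather than the project's actual commands; the Simulator setup page has the real sequence.

```bash
# Illustrative only: the URL, image tag, and later steps are placeholders.
# See the Simulator setup page for the actual commands.

# 1. Clone, pulling submodules along with the main repo
git clone --recurse-submodules https://github.com/<org>/LLMServingSim.git
cd LLMServingSim

# 2. Launch a container with the source tree mounted
docker run -it --rm -v "$PWD":/workspace <image-name> bash

# 3. Compile inside the container (project-specific build step)
# 4. Run your first simulation (project-specific run command)
```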

What you can do

Any topology, instantly

TP / PP / EP / DP+EP across multiple instances. Chunked prefill, prefix caching, MoE expert routing, KV-cache offloading: mix and match them all in a config file, with no code changes.
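To make the mix-and-match idea concrete, here is a hypothetical config fragment. Every key name is invented for illustration and is not the simulator's actual schema.

```yaml
# Hypothetical schema, for illustration only; consult the docs for real keys.
parallelism:
  tensor: 4          # TP degree
  pipeline: 2        # PP stages
  expert: 8          # EP for MoE layers
features:
  chunked_prefill: true
  prefix_caching: true
  kv_cache_offload: cxl   # e.g., spill KV cache to a CXL memory tier
```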

Any hardware, profiled

Plug in a new GPU, CXL tier, or PIM device. The vLLM-based layerwise profiler captures real CUDA timings and feeds them straight into the simulator.
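To give a sense of what layerwise profiling involves, the sketch below times each leaf module of a PyTorch model with CUDA events. It illustrates the general technique only; the project's vLLM-based profiler is separate tooling.

```python
import torch

def profile_layerwise(model, inputs):
    """Illustrative per-layer timing with CUDA events (not the project's profiler)."""
    timings = {}
    handles = []

    def make_hooks(name):
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)

        def pre_hook(module, args):
            start.record()

        def post_hook(module, args, output):
            end.record()
            end.synchronize()                        # wait so elapsed_time is valid
            timings[name] = start.elapsed_time(end)  # milliseconds

        return pre_hook, post_hook

    # Hook only leaf modules so each timing covers one layer, not its children
    for name, module in model.named_modules():
        if len(list(module.children())) == 0:
            pre, post = make_hooks(name)
            handles.append(module.register_forward_pre_hook(pre))
            handles.append(module.register_forward_hook(post))

    with torch.no_grad():
        model(inputs)
    for h in handles:
        h.remove()
    return timings
```

A real profiler would also discard warm-up iterations and average over many runs; this one-shot version just keeps the idea visible.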

Validated against vLLM

Sub-3% end-to-end error on TTFT, TPOT, and throughput: the numbers reflect what production serving actually delivers.
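For reference, all three metrics reduce to simple timestamp arithmetic. The definitions below follow common usage; they are illustrative, not quoted from the simulator's source.

```python
def serving_metrics(arrival, first_token, finish, n_output_tokens):
    """Illustrative metric definitions, per request; times in seconds."""
    ttft = first_token - arrival                        # time to first token
    # TPOT: mean gap between output tokens after the first one
    tpot = (finish - first_token) / max(n_output_tokens - 1, 1)
    throughput = n_output_tokens / (finish - arrival)   # tokens per second
    return ttft, tpot, throughput
```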

Wild scenarios welcome

ShareGPT traces, agentic sessions, 10× clusters, unreleased GPUs, exotic memory tiers: run the experiments you cannot easily run on real hardware.

Three steps to your first simulation

Prerequisites at a glance

A Linux host with Docker is all you need to get started.

Required:
- Linux (Ubuntu 22.04+ tested)
- Docker
- Git with submodule support
- ~12 GB free disk

Optional but recommended:
- NVIDIA GPU + Container Toolkit (only for profiling new HW)
- Hugging Face token (for gated model configs)
- 32 GB RAM (for the vLLM benchmark side)
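Want a quick sanity check before cloning? The required pieces can be verified with standard commands (nothing project-specific here):

```bash
docker --version    # Docker installed?
git --version       # Git available; any modern version handles submodules
df -h .             # roughly 12 GB free?
nvidia-smi          # optional: is a GPU visible (needed only for profiling)
```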

Already have everything? Jump straight to Simulator setup and run your first sim in 10 minutes.

Need a hand?