Validating your changes
The project does not (yet) ship a unit-test suite. Validation is done by running the simulator against known scenarios and comparing results. This page covers the three checks you should run before opening a PR, in increasing order of cost, plus a fourth step for profiler-side changes.
1. Smoke run (every PR, ~30 seconds)
The minimum bar: run the smallest bundled scenario and confirm it still finishes without errors.
python -m serving \
--cluster-config configs/cluster/single_node_single_instance.json \
--dataset workloads/example_trace.jsonl \
--output outputs/smoke.csv \
--num-reqs 10
What to check:
- Exit code is 0.
- outputs/smoke.csv has 10 rows plus a header.
- The throughput log line at the end shows non-zero prompt_t and decode_t.
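If you want to script those checks, here is a minimal sketch (not part of the repo's tooling; it assumes the smoke run wrote outputs/smoke.csv as above):

```bash
# Run the smoke scenario and bail on a non-zero exit code.
set -e
python -m serving \
  --cluster-config configs/cluster/single_node_single_instance.json \
  --dataset workloads/example_trace.jsonl \
  --output outputs/smoke.csv \
  --num-reqs 10

# 10 request rows plus one header row = 11 lines.
lines=$(wc -l < outputs/smoke.csv)
[ "$lines" -eq 11 ] || { echo "unexpected row count: $lines"; exit 1; }
echo "smoke run OK"
```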
If your change is in serving/, this is the floor. Don't push a
commit that breaks the smoke run.
2. Targeted scenarios (when the change touches related features)
Map your edit to the scenario(s) that exercise it. The bundled cluster configs cover the major features:
| If you touched... | Run scenario |
|---|---|
| scheduler.py (any path) | single_node_single_instance.json |
| Prefix caching, RadixCache | single_node_multi_instance.json with --enable-prefix-sharing |
| KV cache, eviction, memory model | single_node_memory_instance.json |
| Multi-instance routing | single_node_multi_instance.json |
| Prefill / decode disaggregation | single_node_pd_instance.json |
| MoE, expert parallelism | single_node_moe_single_instance.json |
| DP+EP wave sync | single_node_moe_dp_ep_instance.json |
| CXL placement | single_node_cxl_instance.json |
| PIM offload | single_node_pim_instance.json |
| Power model | single_node_power_instance.json |
| Trace generator, graph generator | any of the above |
serving/run.sh contains ready-to-run commands for all of these.
Pick the relevant ones and confirm they still produce sensible
output.
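For example, a prefix-caching change maps to the multi-instance scenario with sharing enabled. The sketch below reuses the dataset and flags already shown on this page; if serving/run.sh has a dedicated command for this scenario, prefer that one:

```bash
# Targeted run for a prefix-caching / RadixCache change (flag from the table above).
# The request count is arbitrary; bump it if you need a longer run.
python -m serving \
  --cluster-config configs/cluster/single_node_multi_instance.json \
  --dataset workloads/example_trace.jsonl \
  --output outputs/prefix_sharing.csv \
  --enable-prefix-sharing \
  --num-reqs 100
```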
3. Bench validation (changes that affect end-to-end accuracy)
If your change could move the simulator's output relative to real
vLLM (anything in scheduler.py, trace_generator.py,
memory_model.py, profile lookup, MoE accounting), run a bench
validation against a committed reference run.
The bench module captures a real vLLM execution, then compares the simulator's output for the same dataset:
# 1. Rerun the sim side of an existing example
./bench/examples/run.sh Llama-3.1-8B
# 2. Compare against the committed vLLM reference
./bench/examples/validate.sh Llama-3.1-8B
Output lands in bench/examples/Llama-3.1-8B/validation/:
- summary.txt: aggregate error on TTFT / TPOT / throughput.
- A handful of PDFs: per-request latency CDF, throughput timeline, running-waiting curves.
The committed reference baselines target sub-3% error on TTFT, TPOT, and throughput. A regression beyond ~5% is a blocker. Smaller movements need an explanation in the PR description (e.g., "this fixes an under-counting bug; the new error is closer to ground truth than the old").
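A quick way to see where you landed relative to those thresholds is to read the aggregate numbers straight out of the validation directory (paths from the list above; the exact layout of summary.txt may vary):

```bash
# Aggregate TTFT / TPOT / throughput error for the example model.
cat bench/examples/Llama-3.1-8B/validation/summary.txt

# The per-request plots land alongside it.
ls bench/examples/Llama-3.1-8B/validation/*.pdf
```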
For deeper detail on the validation methodology, see
bench/README.md.
4. Profiler-side changes (if you touched profiler/)
Profiler changes don't show up in the simulator until you regenerate the perf bundle. Run a small profile to confirm your edit doesn't break the pipeline:
# Inside the vLLM container
MODEL=meta-llama/Llama-3.1-8B HARDWARE=RTXPRO6000 \
./profiler/profile.sh
Then verify the simulator still loads it cleanly with the smoke run from step 1.
If you only changed the alpha fit (fit_alpha.py), you can refresh just skew_fit.csv without rerunning the rest:

SKIP_DENSE=1 SKIP_PER_SEQUENCE=1 SKIP_ATTENTION=1 SKIP_MOE=1 ONLY_SKEW=1 ./profiler/profile.sh
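To confirm the skew fit was actually refreshed, check the timestamp on the regenerated file. The output location depends on your checkout, so this sketch just searches for it:

```bash
# Locate skew_fit.csv and eyeball its modification time.
find . -name skew_fit.csv -exec ls -l {} +
```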
What "this should reproduce" looks like in a PR
In your PR description, include the exact command you ran and the key number from the output. Examples:
- Validation: ./bench/examples/validate.sh Llama-3.1-8B → TTFT MAPE 2.1% (was 2.3%), TPOT MAPE 1.7% (unchanged), throughput 1.2% (was 1.4%).
- Smoke: python -m serving --cluster-config single_node_single_instance.json --dataset example_trace.jsonl --num-reqs 10 runs cleanly; the output CSV has the expected 10 rows.
This gives the reviewer something to rerun, and gives you (and future readers of the git log) a record of what was checked.
When the existing scenarios don't cover what you changed
If your contribution adds a feature that no bundled scenario
exercises, add a new bundled scenario as part of the PR. Drop
a configs/cluster/<your_scenario>.json and add the matching line
to serving/run.sh. This makes the feature reproducible for the
next contributor and gives the reviewer something concrete to
exercise.
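A minimal sketch of that run.sh entry, assuming your scenario takes the same CLI shape as the existing ones (the scenario and output names are placeholders):

```bash
# New scenario entry for serving/run.sh (placeholder names).
python -m serving \
  --cluster-config configs/cluster/<your_scenario>.json \
  --dataset workloads/example_trace.jsonl \
  --output outputs/<your_scenario>.csv \
  --num-reqs 10
```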
For features that need a custom workload (a new agentic dataset, a
specific prompt distribution), commit a small JSONL under
workloads/ and reference it from the cluster config example.
Don't commit anything over a few MB.
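Before committing, a quick size check against that few-MB guideline (placeholder file name):

```bash
# Keep committed workloads small.
du -h workloads/<your_workload>.jsonl
```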
What's next
- PR workflow: how to package the change up.
- Reading the output: what the per-request CSV columns mean (useful when validating).