Validating your changes

The project does not (yet) ship a unit-test suite. Validation is done by running the simulator against known scenarios and comparing the results. This page covers the checks you should run before opening a PR, in increasing order of cost.

1. Smoke run (every PR, ~30 seconds)

The minimum bar: run the smallest bundled scenario and confirm it still finishes without errors.

python -m serving \
  --cluster-config configs/cluster/single_node_single_instance.json \
  --dataset workloads/example_trace.jsonl \
  --output outputs/smoke.csv \
  --num-reqs 10

What to check:

  • Exit code is 0.
  • outputs/smoke.csv has 10 rows plus a header.
  • The throughput log line at the end shows non-zero prompt_t and decode_t.

If your change is in serving/, this is the floor. Don't push a commit that breaks the smoke run.
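
The shell's exit status after the run covers the first check. If you want a scripted check of the output as well, here is a minimal sketch (assuming the CSV has a single header row, so 10 requests means 11 lines):

test $(wc -l < outputs/smoke.csv) -eq 11 && echo "smoke.csv row count OK"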

2. Scenario runs (changes that touch a specific feature)

Map your edit to the scenario(s) that exercise it. The bundled cluster configs cover the major features:

  • scheduler.py (any path) → single_node_single_instance.json
  • Prefix caching, RadixCache → single_node_multi_instance.json with --enable-prefix-sharing
  • KV cache, eviction, memory model → single_node_memory_instance.json
  • Multi-instance routing → single_node_multi_instance.json
  • Prefill / decode disaggregation → single_node_pd_instance.json
  • MoE, expert parallelism → single_node_moe_single_instance.json
  • DP+EP wave sync → single_node_moe_dp_ep_instance.json
  • CXL placement → single_node_cxl_instance.json
  • PIM offload → single_node_pim_instance.json
  • Power model → single_node_power_instance.json
  • Trace generator, graph generator → any of the above

serving/run.sh contains ready-to-run commands for all of these. Pick the relevant ones and confirm they still produce sensible output.
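
As one illustration, a prefix-caching check could look like the following; the authoritative commands are in serving/run.sh, and the dataset and output path here are only placeholders:

python -m serving \
  --cluster-config configs/cluster/single_node_multi_instance.json \
  --dataset workloads/example_trace.jsonl \
  --output outputs/prefix_sharing.csv \
  --enable-prefix-sharing \
  --num-reqs 10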

3. Bench validation (changes that affect end-to-end accuracy)

If your change could move the simulator's output relative to real vLLM (anything in scheduler.py, trace_generator.py, memory_model.py, profile lookup, MoE accounting), run a bench validation against a committed reference run.

The bench module captures a real vLLM execution, then compares the simulator's output for the same dataset:

# 1. Rerun the sim side of an existing example
./bench/examples/run.sh Llama-3.1-8B

# 2. Compare against the committed vLLM reference
./bench/examples/validate.sh Llama-3.1-8B

Output lands in bench/examples/Llama-3.1-8B/validation/:

  • summary.txt: aggregate error on TTFT / TPOT / throughput.
  • A handful of PDFs: per-request latency CDF, throughput timeline, running-waiting curves.

The committed reference baselines target sub-3% error on TTFT, TPOT, and throughput. A regression beyond ~5% is a blocker. Smaller movements need an explanation in the PR description (e.g., "this fixes an under-counting bug; the new error is closer to ground truth than the old").
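
To pull those aggregate numbers into your PR description, it is usually enough to print the summary file from the path above:

cat bench/examples/Llama-3.1-8B/validation/summary.txt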

For deeper detail on the validation methodology, see bench/README.md.

4. Profiler-side changes (if you touched profiler/)

Profiler changes don't show up in the simulator until you regenerate the perf bundle. Run a small profile to confirm your edit doesn't break the pipeline:

# Inside the vLLM container
MODEL=meta-llama/Llama-3.1-8B HARDWARE=RTXPRO6000 \
./profiler/profile.sh

Then verify the simulator still loads it cleanly with the smoke run from step 1.

If you only changed the alpha fit (fit_alpha.py), you can refresh just skew_fit.csv without rerunning the rest:

SKIP_DENSE=1 SKIP_PER_SEQUENCE=1 SKIP_ATTENTION=1 SKIP_MOE=1 ONLY_SKEW=1 \
./profiler/profile.sh

What "this should reproduce" looks like in a PR

In your PR description, include the exact command you ran and the key number from the output. Examples:

Validation: ./bench/examples/validate.sh Llama-3.1-8B → TTFT MAPE 2.1% (was 2.3%), TPOT MAPE 1.7% (unchanged), throughput 1.2% (was 1.4%).

Smoke: python -m serving --cluster-config single_node_single_instance.json --dataset example_trace.jsonl --num-reqs 10 runs cleanly, output CSV has expected 10 rows.

This gives the reviewer something to rerun, and gives you (and future readers of the git log) a record of what was checked.

When the existing scenarios don't cover what you changed

If your contribution adds a feature that no bundled scenario exercises, add a new bundled scenario as part of the PR. Drop a configs/cluster/<your_scenario>.json and add the matching line to serving/run.sh. This makes the feature reproducible for the next contributor and gives the reviewer something concrete to exercise.
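
One low-friction way to scaffold that (the scenario name below is only a placeholder):

# Copy the closest existing scenario and adapt it
cp configs/cluster/single_node_single_instance.json \
   configs/cluster/single_node_myfeature_instance.json
# ...edit the copy, then add a matching command to serving/run.sh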

For features that need a custom workload (a new agentic dataset, a specific prompt distribution), commit a small JSONL under workloads/ and reference it from the cluster config example. Don't commit anything over a few MB.
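
Before committing, a quick size and line-count check helps (the filename is a placeholder; this assumes one request per JSONL line):

wc -l workloads/my_feature_trace.jsonl   # request count
du -h workloads/my_feature_trace.jsonl   # keep this under a few MB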

What's next