Examples
Worked end-to-end scenarios. Each one ships with a ready cluster config, a run command you can copy, and a short explanation of what's interesting about it. Use them as starting points: most real experiments are 80% one of these and 20% your own tweak.
Start here
Browse by topic
Parallelism
TP, PP, EP, and DP+EP: what each looks like in a config, when to reach for which, and how they compose for MoE.
Disaggregated serving
Multi-instance with router, prefill/decode disaggregation, and PIM attention offload. Model real production layouts and compute disaggregation.
Memory tiers
Prefix caching, CXL extended memory, FP8 KV cache. The extended-memory features.
Advanced
Power modeling and sub-batch interleaving: niche features that pay off in specific scenarios.
Looking for workloads? Building or generating the JSONL files that drive these examples is covered in its own section, Workloads, which describes flat ShareGPT traces, agentic sessions, and the generators that produce them.
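For readers who have not handled JSONL before: the format stores one JSON object per line. The sketch below shows a round trip through such a file using only the Python standard library. The field names (`request_id`, `input_tokens`, `output_tokens`) are hypothetical placeholders chosen for illustration, not the simulator's actual workload schema; see the Workloads section for the real fields.

```python
import io
import json


def read_jsonl(stream):
    """Yield one parsed record per non-empty line of a JSONL stream."""
    for line_number, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        try:
            yield json.loads(line)
        except json.JSONDecodeError as err:
            raise ValueError(f"line {line_number}: invalid JSON ({err})") from err


def write_jsonl(records, stream):
    """Write each record as a single JSON object followed by a newline."""
    for record in records:
        stream.write(json.dumps(record) + "\n")


# Hypothetical records: these field names are illustrative assumptions,
# not the schema the simulator expects.
sample = [
    {"request_id": 0, "input_tokens": 128, "output_tokens": 64},
    {"request_id": 1, "input_tokens": 512, "output_tokens": 32},
]

# Round trip through an in-memory buffer; a real file opened with open() works the same way.
buffer = io.StringIO()
write_jsonl(sample, buffer)
buffer.seek(0)
loaded = list(read_jsonl(buffer))
```

Reading lazily with a generator keeps memory use flat for long traces, and reporting the line number on a parse error makes a malformed record easy to locate.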
How each example is laid out
Every example page follows the same template:
- What it demonstrates: one-sentence summary.
- Prerequisites: which install path you need (most just need the simulator container).
- Cluster config: the JSON file, annotated.
- Run command: copy-able, exact CLI invocation.
- Expected output: what the throughput logs and CSV look like.
- What's interesting: the takeaway.
- Related: pointers to similar or follow-on examples.
Quick onboarding flow if you're new: start with Cluster config explained → Tensor parallel for the simplest non-trivial example → whichever topic matches your research question.