Examples

Worked end-to-end scenarios. Each one ships with a ready cluster config, a run command you can copy, and a short explanation of what's interesting about it. Use them as starting points: most real experiments are 80% one of these and 20% your own tweak.

Start here

Cluster config explained

The single file that drives every simulation. What each field means, how the parallelism math works, and how to read the configs that the rest of these examples are built on.
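As a rough illustration of the kind of parallelism math that page covers, the sketch below multiplies the parallel degrees to get the number of devices a layout needs. The field names (`tensor_parallel`, `pipeline_parallel`, `data_parallel`) and the helper `fits` are hypothetical and are not taken from the simulator's config schema. The sketch also assumes the common convention that each model replica spans TP × PP devices and that DP multiplies the number of replicas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelismPlan:
    # Hypothetical field names for illustration only; the real cluster
    # config schema is described on the "Cluster config explained" page.
    tensor_parallel: int = 1
    pipeline_parallel: int = 1
    data_parallel: int = 1

    def gpus_required(self) -> int:
        """Devices needed: each replica spans TP x PP, and DP multiplies replicas."""
        return self.tensor_parallel * self.pipeline_parallel * self.data_parallel


def fits(plan: ParallelismPlan, gpus_available: int) -> bool:
    """Return True when the plan does not need more devices than are available."""
    return plan.gpus_required() <= gpus_available


plan = ParallelismPlan(tensor_parallel=4, pipeline_parallel=2, data_parallel=2)
print(plan.gpus_required())  # 4 * 2 * 2 = 16
print(fits(plan, gpus_available=8))  # False: 16 devices do not fit in 8
```

Running a check like this before a run makes it easy to spot a layout that asks for more devices than the cluster provides.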

Browse by topic

Parallelism

TP, PP, EP, and DP+EP: what each looks like in a config, when to reach for which, and how they compose for MoE.

Disaggregated serving

Multi-instance with router, prefill/decode disaggregation, and PIM attention offload. Model real production layouts and compute disaggregation.

Memory tiers

Prefix caching, CXL extended memory, and FP8 KV cache: the extended-memory features.

Advanced

Power modeling and sub-batch interleaving: niche features that pay off in specific scenarios.

Looking for workloads? Building or generating the JSONL files that drive these examples is covered in its own section, Workloads, which includes flat ShareGPT traces, agentic sessions, and the generators that produce them.
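For a quick look inside a workload file before running an example, a minimal sketch like the one below can help. It relies only on the JSONL convention of one JSON object per line. The helper name `iter_jsonl` and the file name `workload.jsonl` are hypothetical, and the sketch makes no assumption about which fields the simulator expects in each record.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_jsonl(path: Path) -> Iterator[dict]:
    """Yield one dict per non-empty line, reporting the line number on bad input."""
    with path.open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_number}: invalid JSON") from exc
            if not isinstance(record, dict):
                raise ValueError(f"{path}:{line_number}: expected a JSON object")
            yield record


if __name__ == "__main__":
    workload = Path("workload.jsonl")  # hypothetical file name
    if workload.exists():
        records = list(iter_jsonl(workload))
        print(f"{len(records)} records")
        if records:
            print("fields in first record:", sorted(records[0]))
```

Printing the record count and the first record's field names is usually enough to confirm that a generated trace is well-formed before handing it to a run.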

How each example is laid out

Every example page follows the same template:

What it demonstrates: one-sentence summary.
Prerequisites: which install path you need (most just need the simulator container).
Cluster config: the JSON file, annotated.
Run command: copy-able, exact CLI invocation.
Expected output: what the throughput logs and CSV look like.
What's interesting: the takeaway.
Related: pointers to similar or follow-on examples.

Quick onboarding flow if you're new: start with Cluster config explained, then Tensor parallel for the simplest non-trivial example, then whichever topic matches your research question.
