Examples

Worked end-to-end scenarios. Each one ships with a ready cluster config, a run command you can copy, and a short explanation of what's interesting about it. Use them as starting points: most real experiments are 80% one of these and 20% your own tweak.

Start here

Browse by topic

Looking for workloads? Building or generating the JSONL files that drive these examples is covered in its own section, Workloads, which describes flat ShareGPT traces, agentic sessions, and the generators that produce them.

How each example is laid out

Every example page follows the same template:

  1. What it demonstrates: one-sentence summary.
  2. Prerequisites: which install path you need (most just need the simulator container).
  3. Cluster config: the JSON file, annotated.
  4. Run command: the exact CLI invocation, ready to copy.
  5. Expected output: what the throughput logs and CSV look like.
  6. What's interesting: the takeaway.
  7. Related: pointers to similar or follow-on examples.

Quick onboarding flow if you're new: start with Cluster config explained, then Tensor parallel for the simplest non-trivial example, then whichever topic matches your research question.