Skip to main content

Quickstart

Run your first end-to-end simulation in under a minute.

This walkthrough assumes you've finished Installation → Simulator setup and you're inside the simulator container at /app/LLMServingSim.

Run the example

python -m serving \
--cluster-config 'configs/cluster/single_node_single_instance.json' \
--dtype float16 --block-size 16 \
--dataset 'workloads/example_trace.jsonl' \
--output 'outputs/example_single_run.csv' \
--log-interval 1.0

That's the whole thing. The simulator will:

  1. Load the cluster topology from configs/cluster/single_node_single_instance.json (a single RTXPRO6000 GPU running Llama-3.1-8B at TP=1).
  2. Stream requests from workloads/example_trace.jsonl according to their arrival times.
  3. Step ASTRA-Sim each scheduling iteration to get cycle counts.
  4. Write per-request latency metrics to outputs/example_single_run.csv.

You should see throughput, memory, and power lines printed roughly once per second. After the run finishes:

head outputs/example_single_run.csv

shows the per-request output (request id, prompt and decode tokens, TTFT, TPOT, end-to-end latency, …).

What the flags mean

FlagWhat it does
--cluster-configCluster topology + hardware. Generates ASTRA-Sim input files automatically.
--dtypeModel weight precision (float16, bfloat16, float32, int8). Picks the matching profile bundle.
--block-sizeKV-cache block size in tokens. Default 16.
--datasetJSONL file of requests (or agentic sessions).
--outputWhere to write per-request metrics.
--log-intervalHow often to print the throughput / memory / power summary line (seconds).

The full flag list lives at Reference → CLI flags.

Try a different scenario

serving/run.sh ships a few worked examples, multi-instance, prefill/decode disaggregation, MoE with EP, prefix caching, CXL memory, PIM offload, and sub-batch interleaving:

./serving/run.sh

Each block in that script is self-contained and ready to copy into your own scripts. Browse the cluster configs that drive them:

ls configs/cluster/

What's next