Quickstart

Run your first end-to-end simulation in under a minute.

This walkthrough assumes you've finished Installation → Simulator setup and you're inside the simulator container at /app/LLMServingSim.

Run the example

python -m serving \
  --cluster-config 'configs/cluster/single_node_single_instance.json' \
  --dtype float16 --block-size 16 \
  --dataset 'workloads/example_trace.jsonl' \
  --output 'outputs/example_single_run.csv' \
  --log-interval 1.0

That's the whole thing. The simulator will:

Load the cluster topology from configs/cluster/single_node_single_instance.json (a single RTXPRO6000 GPU running Llama-3.1-8B at TP=1).
Stream requests from workloads/example_trace.jsonl according to their arrival times.
Step ASTRA-Sim each scheduling iteration to get cycle counts.
Write per-request latency metrics to outputs/example_single_run.csv.

You should see throughput, memory, and power lines printed roughly once per second. After the run finishes:

head outputs/example_single_run.csv

shows the per-request output (request id, prompt and decode tokens, TTFT, TPOT, end-to-end latency, …).

What the flags mean

Flag	What it does
`--cluster-config`	Cluster topology + hardware. Generates ASTRA-Sim input files automatically.
`--dtype`	Model weight precision (`float16`, `bfloat16`, `float32`, `int8`). Picks the matching profile bundle.
`--block-size`	KV-cache block size in tokens. Default `16`.
`--dataset`	JSONL file of requests (or agentic sessions).
`--output`	Where to write per-request metrics.
`--log-interval`	How often to print the throughput / memory / power summary line (seconds).

The full flag list lives at Reference → CLI flags.

Try a different scenario

serving/run.sh ships a few worked examples, multi-instance, prefill/decode disaggregation, MoE with EP, prefix caching, CXL memory, PIM offload, and sub-batch interleaving:

./serving/run.sh

Each block in that script is self-contained and ready to copy into your own scripts. Browse the cluster configs that drive them:

ls configs/cluster/

What's next

Simulator → Architecture overview to understand how the simulator runs internally.
Simulator → Reading the output to understand the metrics in *.csv.
Workloads → JSONL format to drive the simulator with your own traces.
Profiler overview if you want to add new hardware or models.

Run the example​

What the flags mean​

Try a different scenario​

What's next​

Run the example

What the flags mean

Try a different scenario

What's next