# Workloads
A workload is a JSONL file that drives the simulator's request
queue. Each line is either one independent request (flat format) or
one session with chained sub-requests (agentic format). Both
`python -m serving --dataset` and `python -m bench run --dataset`
consume the same file.
Workload files live under `workloads/` at the repo root. The
simulator reads them once at startup, and the router schedules each
request as its `arrival_time_ns` is reached on the simulated clock.
## Pick your path

- **Real-world traffic**: generate JSONL from ShareGPT (or any HF text dataset). Realistic prompt/output length distributions, configurable arrival rate.
- **Agentic / closed-loop**: multi-step LLM calls with tool waits in between (SWE-bench, ReAct, browser agents). Dependency chains modeled by the simulator.
- **Synthetic / custom**: hand-craft your own JSONL, run fixed-length stress tests, replay your own production logs, or anything in between.
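For the synthetic path, a fixed-length stress test is a few lines of Python. The sketch below assumes constant-rate arrivals; `arrival_time_ns` is the documented field, while `prompt_len` and `output_len` are placeholder names — check the JSONL format reference for the real schema before using this:

```python
import json

def make_stress_workload(path, n_requests=100, rate_per_s=10.0,
                         prompt_len=512, output_len=128):
    """Write a constant-rate, fixed-length stress-test workload.
    prompt_len / output_len are illustrative field names, not the
    documented schema."""
    interval_ns = int(1e9 / rate_per_s)
    with open(path, "w") as f:
        for i in range(n_requests):
            req = {
                "arrival_time_ns": i * interval_ns,
                "prompt_len": prompt_len,   # placeholder field name
                "output_len": output_len,   # placeholder field name
            }
            f.write(json.dumps(req) + "\n")
```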
## Bundled workload files

The repo ships a few ready-to-run workload files for the supported
hardware × model combinations. Drop them straight into
`--dataset workloads/<file>.jsonl`:
| File | Format | Model assumption | What it's for |
|---|---|---|---|
| `example_trace.jsonl` | flat | any | Tiny smoke-test workload for the quickstart |
| `sharegpt-llama-3.1-8b-300-sps10.jsonl` | flat | meta-llama/Llama-3.1-8B | ShareGPT, 300 requests at 10 sessions/s |
| `sharegpt-qwen3-32b-300-sps10.jsonl` | flat | Qwen/Qwen3-32B | ShareGPT, dense Qwen3 |
| `sharegpt-qwen3-30b-a3b-300-sps10.jsonl` | flat | Qwen/Qwen3-30B-A3B-Instruct-2507 | ShareGPT, MoE Qwen3 |
| `swe-bench-qwen3-30b-a3b-50-sps0.2.jsonl` | agentic | Qwen/Qwen3-30B-A3B-Instruct-2507 | 50 SWE-bench sessions, low arrival rate |
Token IDs in these files are pre-tokenized with the matching model's tokenizer, so prefix caching works correctly out of the box. Using a ShareGPT JSONL with a different model in the simulator is fine for length/arrival behavior, but prefix-cache hit rates won't match reality.
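If you want prefix-cache behavior to be realistic for a different model, one option is to re-tokenize the workload with that model's tokenizer. The sketch below assumes each line carries its raw text under a `"prompt"` field and its pre-tokenized ids under `"token_ids"` — both names are assumptions, not the documented schema:

```python
import json

def retokenize(src_path, dst_path, tokenizer):
    """Rewrite a workload's token ids using a different model's
    tokenizer, so prefix-cache hit rates reflect that tokenizer.
    The "prompt" and "token_ids" field names are assumptions; any
    object with an encode(text) -> list[int] method works here."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            req = json.loads(line)
            req["token_ids"] = tokenizer.encode(req["prompt"])
            dst.write(json.dumps(req) + "\n")

# e.g. with Hugging Face transformers:
# tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-32B")
# retokenize("workloads/src.jsonl", "workloads/dst.jsonl", tokenizer)
```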
## How workloads connect to the simulator
```
workloads/foo.jsonl
        │
        ▼
Router.load_requests()
        │
        ▼
_pending_requests (sorted by arrival_time_ns)
        │  (clock advances, arrivals fire)
        ▼
route_arrived_requests() → Scheduler queue
        │
        ▼
scheduler.schedule() → Batch → trace → ASTRA-Sim
```
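The pending-queue and arrival-firing steps of this flow can be modeled with a min-heap keyed on `arrival_time_ns`. This toy class mirrors the flow above, not the simulator's actual `Router` or `Scheduler` classes:

```python
import heapq

class ToyRouter:
    """Toy model of the routing loop: pending requests wait in a
    min-heap ordered by arrival_time_ns; each clock advance moves
    every request whose arrival time has been reached into a queue
    that stands in for the Scheduler queue."""

    def __init__(self, requests):
        # (arrival, tie-breaker index, request) keeps the heap stable
        # even when two requests share an arrival time.
        self._pending = [(r["arrival_time_ns"], i, r)
                         for i, r in enumerate(requests)]
        heapq.heapify(self._pending)
        self.queue = []

    def route_arrived_requests(self, now_ns):
        """Fire every arrival whose time is <= the simulated clock."""
        while self._pending and self._pending[0][0] <= now_ns:
            _, _, req = heapq.heappop(self._pending)
            self.queue.append(req)
```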
For the full request journey, see Simulator → Request lifecycle.
## What's next
- JSONL format: schema reference for both formats, field-by-field.
- ShareGPT generator: produce realistic workloads from ShareGPT or any HF dataset.
- Agentic sessions: closed-loop session format and the SWE-bench example.