Workloads

A workload is a JSONL file that drives the simulator's request queue. Each line is either one independent request (flat format) or one session with chained sub-requests (agentic format). Both python -m serving --dataset and python -m bench run --dataset consume the same file.
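As a sketch of what one line of each format might look like: the field names below other than `arrival_time_ns` (which the router sorts on) are illustrative assumptions, not the simulator's actual schema.

```python
import json

# Hypothetical flat-format line: one independent request per line.
# Field names other than arrival_time_ns are illustrative assumptions.
flat_line = json.dumps({
    "arrival_time_ns": 1_000_000,          # when the router should fire this request
    "input_token_ids": [101, 2023, 2003],  # pre-tokenized prompt
    "max_output_tokens": 128,
})

# Hypothetical agentic-format line: one session of chained sub-requests.
agentic_line = json.dumps({
    "session_id": "swe-bench-0001",
    "arrival_time_ns": 2_000_000,
    "requests": [
        {"input_token_ids": [101, 2023], "max_output_tokens": 64},
        {"input_token_ids": [101, 2003], "max_output_tokens": 64},
    ],
})

# A workload file is just such objects, one JSON object per line.
workload_jsonl = "\n".join([flat_line, agentic_line])
print(workload_jsonl)
```

Either way, the file stays line-oriented, so it can be streamed, filtered, and concatenated with ordinary text tools.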

Workload files live under workloads/ at the repo root. The simulator reads them once at startup and the router schedules each request as its arrival_time_ns is reached on the simulated clock.

Bundled workload files

The repo ships a few ready-to-run workload files for the supported hardware × model combinations. Drop them straight into --dataset workloads/<file>.jsonl:

| File | Format | Model assumption | What it's for |
| --- | --- | --- | --- |
| `example_trace.jsonl` | flat | any | Tiny smoke-test workload for the quickstart |
| `sharegpt-llama-3.1-8b-300-sps10.jsonl` | flat | `meta-llama/Llama-3.1-8B` | ShareGPT, 300 requests at 10 sessions/s |
| `sharegpt-qwen3-32b-300-sps10.jsonl` | flat | `Qwen/Qwen3-32B` | ShareGPT, dense Qwen3 |
| `sharegpt-qwen3-30b-a3b-300-sps10.jsonl` | flat | `Qwen/Qwen3-30B-A3B-Instruct-2507` | ShareGPT, MoE Qwen3 |
| `swe-bench-qwen3-30b-a3b-50-sps0.2.jsonl` | agentic | `Qwen/Qwen3-30B-A3B-Instruct-2507` | 50 SWE-bench sessions, low arrival rate |

Token IDs in these files are pre-tokenized with the matching model's tokenizer, so prefix caching works correctly out of the box. Using a ShareGPT JSONL with a different model in the simulator is fine for length/arrival behavior, but prefix-cache hit rates won't match reality.
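To see why a tokenizer mismatch hurts, note that prefix caching matches on exact token-ID prefixes. A minimal sketch of the idea (the hit-length function below is illustrative, not the simulator's implementation):

```python
def common_prefix_len(a: list[int], b: list[int]) -> int:
    """Length of the shared token-ID prefix between two requests."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

# Requests tokenized with the matching model's tokenizer: shared system
# prompts produce identical leading IDs, so the cache can reuse them.
req1 = [791, 4062, 19705, 13]
req2 = [791, 4062, 19705, 374]
print(common_prefix_len(req1, req2))  # 3

# The same text through a different model's tokenizer yields different
# IDs entirely, so cached prefixes never match and hit rates collapse.
other_vocab = [27, 882, 29, 198]
print(common_prefix_len(req1, other_vocab))  # 0
```

Length and arrival-time statistics are tokenizer-independent, which is why mismatched files are still usable for load-shape experiments.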

How workloads connect to the simulator

workloads/foo.jsonl
        │
        ▼
Router.load_requests()
        │
        ▼
_pending_requests (sorted by arrival_time_ns)
        │  (clock advances, arrivals fire)
        ▼
route_arrived_requests() → Scheduler queue
        │
        ▼
scheduler.schedule() → Batch → trace → ASTRA-Sim

For the full request journey, see Simulator → Request lifecycle.
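The arrival-driven front half of that pipeline can be sketched as follows. The class and method names come from the diagram above; everything else (the heap, the request dicts, the exact signatures) is an assumption for illustration.

```python
import heapq
import itertools

class Router:
    """Minimal sketch of arrival-driven routing; not the real Router."""

    def __init__(self):
        self._pending_requests = []        # min-heap keyed on arrival_time_ns
        self._counter = itertools.count()  # tie-breaker so dicts never compare

    def load_requests(self, requests):
        # Read the workload once at startup, ordered by arrival_time_ns.
        for req in requests:
            heapq.heappush(self._pending_requests,
                           (req["arrival_time_ns"], next(self._counter), req))

    def route_arrived_requests(self, now_ns):
        # As the simulated clock advances, fire every request whose
        # arrival time has been reached and hand it to the scheduler queue.
        arrived = []
        while self._pending_requests and self._pending_requests[0][0] <= now_ns:
            arrived.append(heapq.heappop(self._pending_requests)[2])
        return arrived

router = Router()
router.load_requests([
    {"arrival_time_ns": 3_000, "id": 1},
    {"arrival_time_ns": 1_000, "id": 2},
])
print([r["id"] for r in router.route_arrived_requests(2_000)])  # [2]
print([r["id"] for r in router.route_arrived_requests(5_000)])  # [1]
```

The key property is that requests fire strictly in arrival order on the simulated clock, regardless of the order the JSONL lines were written in.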

What's next