Codebase tour

This page answers "I want to add or change X; which files do I touch?". It is a directory map, not a behavior reference. For what each piece does, see the Simulator and Profiler sections.

The five domains

```
LLMServingSim/
├── serving/     # Simulator: Python, the core loop
├── profiler/    # Profiler: Python, vLLM-based latency capture
├── bench/       # Bench: real vLLM run + sim validation
├── workloads/   # Workloads: JSONL traces + generators
├── configs/     # Configs: JSON for cluster / model / PIM
├── scripts/     # Env scripts: Docker launchers + builders
└── astra-sim/   # Backend: C++ analytical network simulator
```

Each domain has a clear boundary. A typical PR touches one or two of these, not all of them. If you find yourself editing four domains for a single change, stop and reconsider the scope.

Simulator (serving/)

Where most contributor work happens.

```
serving/
├── __main__.py              # CLI + main loop
└── core/
    ├── scheduler.py         # vLLM-style continuous batching
    ├── trace_generator.py   # Profile lookup -> text trace
    ├── memory_model.py      # KV / weight / CXL byte accounting
    ├── graph_generator.py   # Text trace -> Chakra protobuf
    ├── controller.py        # ASTRA-Sim subprocess IPC
    ├── router.py            # Request routing across instances
    ├── gate_function.py     # MoE expert routing
    ├── config_builder.py    # Cluster config -> ASTRA-Sim inputs
    ├── power_model.py       # Power / energy estimation
    ├── pim_model.py         # PIM device model
    ├── request.py           # Request / Batch dataclasses
    ├── radix_tree.py        # Prefix cache (RadixCache, from SGLang)
    ├── logger.py            # Rich-based logging + stdio capture
    └── utils.py             # Model config loading, formatters
```

Where to touch by intent:

| Intent | Edit |
| --- | --- |
| Change scheduling policy | `scheduler.py` |
| Change how latency is looked up | `trace_generator.py` (`_lookup_*`) |
| Change byte accounting (KV, weights, prefix cache) | `memory_model.py` |
| Change inter-instance routing | `router.py` |
| Add a new CLI flag | `__main__.py` (argparse), then thread through |
| Change MoE expert distribution | `gate_function.py` |
| Change ASTRA-Sim input generation | `config_builder.py` |
| Add a new power component | `power_model.py` |
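
The "add a flag, then thread through" pattern from the table is worth spelling out. The sketch below is hypothetical: `--max-batch-tokens` and `run()` are invented names for illustration, not flags or functions that exist in `serving/__main__.py`.

```python
import argparse

# Hypothetical sketch of the pattern; the real flags in serving/__main__.py differ.
def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="serving")
    # Step 1: declare the flag at the CLI boundary...
    parser.add_argument(
        "--max-batch-tokens", type=int, default=8192,
        help="token budget per scheduled batch (illustrative name)",
    )
    return parser

def run(argv: list[str]) -> int:
    args = build_parser().parse_args(argv)
    # Step 2: ...then thread the value through explicitly to the component
    # that consumes it (e.g. pass it into the scheduler's constructor),
    # rather than reading global state deep inside core/.
    return args.max_batch_tokens

# run(["--max-batch-tokens", "4096"]) -> 4096
```

Threading the value through constructors keeps `core/` modules testable in isolation, since nothing in them reaches back into argparse.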

Profiler (profiler/)

```
profiler/
├── __main__.py                # CLI dispatch (profile / slice)
├── core/                      # internals (runner, engine, categories, fit_alpha)
├── models/<model_type>.yaml   # Architecture catalogs (one per HF model_type)
├── perf/<hw>/<model>/...      # Output bundles (CSV per category)
└── profile.sh                 # Editable user template
```

Where to touch by intent:

| Intent | Edit |
| --- | --- |
| Add a new hardware target | Run the profiler with `HARDWARE=` set; output lands in `perf/<hw>/`. See Profiler / Adding hardware |
| Add a new model architecture | Drop a YAML in `profiler/models/<model_type>.yaml`. See Profiler / Adding model architecture |
| Change the skew alpha fit | `core/fit_alpha.py` |
| Change what categories get profiled | `core/categories.py` + `core/runner.py` |
| Change output CSV columns | `core/writer.py` (and `_load_perf_db()` in `serving/core/trace_generator.py` to consume them) |
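
The coupling in the last row (writer produces columns, `_load_perf_db()` consumes them) follows a common pattern: index the profiled rows by a lookup key at load time. This is a generic sketch, not the actual implementation; the column names `category`, `batch`, and `latency_us` are invented for illustration.

```python
import csv
import io

# Illustrative only: these column names are not the profiler's real schema.
def load_perf_db(csv_text: str) -> dict:
    """Index profiled latencies by (category, batch size) for O(1) lookup."""
    db = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        db[(row["category"], int(row["batch"]))] = float(row["latency_us"])
    return db

sample = "category,batch,latency_us\nattn,8,123.4\nmlp,8,98.7\n"
db = load_perf_db(sample)
# db[("attn", 8)] -> 123.4
```

The practical implication: if you add a column on the writer side, the loader keeps working (extra columns are ignored by the key/value extraction), but renaming or removing a column breaks the lookup, so change both sides in one PR.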

Bench (bench/)

```
bench/
├── __main__.py           # CLI (run / validate)
├── core/                 # AsyncLLM driver, recorder, validator
├── examples/<model>/     # Committed end-to-end runs
└── results/<run_id>/     # Output for ad-hoc runs
```

You'll touch this only if you change the validation methodology itself (how vLLM is driven, what metrics are compared, what plots are emitted). For day-to-day "did my change regress?" use, see Validating your changes.

Configs (configs/)

```
configs/
├── cluster/<name>.json       # Cluster topology (the main thing)
├── model/<org>/<name>.json   # Model architecture (subset of HF config.json)
└── pim/<name>.ini            # PIM device specs (DRAMSim3 format)
```

Cluster configs are the most edited files outside serving/. Adding a new scenario almost always means dropping a new configs/cluster/<scenario>.json without touching simulator code at all. The field-by-field schema lives in Reference / Cluster config.
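
Since new scenarios are usually small variations on an existing config, one workable approach is to derive them programmatically. This is a sketch only: `npu_count` is a placeholder key, not the real cluster schema (see Reference / Cluster config for the actual fields).

```python
import json
from pathlib import Path

# Sketch only: "npu_count" is a placeholder key, not the real schema.
def derive_scenario(base: Path, name: str, **overrides) -> Path:
    """Copy an existing cluster config, override a few fields, save as a new scenario."""
    cfg = json.loads(base.read_text())
    cfg.update(overrides)
    out = base.with_name(f"{name}.json")
    out.write_text(json.dumps(cfg, indent=2))
    return out

# e.g. derive_scenario(Path("configs/cluster/<baseline>.json"), "<scenario>", npu_count=16)
```

Deriving from a known-good config avoids typos in fields you did not mean to change, and keeps scenario diffs reviewable.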

Workloads (workloads/)

```
workloads/
├── *.jsonl        # Datasets (one request or session per line)
├── generators/    # ShareGPT / SWE-bench JSONL builders
└── README.md      # JSONL format reference
```

Adding a new workload generator is a contained change: a new module under generators/, runnable as python -m workloads.generators.<your_module>. See Workloads / ShareGPT generators for the existing pattern.
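
A generator module of the shape described above might look like the sketch below. The field names (`prompt_len`, `output_len`) are illustrative, not the documented JSONL schema; check workloads/README.md for the real format before writing one.

```python
import json
import random
import sys

# Hypothetical sketch: field names are illustrative, not the real schema.
def generate(n: int, seed: int = 0):
    """Yield one synthetic request dict per JSONL line, deterministically."""
    rng = random.Random(seed)
    for i in range(n):
        yield {
            "id": i,
            "prompt_len": rng.randint(16, 1024),
            "output_len": rng.randint(1, 256),
        }

def main(n: int = 100) -> None:
    # One JSON object per line: the one-request-per-line JSONL convention.
    for req in generate(n):
        sys.stdout.write(json.dumps(req) + "\n")
```

Seeding the RNG makes the workload reproducible, which matters when comparing simulator runs across branches.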

ASTRA-Sim (astra-sim/)

C++ network simulator, lives as a submodule. Don't edit unless the change targets simulator integration. Most simulator-side changes never touch this.

The handful of files you might edit:

| File | Why |
| --- | --- |
| `astra-sim/extern/graph_frontend/chakra/src/converter/llm_converter.py` | New trace `comm_type` syntax, new memory location enum |
| `astra-sim/astra-sim/system/Workload.cc` | Custom collective issuance, `involved_dim` handling |
| `astra-sim/astra-sim/system/AstraMemoryAPI.hh` | New memory tier enum (paired with `llm_converter.py`) |
| `astra-sim/inputs/...` | Don't edit. Generated by `config_builder.py` on every run |

If you do edit ASTRA-Sim, rerun ./scripts/compile.sh before testing.

Scripts (scripts/)

```
scripts/
├── docker-sim.sh      # Sim container launcher
├── docker-vllm.sh     # vLLM container launcher (profiler / bench)
├── install-vllm.sh    # Bare-metal vLLM install (uv venv)
└── compile.sh         # ASTRA-Sim + Chakra build
```

You'll touch these rarely. If you add a new entry point, prefer python -m <module> (handled inside the existing containers) over adding more shell scripts.

Tests and fixtures

There is no unit-test suite. Validation is done by:

  1. Running a quick smoke test with python -m serving … and inspecting the output CSV.
  2. Running python -m bench validate against a known-good vLLM replay (see Validating your changes).

When adding a feature that has clean inputs and outputs (a new _lookup_* function, a new memory accounting helper), feel free to add a script under scripts/ or a notebook checked into your branch. The project has not adopted a formal test framework yet; that itself is an open contribution opportunity.
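
In the absence of a test framework, even a tiny assertion script catches regressions early. This is a hypothetical example of what such a script might check; the column name `latency_ms` is a placeholder, so adapt it to the CSV your simulator run actually emits.

```python
import csv
import io

# Hypothetical smoke check: "latency_ms" is a placeholder column name.
def smoke_check(csv_text: str) -> bool:
    """Fail fast on obviously broken output: no rows, or negative latency."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return False
    return all(float(r["latency_ms"]) >= 0 for r in rows)

# smoke_check("req_id,latency_ms\n0,12.5\n1,9.8\n") -> True
```

Checks like this are cheap to run after every change and make a natural seed for a future pytest suite.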

Where docs live

| Audience | Location |
| --- | --- |
| User-facing docs (this site) | `docs/` |
| Per-module developer notes | `<module>/README.md` (each top-level Python module has one) |
| Top-level project README | `README.md` |
| Project context for AI agents | `CLAUDE.md` (mirrors `AGENTS.md`) |

When you change behavior, update the relevant page under docs/. When you add a feature, also update the module's own README.md if it covers something the website doesn't.

What's next