
Cluster config schema

Formal field-by-field schema for the JSON file passed via --cluster-config. For a guided walkthrough with examples, see Examples → Cluster config explained. This page is the lookup reference: every field, every type, every default.

File location

Configs live at configs/cluster/<name>.json. The simulator reads the file once at startup and serving/core/config_builder.py generates derived ASTRA-Sim input files (network.yml, system.json, memory_expansion.json).

Top-level

```json
{
  "num_nodes": 1,
  "link_bw": 16,
  "link_latency": 20000,
  "nodes": [...],
  "cxl_mem": {...}
}
```

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `num_nodes` | int | required | — | Number of physical nodes in the cluster |
| `link_bw` | float | required | — | Inter-node link bandwidth in GB/s |
| `link_latency` | float | required | — | Inter-node link latency in ns |
| `nodes` | array | required | — | Length must equal `num_nodes` |
| `cxl_mem` | object | optional | absent | CXL memory expansion (see below) |
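
Putting the pieces together, a minimal single-node, single-instance config might look like the sketch below. All values are copied from the examples elsewhere on this page and are illustrative, not recommendations:

```json
{
  "num_nodes": 1,
  "link_bw": 16,
  "link_latency": 20000,
  "nodes": [
    {
      "num_instances": 1,
      "cpu_mem": {"mem_size": 512, "mem_bw": 256, "mem_latency": 0},
      "instances": [
        {
          "model_name": "Qwen/Qwen3-32B",
          "hardware": "RTXPRO6000",
          "npu_mem": {"mem_size": 96, "mem_bw": 1597, "mem_latency": 0},
          "num_npus": 2,
          "tp_size": 2,
          "pp_size": 1,
          "pd_type": null
        }
      ]
    }
  ]
}
```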

cxl_mem (top-level, optional)

"cxl_mem": {
"mem_size": 1024,
"mem_bw": 60,
"mem_latency": 250,
"num_devices": 4
}
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `mem_size` | float | required | Capacity per device in GB |
| `mem_bw` | float | required | Bandwidth per device in GB/s |
| `mem_latency` | float | required | Access latency in ns |
| `num_devices` | int | required | Number of CXL devices (`cxl:0` through `cxl:N-1`) |

When present, instances can reference cxl:N in their placement field.
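
For instance, with `num_devices: 4` a placement rule (schema below) can target any of `cxl:0` through `cxl:3`. A fragment might read (the location choices here are arbitrary):

```json
"placement": {
  "default": {"weights": "cxl:0", "kv_loc": "npu", "kv_evict_loc": "cxl:1"}
}
```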

Per-node (nodes[i])

```json
{
  "num_instances": 2,
  "cpu_mem": {
    "mem_size": 512,
    "mem_bw": 256,
    "mem_latency": 0,
    "pim_config": "DDR4_8GB_3200_pim"
  },
  "instances": [...],
  "power": {...}
}
```

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `num_instances` | int | required | Number of serving instances on this node |
| `cpu_mem` | object | required | Host CPU memory config (see below) |
| `instances` | array | required | Length must equal `num_instances` |
| `power` | object | optional | Power model config (see below) |

cpu_mem

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `mem_size` | float | required | Host CPU memory capacity in GB |
| `mem_bw` | float | required | CPU memory bandwidth in GB/s |
| `mem_latency` | float | required | CPU memory latency in ns |
| `pim_config` | string | optional | Name of a PIM device config in `configs/pim/`. See PIM config |

power (optional)

Enables the power model on this node. See Examples → Power modeling for the full schema. Top-level structure:

"power": {
"base_node_power": 60,
"npu": {"<hardware>": {...}},
"cpu": {...},
"dram": {...},
"link": {...},
"nic": {...},
"storage": {...}
}
| Sub-field | Required | Description |
| --- | --- | --- |
| `base_node_power` | required | Always-on host platform power in W |
| `npu.<hardware>.idle_power` | required | NPU idle wattage |
| `npu.<hardware>.standby_power` | required | NPU post-compute standby wattage |
| `npu.<hardware>.active_power` | required | NPU active compute wattage |
| `npu.<hardware>.standby_duration` | required | Time to stay in standby after compute, in ns |
| `cpu.idle_power`, `cpu.active_power`, `cpu.util` | required | CPU baseline + utilization fraction |
| `dram.dimm_size`, `dram.idle_power`, `dram.energy_per_bit` | required | DIMM size, idle power, per-bit energy |
| `link.num_links`, `link.idle_power`, `link.energy_per_bit` | required | Network link power |
| `nic.num_nics`, `nic.idle_power` | required | NIC count and baseline |
| `storage.num_devices`, `storage.idle_power` | required | Storage device count and baseline |
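
A filled-in sketch with every sub-field present. All wattages, sizes, and counts below are placeholders, not measured values; the "RTXPRO6000" key simply reuses the hardware label from the per-instance example:

```json
"power": {
  "base_node_power": 60,
  "npu": {
    "RTXPRO6000": {
      "idle_power": 30,
      "standby_power": 60,
      "active_power": 300,
      "standby_duration": 1000000
    }
  },
  "cpu": {"idle_power": 50, "active_power": 150, "util": 0.5},
  "dram": {"dimm_size": 64, "idle_power": 3, "energy_per_bit": 1e-11},
  "link": {"num_links": 4, "idle_power": 2, "energy_per_bit": 5e-12},
  "nic": {"num_nics": 2, "idle_power": 5},
  "storage": {"num_devices": 4, "idle_power": 6}
}
```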

Per-instance (instances[i])

```json
{
  "model_name": "Qwen/Qwen3-32B",
  "hardware": "RTXPRO6000",
  "npu_mem": {"mem_size": 96, "mem_bw": 1597, "mem_latency": 0},
  "num_npus": 2,
  "tp_size": 2,
  "pp_size": 1,
  "ep_size": 1,
  "dp_group": null,
  "pd_type": null,
  "placement": {...}
}
```

Required fields

| Field | Type | Description |
| --- | --- | --- |
| `model_name` | string | HF id. Must match a config at `configs/model/<model_name>.json` (see Model config) |
| `hardware` | string | Hardware label. Must match `profiler/perf/<hardware>/` |
| `npu_mem.mem_size` | float | Per-GPU NPU memory in GB |
| `npu_mem.mem_bw` | float | Per-GPU NPU memory bandwidth in GB/s |
| `npu_mem.mem_latency` | float | Per-GPU NPU memory latency in ns |
| `pd_type` | string or null | `"prefill"`, `"decode"`, or `null` (combined) |
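
`pd_type` is what splits an instance's role. For a disaggregated setup, a node's `instances` array might pair a prefill instance with a decode instance; a sketch reusing the values above:

```json
"instances": [
  {
    "model_name": "Qwen/Qwen3-32B",
    "hardware": "RTXPRO6000",
    "npu_mem": {"mem_size": 96, "mem_bw": 1597, "mem_latency": 0},
    "num_npus": 2,
    "pd_type": "prefill"
  },
  {
    "model_name": "Qwen/Qwen3-32B",
    "hardware": "RTXPRO6000",
    "npu_mem": {"mem_size": 96, "mem_bw": 1597, "mem_latency": 0},
    "num_npus": 2,
    "pd_type": "decode"
  }
]
```

The enclosing node would set `num_instances` to 2 to match.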

Parallelism (at least one of num_npus / tp_size)

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `num_npus` | int | inferred as `tp_size * pp_size` | Total GPUs for this instance |
| `tp_size` | int | inferred as `num_npus // pp_size` | Tensor-parallel degree |
| `pp_size` | int | 1 | Pipeline-parallel degree |
| `ep_size` | int | `tp_size` (MoE) / 1 (dense) | Expert-parallel degree |
| `dp_group` | string or null | null | Group ID. Instances with the same string share experts via cross-instance ALLTOALL |

Constraints (illustrated in the sketch after this list):

  • num_npus == tp_size * pp_size (always)
  • Without dp_group: ep_size <= tp_size
  • For MoE: ep_size must divide num_local_experts
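
As a sketch, the fragment below requests eight GPUs in two pipeline stages, so `tp_size` is inferred as `num_npus // pp_size = 4`, and `ep_size = 4` satisfies `ep_size <= tp_size`. The group name `"moe-group-0"` is a made-up example; any other instance sharing it must use the same `tp_size` and `ep_size` (see Validation rules):

```json
{
  "num_npus": 8,
  "pp_size": 2,
  "ep_size": 4,
  "dp_group": "moe-group-0"
}
```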

placement (optional)

Per-layer / per-block weight + KV-cache placement rules. See Examples → CXL extended memory for a worked example.

"placement": {
"default": {"weights": "npu", "kv_loc": "npu", "kv_evict_loc": "cpu"},
"blocks": [
{"blocks": "0-3", "weights": "cxl:0", "kv_loc": "npu", "kv_evict_loc": "cpu"}
],
"layers": {
"embedding": {"weights": "cxl:1", "kv_loc": "npu", "kv_evict_loc": "cpu"}
}
}
| Sub-field | Type | Required | Description |
| --- | --- | --- | --- |
| `default` | object | required | Catch-all rule for layers / blocks not listed in `blocks` or `layers` |
| `blocks` | array | optional | Per-decoder-block-range overrides |
| `layers` | object | optional | Per-named-layer overrides |

Each rule object has three string fields:

| Field | Allowed values | Description |
| --- | --- | --- |
| `weights` | `npu` / `cpu` / `cxl:<id>` | Where this layer's weights live |
| `kv_loc` | `npu` / `cpu` / `cxl:<id>` | Where active KV blocks live (attention layers only) |
| `kv_evict_loc` | `npu` / `cpu` / `cxl:<id>` | Where evicted KV blocks spill |

blocks strings are dash-and-comma-separated ranges: "0-3", "4-7", "8,9,10", "11-23". Layer-name keys must match canonical layer names from the architecture YAML.
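For example, a `blocks` array mixing both range styles (the location choices here are arbitrary):

```json
"blocks": [
  {"blocks": "0-3", "weights": "cxl:0", "kv_loc": "npu", "kv_evict_loc": "cpu"},
  {"blocks": "8,9,10", "weights": "cpu", "kv_loc": "npu", "kv_evict_loc": "cpu"}
]
```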

Validation rules

  • num_nodes == len(nodes) and per-node num_instances == len(instances).
  • Per-instance weights must fit in NPU memory: weight_per_gpu <= npu_mem.mem_size (otherwise the instance OOMs at startup).
  • Hardware folder must exist at profiler/perf/<hardware>/<model_name>/<variant>/tp<tp_size>/.
  • dp_group must be a valid string or null.
  • All instances within the same dp_group must share the same ep_size and tp_size.

What's next

  • Model config: schema for the file model_name resolves to.
  • PIM config: schema for the file cpu_mem.pim_config resolves to.