
Model config schema

Model config files live at configs/model/<org>/<name>.json and are raw HuggingFace config.json files: exactly what AutoModelForCausalLM would download from the hub. The simulator and profiler read a small subset of fields; the rest are ignored.

This page documents the subset that matters.

File location

One file per model:

configs/model/
├── meta-llama/
│   └── Llama-3.1-8B.json
├── Qwen/
│   ├── Qwen3-32B.json
│   └── Qwen3-30B-A3B-Instruct-2507.json
└── ...

An instance's model_name field in the Cluster config references the file by its path relative to configs/model/.

If the file is absent and model_name looks like an HF id, the profiler downloads and caches it on first run. The simulator doesn't auto-download; you need a local file before running.
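
One way to pre-fetch a config by hand, roughly mirroring what the profiler does (this assumes the huggingface_hub package; the project's own download path may differ):

from pathlib import Path
from shutil import copyfile
from huggingface_hub import hf_hub_download

# Fetch only config.json from the hub, then rename it into the
# configs/model/<org>/<name>.json layout the simulator expects.
cached = hf_hub_download(repo_id="meta-llama/Llama-3.1-8B", filename="config.json")
dest = Path("configs/model/meta-llama/Llama-3.1-8B.json")
dest.parent.mkdir(parents=True, exist_ok=True)
copyfile(cached, dest)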

Required fields (the subset the simulator and profiler read)

| Field | Type | Used by | Description |
| --- | --- | --- | --- |
| model_type | string | profiler | Picks the architecture YAML at profiler/models/<model_type>.yaml, e.g. llama, qwen3, qwen3_moe, mixtral, phimoe |
| hidden_size | int | both | Model embedding / hidden dimension |
| num_hidden_layers | int | both | Number of decoder blocks |
| num_attention_heads | int | both | Total attention heads (for TP scaling) |
| num_key_value_heads | int | both | Distinct KV heads (for GQA scaling) |
| intermediate_size | int | both | MLP intermediate dimension |
| vocab_size | int | both | Embedding / lm_head output dimension |
| head_dim | int | both | Per-head dimension; matters when it differs from hidden_size / num_attention_heads (Qwen3 sets it explicitly) |

When head_dim is absent from the config, the simulator falls back to hidden_size // num_attention_heads. This is wrong for the Qwen3 family: Qwen3-30B-A3B, for example, has head_dim: 128, but its hidden_size: 2048 and num_attention_heads: 32 make the fallback compute 64. Always include head_dim for models that have it in their HF config.
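
A minimal sketch of the fallback (illustrative names, not the simulator's actual code):

def effective_head_dim(cfg: dict) -> int:
    # Prefer the explicit field; otherwise derive it the way the simulator does.
    if "head_dim" in cfg:
        return cfg["head_dim"]
    return cfg["hidden_size"] // cfg["num_attention_heads"]

# Qwen3-30B-A3B with head_dim omitted: the fallback silently halves it.
effective_head_dim({"hidden_size": 2048, "num_attention_heads": 32})  # 64; real value is 128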

MoE fields (MoE models only)

| Field | Type | Description |
| --- | --- | --- |
| num_local_experts | int | Total experts (Mistral-style naming: e.g. num_local_experts: 8 for Mixtral 8x7B) |
| num_experts | int | Alternative naming (HF / Qwen-style: e.g. num_experts: 128 for Qwen3-30B-A3B) |
| num_experts_per_tok | int | Experts activated per token (top-k). Typical values: 2 (Mixtral), 8 (Qwen3 MoE) |
| moe_intermediate_size | int | Per-expert MLP intermediate dimension; often smaller than the dense intermediate_size |
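
To get a feel for these numbers, here is a rough per-layer expert weight count using the Qwen3-30B-A3B values from the example below (assuming a SwiGLU-style MLP with three projection matrices per expert, as Qwen3 uses):

# Per-expert MLP weights: gate, up, and down projections.
hidden_size, moe_intermediate_size, num_experts = 2048, 768, 128

per_expert = 3 * hidden_size * moe_intermediate_size  # ~4.7M params
in_memory = num_experts * per_expert                  # ~604M params stored per layer
active = 8 * per_expert                               # ~38M params used per token (num_experts_per_tok = 8)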

The simulator's config_builder.py accepts either num_local_experts or num_experts and treats them equivalently.
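
An illustrative version of that normalization (the real config_builder.py may differ in detail):

def expert_count(cfg: dict):
    # Either spelling wins; None means the model is dense.
    return cfg.get("num_local_experts", cfg.get("num_experts"))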

Optional fields the simulator may consume

| Field | Type | Description |
| --- | --- | --- |
| torch_dtype | string | Default weight dtype, used when --dtype isn't passed, e.g. bfloat16, float16, float32 |
| architectures | array | The first entry's class name is informational; the simulator dispatches via model_type |
| mlp_only_layers | array | Indices of layers that use a dense MLP instead of MoE. Hybrid MoE/dense models like Qwen3-MoE-Instruct use this |
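
A sketch of how these could be consumed (illustrative, not the simulator's actual code):

def resolve_dtype(cfg: dict, cli_dtype: str | None = None):
    # A --dtype flag would win; otherwise fall back to the config's torch_dtype.
    return cli_dtype or cfg.get("torch_dtype")

def layer_is_moe(cfg: dict, layer_idx: int) -> bool:
    # Layers listed in mlp_only_layers keep a dense MLP; all others use MoE.
    return layer_idx not in set(cfg.get("mlp_only_layers", []))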

Fields the simulator ignores

The HF config has many more fields the simulator doesn't use, such as bos_token_id, eos_token_id, attention_dropout, max_position_embeddings, rope_*, rms_norm_eps, initializer_range, and tie_word_embeddings. Leave them as the HF config has them; ignored fields don't affect simulation.

Examples

Llama 3.1 8B (dense)

{
"architectures": ["LlamaForCausalLM"],
"model_type": "llama",
"hidden_size": 4096,
"intermediate_size": 14336,
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"vocab_size": 128256,
"torch_dtype": "bfloat16"
}

(head_dim defaults to 4096 / 32 = 128, which is correct for Llama 3.1.)

Qwen3-32B (dense, explicit head_dim)

{
"architectures": ["Qwen3ForCausalLM"],
"model_type": "qwen3",
"hidden_size": 5120,
"intermediate_size": 25600,
"num_attention_heads": 64,
"num_hidden_layers": 64,
"num_key_value_heads": 8,
"head_dim": 128,
"vocab_size": 151936,
"torch_dtype": "bfloat16"
}

(The fallback would compute 5120 / 64 = 80, but Qwen3 uses 128; head_dim must be included.)

Qwen3-30B-A3B (MoE)

{
"architectures": ["Qwen3MoeForCausalLM"],
"model_type": "qwen3_moe",
"hidden_size": 2048,
"intermediate_size": 6144,
"num_attention_heads": 32,
"num_hidden_layers": 48,
"num_key_value_heads": 4,
"head_dim": 128,
"num_experts": 128,
"num_experts_per_tok": 8,
"moe_intermediate_size": 768,
"vocab_size": 151936,
"torch_dtype": "bfloat16"
}

Adding a new model

  1. Drop the raw HF config.json at configs/model/<org>/<name>.json.
  2. Verify the required fields above are present (a quick check is sketched after this list).
  3. Add head_dim explicitly if the model has it in its HF config.
  4. Make sure profiler/models/<model_type>.yaml exists. If not, you need a new architecture YAML; see Profiler → Adding a model architecture.
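
A hypothetical pre-flight check for step 2 (the field list mirrors the required-fields table above; this script does not ship with the project):

import json
import sys

REQUIRED = [
    "model_type", "hidden_size", "num_hidden_layers",
    "num_attention_heads", "num_key_value_heads",
    "intermediate_size", "vocab_size",
]

def check(path: str) -> None:
    with open(path) as f:
        cfg = json.load(f)
    missing = [field for field in REQUIRED if field not in cfg]
    if missing:
        sys.exit(f"{path}: missing required fields: {missing}")
    if "head_dim" not in cfg:
        derived = cfg["hidden_size"] // cfg["num_attention_heads"]
        print(f"{path}: no head_dim; the simulator will assume {derived}")

check("configs/model/Qwen/Qwen3-32B.json")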

Gotchas

  1. head_dim fallback is silent. If you forget to include it and the model's actual head_dim differs from hidden_size / num_attention_heads, the simulator runs but computes wrong KV-cache sizes (a worked example follows this list). Validate your config against the HF model card.
  2. num_local_experts vs num_experts: same concept, different naming convention across model families. Pick whichever the model's HF config uses; the simulator handles both.
  3. model_type is case-sensitive and must match a YAML at profiler/models/<model_type>.yaml exactly.
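
A worked example of gotcha 1, using the standard per-token KV-cache size (K and V tensors per layer; a 2-byte dtype like bfloat16 is assumed):

def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int, dtype_bytes: int = 2) -> int:
    # K and V caches each hold layers * kv_heads * head_dim elements per token.
    return 2 * layers * kv_heads * head_dim * dtype_bytes

# Qwen3-30B-A3B with head_dim present vs. the silent fallback:
kv_bytes_per_token(48, 4, 128)         # 98304 bytes/token (correct)
kv_bytes_per_token(48, 4, 2048 // 32)  # 49152 bytes/token, half the real size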
