Troubleshooting
Common errors during install and first run, with the quickest fix.
If your issue isn't here, please file a bug at github.com/casys-kaist/LLMServingSim/issues with the full command, the error output, and your OS / Docker / GPU versions.
Submodules are missing
Symptom: Build fails with errors about missing files under
astra-sim/extern/graph_frontend/chakra/ or astra-sim/build/.
Cause: You cloned without --recurse-submodules.
Fix:
git submodule update --init --recursive
Then re-run ./scripts/compile.sh.
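To confirm the fix took effect before rebuilding, you can count the submodules Git still reports as uninitialized. This is a minimal sketch: the helper name count_uninitialized is made up for illustration, and it relies only on git submodule status prefixing uninitialized entries with "-".

```shell
# Count submodules that `git submodule status` reports as uninitialized.
# Git prefixes each uninitialized submodule line with "-".
count_uninitialized() {
    # Reads `git submodule status` output on stdin and prints the count.
    # `|| true` keeps the exit status at 0 when grep finds no matches.
    grep -c '^-' || true
}

# Example (run from the repository root):
#   git submodule status --recursive | count_uninitialized
```

A result of 0 means every submodule has been checked out; anything higher means the update did not finish.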
docker: permission denied
Symptom:
docker: Got permission denied while trying to connect to the
Docker daemon socket
Cause: Your user isn't in the docker group.
Fix:
sudo usermod -aG docker $USER
newgrp docker
# or log out and back in
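If the error persists, check whether the group change is active in your current shell. The sketch below uses a hypothetical helper, has_docker_group, fed with the output of id -nG, which lists the groups of the current session when called without a username.

```shell
# Succeed if the given space-separated group list contains "docker".
has_docker_group() {
    # Pad with spaces so only a whole-word match counts.
    case " $1 " in
        *" docker "*) return 0 ;;
        *)            return 1 ;;
    esac
}

# Example: check the groups of the current shell session.
#   has_docker_group "$(id -nG)" && echo "docker group active" || echo "docker group missing"
```

If this reports the group as missing right after usermod, the session has not picked up the change yet; newgrp docker or a fresh login should fix that.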
GPU not detected
Symptom: Inside the vLLM container, nvidia-smi says
command not found or no devices found.
Cause: NVIDIA Container Toolkit isn't installed or Docker isn't configured to use it.
Fix: install / re-configure the toolkit (see Prerequisites) and restart Docker:
sudo systemctl restart docker
Then verify:
docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi
If the host's nvidia-smi works but the container's doesn't, the
toolkit is the problem. If the host's nvidia-smi fails too, install
the NVIDIA driver first.
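The host-versus-container rule above can be written down as a small decision helper. This is only a sketch: diagnose_gpu is a hypothetical name, and it takes the exit statuses of the two nvidia-smi runs as arguments.

```shell
# Map the exit statuses of nvidia-smi on the host ($1) and inside the
# container ($2) to the next step suggested in this section.
diagnose_gpu() {
    host_status=$1
    container_status=$2
    if [ "$host_status" -ne 0 ]; then
        echo "install the NVIDIA driver"
    elif [ "$container_status" -ne 0 ]; then
        echo "fix the NVIDIA Container Toolkit"
    else
        echo "ok"
    fi
}

# Example:
#   nvidia-smi >/dev/null 2>&1; host=$?
#   docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi >/dev/null 2>&1; ctr=$?
#   diagnose_gpu "$host" "$ctr"
```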
Hugging Face: gated model / 401 / 403
Symptom: When profiling a Llama 3.x or gated Qwen variant:
huggingface_hub.utils._errors.GatedRepoError: Access to model
meta-llama/Llama-3.1-8B is restricted...
Fix:
- Accept the license on the model page (one-time, on huggingface.co).
- Set HF_TOKEN in your shell before launching the vLLM container:
export HF_TOKEN="hf_xxxxxxxxxxxxxxxxxxxxxxxxxx"
./scripts/docker-vllm.sh
The token gets forwarded into the container automatically. Confirm
with echo $HF_TOKEN inside the container.
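A quick pre-flight check can catch an unset or mistyped token before the container starts. In this sketch, check_hf_token is a hypothetical helper, and the "hf_" prefix test is an assumption based on the token format shown in the example above.

```shell
# Classify a token value as ok, missing, or unexpected.
# Assumption: access tokens start with "hf_", as in the example above.
check_hf_token() {
    case "${1-}" in
        hf_?*) echo "ok" ;;
        "")    echo "missing" ;;
        *)     echo "unexpected format" ;;
    esac
}

# Example:
#   check_hf_token "${HF_TOKEN-}"
```

This only checks the shape of the value. A well-formed token can still be rejected if the license for the model has not been accepted.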
ASTRA-Sim build fails
Symptom: ./scripts/compile.sh errors out partway through, often
with a CMake or compiler message.
Common causes & fixes:
- Missing build deps inside the container. The official astrasim/tutorial-micro2024 image has them by default. If you customized the image, ensure cmake, g++, protobuf-compiler, libprotobuf-dev, and libboost-dev are installed.
- Stale build state. Wipe the build directory and retry:
rm -rf astra-sim/build/astra_analytical/build/
./scripts/compile.sh
- Outside the container. compile.sh is meant to run inside the simulator container, not on the host. Use ./scripts/docker-sim.sh first.
Container name already in use
Symptom:
docker: Error response from daemon: Conflict. The container name
"/servingsim_docker" is already in use by container "abc123..."
Cause: A previous run left the container around.
Fix: either re-attach or remove and recreate.
# re-attach to existing
docker start -ai servingsim_docker
# or wipe and recreate
docker rm -f servingsim_docker
./scripts/docker-sim.sh
Same idea for vllm_docker.
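If you script container startup, the choice between re-attaching and recreating can be made from the list of existing container names. This is a minimal sketch with a hypothetical helper, container_action, that reads the output of docker ps -a --format '{{.Names}}' on stdin.

```shell
# Print "reattach" if the named container already exists, else "create".
# Expects one container name per line on stdin.
container_action() {
    # -x matches the whole line, -F treats the name as a literal string.
    if grep -qxF -- "$1"; then
        echo "reattach"
    else
        echo "create"
    fi
}

# Example:
#   docker ps -a --format '{{.Names}}' | container_action servingsim_docker
```

On "reattach" you would run docker start -ai with that name; on "create" you would run the matching launch script. Pass vllm_docker instead to handle the vLLM container.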
Missing profile data
Symptom: Running the simulator with a hardware / model combination that doesn't have profile data:
FileNotFoundError: ../profiler/perf/<hardware>/<model>/<variant>/tp1/dense.csv
Cause: The (hardware, model, dtype, kv_cache_dtype) tuple
doesn't have a profiled CSV bundle.
Fix: either
- pick a hardware / model combo that's already profiled (see the Simulator → Reading the output table), or
- run the Profiler to generate the missing bundle yourself.
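To see whether a bundle exists before launching a run, you can test for the file named in the error. The sketch below assumes the directory layout shown in that message, including the tp1 component; profile_exists and its arguments are hypothetical names, and other files in the bundle are not checked.

```shell
# Succeed if <perf_root>/<hardware>/<model>/<variant>/tp1/dense.csv exists.
# The layout mirrors the path in the FileNotFoundError above.
profile_exists() {
    perf_root=$1
    hardware=$2
    model=$3
    variant=$4
    [ -f "$perf_root/$hardware/$model/$variant/tp1/dense.csv" ]
}

# Example (placeholders in angle brackets are yours to fill in):
#   profile_exists ../profiler/perf <hardware> <model> <variant> \
#       && echo "profiled" || echo "missing, run the Profiler"
```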
--max-num-batched-tokens warning at startup
Symptom:
WARNING: runtime --max-num-batched-tokens (4096) exceeds profiled
sweep bound (2048). Lookups will extrapolate.
Cause: You're running the simulator with a token budget larger than the one the profiler swept. Latency lookups will linearly extrapolate past the measured range.
Fix:
- For best accuracy, re-profile at the higher --max-num-batched-tokens (MAX_NUM_BATCHED_TOKENS=4096 ./profiler/profile.sh).
- Or stay at the profiled bound. Extrapolation is usually fine for small overshoots; large ones can drift.
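To build intuition for why large overshoots drift, here is one common form of linear extrapolation: extending the straight line through the last two measured points. This is an illustration only; the simulator's exact lookup scheme is not described here, the helper name extrapolate is made up, and the latency values in the example are hypothetical.

```shell
# Extend the line through (x1, y1) and (x2, y2) to a point x beyond x2.
# Arguments: x1 y1 x2 y2 x
extrapolate() {
    awk -v x1="$1" -v y1="$2" -v x2="$3" -v y2="$4" -v x="$5" \
        'BEGIN { printf "%.2f\n", y2 + (y2 - y1) / (x2 - x1) * (x - x2) }'
}

# Example with made-up latencies of 10 and 20 at 1024 and 2048 tokens:
#   extrapolate 1024 10 2048 20 4096    # prints 40.00
```

The estimate is only as good as the assumption that latency keeps growing at the same slope. The further the runtime budget sits past the profiled bound, the more any curvature in the real latency shows up as error.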
Simulator stuck / very slow on big workloads
Symptom: Simulation runs but takes much longer than expected, especially with MoE + EP or large prefix caches.
Common causes & fixes:
- Block-copy disabled. For MoE, set --expert-routing-policy COPY (the default). RR and RAND are slower because they touch ASTRA-Sim per token rather than per block.
- Verbose logging. --log-level DEBUG writes a lot. Drop to --log-level INFO or WARNING.
- --log-interval too small. Setting it to 0.1 makes the logger run every 100 ms; raise to 1.0 (default) or higher.
Out of memory inside the vLLM container
Symptom: Profiler crashes with CUDA OOM partway through the attention sweep.
Fix: lower MAX_NUM_BATCHED_TOKENS in profiler/profile.sh,
or skip the heavy categories with environment variables (see
Profiler → Running).
Still stuck?
- GitHub Issues: casys-kaist/LLMServingSim/issues
- Discussions: casys-kaist/LLMServingSim/discussions
When you file a bug, please include:
- The exact command you ran
- The full error output
- Your OS, Docker version, NVIDIA driver, GPU model
- Whether you're inside the simulator container or the vLLM container (or bare metal)