PIM config schema

PIM (Processing-In-Memory) device configs live at configs/pim/<name>.ini in DRAMSim3 INI format. The simulator's pim_model.py reads these to compute PIM-side attention latency when --enable-attn-offloading is on.

File location

configs/pim/
├── DDR4_8GB_3200_pim.ini
├── HBM2_1GB_2000_pim.ini
├── LPDDR4X_2GB_4266_pim.ini
├── LPDDR5_2GB_6400_pim.ini
└── README.md

The cluster config references one of these via the node's cpu_mem.pim_config field (without the .ini extension):

"cpu_mem": {
  "mem_size": 512,
  "mem_bw": 256,
  "mem_latency": 0,
  "pim_config": "DDR4_8GB_3200_pim"
}

Bundled configs

File	Protocol	Capacity	Speed	Notes
`DDR4_8GB_3200_pim.ini`	DDR4	8 GB	3200 MT/s	Standard DDR4 PIM module
`HBM2_1GB_2000_pim.ini`	HBM2	1 GB	2000 MT/s	HBM2 PIM (high-bandwidth)
`LPDDR4X_2GB_4266_pim.ini`	LPDDR4X	2 GB	4266 MT/s	Mobile-class PIM
`LPDDR5_2GB_6400_pim.ini`	LPDDR5	2 GB	6400 MT/s	Mobile-class PIM, faster

INI structure

Each PIM config has three sections.

`[dram_structure]`

[dram_structure]
protocol = DDR4
bankgroups = 2
banks_per_group = 4
rows = 65536
columns = 1024
device_width = 16
BL = 8
pim_type = SINGLE

Field	Type	Description
`protocol`	string	DRAM standard. `DDR4`, `DDR5`, `HBM2`, `HBM3`, `LPDDR4`, `LPDDR4X`, `LPDDR5`
`bankgroups`	int	Bank groups per device
`banks_per_group`	int	Banks per bank group
`rows`	int	Rows per bank
`columns`	int	Columns per row
`device_width`	int	Device data width in bits (typically 4 / 8 / 16)
`BL`	int	Burst length
`pim_type`	enum	`SINGLE` (one PIM unit per channel) or `DUAL` (two units per channel)

The simulator computes:

Bandwidth from device_width × BL × tCK × channel_count.
Capacity from rows × columns × device_width × banks × bankgroups.

`[timing]`

[timing]
tCK = 0.63          # clock period in ns
CL = 22             # CAS latency
CWL = 16            # CAS write latency
tRCD = 22           # RAS-to-CAS delay
tRP = 22            # row precharge time
tRAS = 52           # row active time
tRFC = 560          # refresh cycle
tREFI = 12480       # refresh interval
tRRD_S = 9          # row-to-row delay (different bank groups)
tRRD_L = 11         # row-to-row delay (same bank group)
tWTR_S = 4          # write-to-read delay (different bank groups)
tWTR_L = 12         # write-to-read delay (same bank group)
tFAW = 48           # four-activate window
tWR = 24            # write recovery
tRTP = 12           # read-to-precharge delay
tCCD_S = 4          # CAS-to-CAS (different bank groups)
tCCD_L = 8          # CAS-to-CAS (same bank group)

All timing parameters are in clock cycles unless explicitly named otherwise (tCK is in ns). The full list mirrors DRAMSim3's spec. The simulator extracts the latency-relevant subset for PIM access modeling.

For full DRAMSim3 timing semantics, see the DRAMSim3 docs.

`[system]`

[system]
channel_size = 8192
channels = 1
bus_width = 64
address_mapping = rorabgbachco
queue_structure = PER_BANK
row_buf_policy = OPEN_PAGE

Field	Type	Description
`channel_size`	int	Per-channel capacity in MB
`channels`	int	Number of memory channels (PIM compute happens per-channel)
`bus_width`	int	Memory bus width in bits
`address_mapping`	string	DRAMSim3 address-mapping scheme
`queue_structure`	enum	Queueing policy (`PER_BANK`, `PER_CHANNEL`, etc.)
`row_buf_policy`	enum	Row buffer policy (`OPEN_PAGE`, `CLOSE_PAGE`)

channels is the most simulator-relevant field: more channels = more parallel PIM compute per attention step. The trace generator distributes attention heads across channels for parallel execution.

Adding a new PIM config

Drop a new .ini file at configs/pim/<name>.ini.
Fill the three sections above. Reference the bundled configs for the right shape.
Reference it from your cluster config: "cpu_mem": {"pim_config": "<name>"}.
Run with --enable-attn-offloading.

The DRAMSim3 timing parameters can be sourced from a JEDEC datasheet or vendor spec for the specific DRAM part you're modeling.

Where this is used

serving/core/pim_model.py: loads the INI and exposes timing parameters to the trace generator.
serving/core/trace_generator.py: when --enable-attn-offloading is on, swaps NPU attention for PIM attention computed using the loaded model.
Power model: if the cluster config has a power: block, PIM energy is accounted for via the channel count (one PIM unit per channel × per-channel power).

For the full PIM offload mechanics, see Simulator → PIM offload. For a worked example, see Examples → PIM attention offload.

Gotchas

All four bundled INI files use pim_type = SINGLE. Switching to DUAL doubles the per-channel PIM compute capacity but also needs pim_type = DUAL to be supported by the cluster config's power model entry.
channels = N doesn't mean N independent PIM devices. The simulator models per-channel parallelism within one PIM device. For multiple PIM devices, you'd configure multiple nodes, but that's a different topology.
The INI is parsed as DRAMSim3 standard. Don't add custom fields the simulator's loader doesn't know about; they'll be ignored.

What's next

Cluster config → cpu_mem.pim_config how to wire this file into a cluster.
Simulator → PIM offload what happens at simulation time.

File location​

Bundled configs​

INI structure​

[dram_structure]​

[timing]​

[system]​

Adding a new PIM config​

Where this is used​

Gotchas​

What's next​