Perlmutter is a heterogeneous Cray EX supercomputer built from NVIDIA A100 GPUs and AMD Milan CPUs and operated by the National Energy Research Scientific Computing Center (NERSC).

System overview

Perlmutter has:

1,536 GPU nodes with 40 GB A100 GPUs
256 GPU nodes with 80 GB A100 GPUs
3,072 CPU nodes with dual-socket Milan CPUs
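The following Python sketch is a quick back-of-envelope check on these counts. It assumes that every GPU node carries four A100s, as the Node architecture section describes. The constant names are illustrative only.

```python
# Node counts taken from the system overview above.
GPU_NODES_40GB = 1_536  # GPU nodes with 40 GB A100s
GPU_NODES_80GB = 256    # GPU nodes with 80 GB A100s
CPU_NODES = 3_072       # CPU-only nodes with dual-socket Milan CPUs

# Assumption: every GPU node carries 4x A100, as listed for the Cray EX235n.
GPUS_PER_GPU_NODE = 4


def system_totals() -> dict[str, int]:
    """Return system-wide node, GPU, and GPU-memory totals."""
    gpu_nodes = GPU_NODES_40GB + GPU_NODES_80GB
    gpus_40gb = GPU_NODES_40GB * GPUS_PER_GPU_NODE
    gpus_80gb = GPU_NODES_80GB * GPUS_PER_GPU_NODE
    return {
        "gpu_nodes": gpu_nodes,
        "cpu_nodes": CPU_NODES,
        "total_nodes": gpu_nodes + CPU_NODES,
        "gpus": gpus_40gb + gpus_80gb,
        "gpu_memory_gb": gpus_40gb * 40 + gpus_80gb * 80,
    }


if __name__ == "__main__":
    for name, value in system_totals().items():
        print(f"{name}: {value:,}")
```

Under these assumptions, it reports:

- 1,792 GPU nodes
- 4,864 nodes in total
- 7,168 A100s
- 327,680 GB of aggregate GPU memory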

Perlmutter uses a Slingshot fabric in a dragonfly topology.

Node architecture

Each Cray EX235n node has:

1x AMD Milan 7763 CPU
256 GB DDR4 DRAM
4x NVIDIA A100 GPUs
No node-local storage
4x Slingshot-11 NICs

Each Cray EX425 node has:

2x AMD Milan 7763 CPUs
512 GB DDR4 DRAM
No GPUs
No node-local storage
1x Slingshot-11 NIC
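The two node types can be captured as a small data structure to total resources across the machine. This is a sketch, and three of its pieces are assumptions:

- The `NodeType` class and `system_total` helper are hypothetical.
- It treats the 40 GB and 80 GB GPU nodes as identical apart from GPU memory, as the list above does.
- The figure of 64 cores per EPYC 7763 is an outside fact, not something stated in this note.

```python
from dataclasses import dataclass

# Assumption (not stated in this note): an AMD EPYC 7763 (Milan) has 64 cores.
CORES_PER_7763 = 64


@dataclass(frozen=True)
class NodeType:
    """Per-node resources for one node type (hypothetical helper class)."""
    name: str
    cpu_sockets: int
    dram_gb: int
    gpus: int
    nics: int

    @property
    def cpu_cores(self) -> int:
        return self.cpu_sockets * CORES_PER_7763


# Per-node resources as listed under Node architecture.
GPU_NODE = NodeType("Cray EX235n", cpu_sockets=1, dram_gb=256, gpus=4, nics=4)
CPU_NODE = NodeType("Cray EX425", cpu_sockets=2, dram_gb=512, gpus=0, nics=1)

# Node counts from System overview (40 GB and 80 GB GPU nodes share one entry).
NODE_COUNTS = {GPU_NODE: 1_536 + 256, CPU_NODE: 3_072}


def system_total(attr: str) -> int:
    """Sum a per-node attribute (e.g. 'dram_gb', 'cpu_cores', 'nics') over all nodes."""
    return sum(getattr(node, attr) * count for node, count in NODE_COUNTS.items())


if __name__ == "__main__":
    for attr in ("cpu_cores", "dram_gb", "gpus", "nics"):
        print(f"{attr}: {system_total(attr):,}")
```

Run as a script, it prints machine-wide totals for CPU cores, DRAM (GB), GPUs, and NICs. For example, it reports 2,031,616 GB of DRAM and 10,240 NICs.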

Network architecture

Each Cray EX4000 cabinet is a dragonfly group with:¹

128x GPU nodes or 256x CPU nodes
32x 64-port, 200 Gb/s Slingshot switches
4 NICs per node (GPU nodes) or 1 NIC per node (CPU nodes)
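GPU and CPU cabinets can share the same switch layout, and one way to see why is to compare NIC endpoints against the endpoint-facing (L0) ports. The sketch below uses the up-to-16 L0 ports per switch given in the per-switch breakdown later in this section. The helper name `downlink_usage` is hypothetical.

```python
SWITCHES_PER_GROUP = 32
# Up to 16 endpoint-facing (L0) ports per switch, per the per-switch breakdown.
L0_PORTS_PER_SWITCH = 16


def downlink_usage(nodes: int, nics_per_node: int) -> tuple[int, int]:
    """Return (NIC endpoints in the cabinet, L0 ports available in its group)."""
    endpoints = nodes * nics_per_node
    capacity = SWITCHES_PER_GROUP * L0_PORTS_PER_SWITCH
    if endpoints > capacity:
        raise ValueError(f"{endpoints} endpoints exceed {capacity} L0 ports")
    return endpoints, capacity


if __name__ == "__main__":
    print("GPU cabinet:", downlink_usage(nodes=128, nics_per_node=4))  # (512, 512)
    print("CPU cabinet:", downlink_usage(nodes=256, nics_per_node=1))  # (256, 512)
```

A full GPU cabinet fills all 512 L0 ports, while a full CPU cabinet uses only half of them.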

Perlmutter has multiple types of dragonfly groups:

24x¹ GPU groups containing GPU nodes (though the node counts imply 14x, or 28x with two groups per cabinet²)
12x CPU groups containing CPU-only nodes
4x storage groups containing Lustre servers
1x service group containing login nodes and other ancillary servers
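The group-count discrepancy can be checked directly. This sketch makes two assumptions:

- There is one dragonfly group per cabinet, with 128 GPU nodes or 256 CPU nodes each, as stated above.
- The two-groups-per-GPU-cabinet layout mentioned in the footnote is modelled as 64 nodes per group.

The function names are hypothetical.

```python
import math

GPU_NODES = 1_536 + 256    # 40 GB + 80 GB GPU nodes
CPU_NODES = 3_072
GPU_NODES_PER_GROUP = 128  # assumes one dragonfly group per GPU cabinet
CPU_NODES_PER_GROUP = 256  # assumes one dragonfly group per CPU cabinet
STORAGE_GROUPS = 4
SERVICE_GROUPS = 1


def groups_needed(nodes: int, nodes_per_group: int) -> int:
    """Smallest number of groups that can hold `nodes` nodes."""
    return math.ceil(nodes / nodes_per_group)


def total_groups(gpu_groups: int) -> int:
    """Total dragonfly groups for a given (uncertain) number of GPU groups."""
    cpu_groups = groups_needed(CPU_NODES, CPU_NODES_PER_GROUP)
    return gpu_groups + cpu_groups + STORAGE_GROUPS + SERVICE_GROUPS


if __name__ == "__main__":
    candidates = {
        "one group per GPU cabinet": groups_needed(GPU_NODES, GPU_NODES_PER_GROUP),
        "two groups per GPU cabinet": groups_needed(GPU_NODES, GPU_NODES_PER_GROUP // 2),
        "User Guide figure": 24,
    }
    for label, gpu_groups in candidates.items():
        print(f"{label}: {gpu_groups} GPU groups, {total_groups(gpu_groups)} total")
```

The node counts give 14 GPU groups, or 28 with two groups per cabinet, alongside 12 CPU groups. The system therefore has 31, 45, or 41 groups in total, depending on which GPU figure is right.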

Within each group, each switch has:³

Up to 16 ports down to endpoints (L0)
31 ports to the other 31 switches in the group (L1)
Up to 16 ports to other groups (L2)
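The per-switch numbers can be checked against the 64-port switch radix. The sketch below assumes that the L1 ports form an all-to-all, with one port to each of the other switches in the 32-switch group, as the count of 31 suggests. `port_budget` is a hypothetical helper.

```python
PORTS_PER_SWITCH = 64  # 64-port Slingshot switch
SWITCHES_PER_GROUP = 32


def port_budget(l0: int = 16, l2: int = 16) -> dict[str, int]:
    """Split one switch's ports into L0 (endpoints), L1 (intra-group), L2 (global)."""
    # Assumption: L1 is all-to-all, one port to each other switch in the group.
    l1 = SWITCHES_PER_GROUP - 1
    used = l0 + l1 + l2
    if used > PORTS_PER_SWITCH:
        raise ValueError(f"{used} ports requested, only {PORTS_PER_SWITCH} available")
    return {"L0": l0, "L1": l1, "L2": l2, "unused": PORTS_PER_SWITCH - used}


if __name__ == "__main__":
    print(port_budget())  # {'L0': 16, 'L1': 31, 'L2': 16, 'unused': 1}
```

With full L0 and L2 allocations, the three port classes use 63 of the 64 ports, leaving one unused.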

So, per group, there are:³

512 ports down to endpoints
992 ports connecting intra-group switches
Up to 512 ports connecting inter-group switches
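These per-group totals are just the per-switch counts multiplied by 32 switches. The sketch below uses a hypothetical `group_port_totals` helper. It also counts intra-group links, assuming each L1 link occupies one port on each of the two switches it joins.

```python
def group_port_totals(
    switches: int = 32, l0: int = 16, l1: int = 31, l2: int = 16
) -> dict[str, int]:
    """Per-group port totals derived from per-switch L0/L1/L2 port counts."""
    return {
        "endpoint_ports": switches * l0,         # L0: 32 * 16 = 512
        "intra_group_ports": switches * l1,      # L1: 32 * 31 = 992
        # Each intra-group link uses one L1 port on each of the two switches it joins.
        "intra_group_links": switches * l1 // 2,  # 992 / 2 = 496
        "inter_group_ports": switches * l2,      # L2: up to 32 * 16 = 512
    }


if __name__ == "__main__":
    for name, value in group_port_totals().items():
        print(f"{name}: {value}")
```

That gives 496 intra-group links. This is exactly one link for each of the 32 × 31 / 2 = 496 switch pairs in the group.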

¹ Perlmutter Architecture (nersc.gov)
² If there are 128 GPU nodes per cabinet and Perlmutter has 1,792 GPU nodes, there should be 14 GPU cabinets (1,792 / 128 = 14). The Perlmutter User Guide is inconsistent here: it says “GPU cabinets contain one Dragonfly group per cabinet…making a total of 24 groups.” I know that Perlmutter’s original configuration had two dragonfly groups per GPU cabinet (which would give 28 groups), so there may be residual confusion from that layout.
³ See Network architecture; Cray EX has the same L0/L1 configuration.

Glenn's Digital Garden

Table of Contents

Explorer

Recent Notes

Social media platforms

Scaling laws

NVIDIA B200

Slingshot

Cray EX154n

NERSC Perlmutter

System overview

Node architecture

Network architecture

Graph View

Backlinks

Glenn's Digital Garden

Table of Contents

Explorer

Recent Notes

Social media platforms

Scaling laws

NVIDIA B200

Slingshot

Cray EX154n

NERSC Perlmutter

System overview

Node architecture

Network architecture

Footnotes

Graph View

Backlinks