According to NVIDIA,1
> The DGX SuperPOD is a complete turnkey solution with specific BOM, Installation services, support services, and guaranteed performance. … The NVIDIA DGX SuperPOD is a solution that mirrors what NVIDIA operates internally, which allows NVIDIA to offer the best customer experience possible. … Size is not what determines whether something is an NVIDIA DGX SuperPOD.
It is a predefined solution that lets you change only minor factors, such as:
- Nodes per rack - though you cannot use free rack space to include non-SuperPOD hardware
- Cable lengths - but you cannot use patch panels or other things that may affect signal integrity
- Specific rack models and PDUs - but racks must conform to EIA-310 standards, have 19-inch EIA mounting, and be at least 600 mm x 1100 mm and 42 RU high.
Storage is the only component with any flexibility, and NVIDIA maintains a list of acceptable third-party storage systems.
Storage
NVIDIA has several levels of “storage performance requirements” that change with each new generation of GPU. For example,
NVIDIA DGX B200
A DGX B200 SuperPOD is built from scalable units (SUs), each containing 32 DGX B200 servers with 8 GPUs apiece.
The B200 generation has two levels of storage performance: standard and enhanced. NVIDIA offers the following guidelines for aggregate system performance:2
| Pattern | SUs | # Nodes | # GPUs | Standard | Enhanced |
|---|---|---|---|---|---|
| Read | 1 | 32 | 256 | 40 GB/s | 125 GB/s |
| Write | 1 | 32 | 256 | 20 GB/s | 62 GB/s |
| Read | 4 | 128 | 1024 | 160 GB/s | 500 GB/s |
| Write | 4 | 128 | 1024 | 80 GB/s | 250 GB/s |
Dividing the 1-SU figures by 256 GPUs (and taking 1 GB = 1000 MB) gives a per-B200 GPU performance of:
| Pattern | Standard | Enhanced |
|---|---|---|
| Read | 156 MB/s/GPU | 488 MB/s/GPU |
| Write | 78 MB/s/GPU | 242 MB/s/GPU |
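The per-GPU figures above are simple division, and a short script makes the arithmetic easy to rerun for other SU counts. This is a minimal sketch, not anything NVIDIA publishes: the names `AGGREGATE_GBPS` and `per_gpu_mbps` are made up for illustration, the aggregate numbers are copied from the first table, and it assumes decimal units (1 GB = 1000 MB).

```python
# Illustrative only: names are hypothetical; data is copied from the table above.
GPUS_PER_NODE = 8   # each DGX B200 server has 8 GPUs
NODES_PER_SU = 32   # each scalable unit (SU) has 32 servers

# (pattern, number of SUs) -> (standard GB/s, enhanced GB/s)
AGGREGATE_GBPS = {
    ("read", 1): (40, 125),
    ("write", 1): (20, 62),
    ("read", 4): (160, 500),
    ("write", 4): (80, 250),
}


def per_gpu_mbps(aggregate_gbps: float, sus: int) -> float:
    """Convert an aggregate GB/s figure into MB/s per GPU (1 GB = 1000 MB)."""
    gpus = sus * NODES_PER_SU * GPUS_PER_NODE
    return aggregate_gbps * 1000 / gpus


for (pattern, sus), (standard, enhanced) in AGGREGATE_GBPS.items():
    print(
        f"{pattern:5s} {sus} SU: "
        f"standard {per_gpu_mbps(standard, sus):.0f} MB/s/GPU, "
        f"enhanced {per_gpu_mbps(enhanced, sus):.0f} MB/s/GPU"
    )
```

Note that the 4-SU enhanced write guideline (250 GB/s) works out to roughly 244 MB/s/GPU rather than 242, because the 1-SU figure of 62 GB/s is itself rounded.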