According to NVIDIA,1

The DGX SuperPOD is a complete turnkey solution with specific BOM, Installation services, support services, and guaranteed performance. … The NVIDIA DGX SuperPOD is a solution that mirrors what NVIDIA operates internally, which allows NVIDIA to offer the best customer experience possible. … Size is not what determines whether something is an NVIDIA DGX SuperPOD.

It is a predefined solution that allows only minor adjustments, such as

  • Nodes per rack - though you cannot use free rack space to include non-SuperPOD hardware
  • Cable lengths - but you cannot use patch panels or other things that may affect signal integrity
  • Specific rack models and PDUs - but they must conform to EIA-310 standards, have 19" EIA mounting, measure at least 600 mm x 1100 mm, and be at least 42 RU high.

Storage is the only thing that has flexibility, and NVIDIA maintains a list of acceptable third-party storage systems.

Storage

NVIDIA has several levels of “storage performance requirements” that change with each new generation of GPU. For example,

NVIDIA DGX B200

An NVIDIA DGX B200 SuperPOD is built from scalable units (SUs), each containing 32 8-way DGX B200 servers (256 GPUs per SU).

The B200 generation defines two storage performance levels: standard and enhanced. NVIDIA offers the following guidelines for aggregate system performance:2

Pattern   SUs   # Nodes   # GPUs   Standard   Enhanced
Read      1     32        256      40 GB/s    125 GB/s
Write     1     32        256      20 GB/s    62 GB/s
Read      4     128       1024     160 GB/s   500 GB/s
Write     4     128       1024     80 GB/s    250 GB/s

Dividing the aggregate figures by the GPU count, this comes out to a per-GPU performance of

Pattern   Standard       Enhanced
Read      156 MB/s/GPU   488 MB/s/GPU
Write     78 MB/s/GPU    242 MB/s/GPU
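The per-GPU numbers above are just the aggregate SU guidelines divided by the GPU count. A minimal sketch of that arithmetic (variable and function names are hypothetical; GB/MB are taken as decimal, i.e. 10^9 and 10^6 bytes):

```python
# Sketch: derive per-GPU storage bandwidth from the aggregate
# single-SU guidelines in the table above (names are illustrative,
# not from NVIDIA's documentation).
AGGREGATE_GBPS = {  # (pattern, tier) -> aggregate GB/s for one SU
    ("read", "standard"): 40,
    ("read", "enhanced"): 125,
    ("write", "standard"): 20,
    ("write", "enhanced"): 62,
}
GPUS_PER_SU = 32 * 8  # 32 DGX B200 nodes x 8 GPUs each = 256

def per_gpu_mbps(pattern: str, tier: str) -> float:
    """Aggregate SU bandwidth split evenly across all GPUs, in MB/s."""
    return AGGREGATE_GBPS[(pattern, tier)] * 1000 / GPUS_PER_SU

for (pattern, tier) in AGGREGATE_GBPS:
    print(f"{pattern:5s} {tier:8s}: {per_gpu_mbps(pattern, tier):6.1f} MB/s/GPU")
```

Running this reproduces the table (156.2, 488.3, 78.1, and 242.2 MB/s/GPU, which round to the figures shown). Note the read-side numbers scale linearly with SU count in NVIDIA's guidelines, so the per-GPU targets are the same at 1 SU and 4 SUs.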

Footnotes

  1. NVIDIA DGX SuperPOD FAQ — Frequently Asked Questions

  2. Storage Architecture — NVIDIA DGX SuperPOD: Next Generation Scalable Infrastructure for AI Leadership Reference Architecture Featuring NVIDIA DGX B200