H100 is NVIDIA’s first Hopper-based GPU.

Specifications

H100 SXM5

Each H100 SXM5 GPU has:1

  • 8 GPCs
  • 66 TPCs (a cut-down GH100 die, so TPCs are unevenly distributed across the 8 GPCs)
  • 80 GB HBM3 (5 stacks)
    • 10 512-bit memory controllers
    • 3.35 TB/s peak memory bandwidth
  • 900 GB/s NVLink (D2D)
  • 128 GB/s PCIe Gen5 (H2D)
  • 700 W maximum
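A quick back-of-envelope check (my own arithmetic, not from the cited sources) that the 3.35 TB/s figure is consistent with the 10 × 512-bit memory controllers, assuming HBM3 runs at roughly 5.2 Gbit/s per pin:

```python
# Assumed HBM3 per-pin data rate; only the 10 x 512-bit bus width
# comes from the spec list above.
bus_bits = 10 * 512          # 5120-bit memory bus
pin_rate_gbps = 5.2          # assumed Gbit/s per pin
bandwidth_tbps = bus_bits * pin_rate_gbps / 8 / 1000  # bits -> TB/s
print(f"{bandwidth_tbps:.2f} TB/s")  # -> 3.33 TB/s, close to the quoted 3.35
```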

GH100

GH100 is the full Hopper die; shipping H100 products such as the SXM5 enable only a subset of its units. The full GH100 die has:1

  • 8 GPCs
  • 72 TPCs (9 TPCs/GPC)
    • 144 SMs (2 per TPC)
    • 18,432 FP32 cores (128 per SM)
    • 576 4th generation tensor cores (4 per SM)
  • 6 HBM3 or HBM2e stacks (the 80 GB H100 enables 5 of the 6, i.e. 16 GB per stack)1
    • 12 512-bit memory controllers
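The GH100 unit counts above multiply out cleanly from the per-GPC and per-SM figures; a trivial sketch:

```python
# Full GH100 die: unit counts derived from the hierarchy listed above.
gpcs, tpcs_per_gpc = 8, 9
sms = gpcs * tpcs_per_gpc * 2        # 2 SMs per TPC
fp32_cores = sms * 128               # 128 FP32 cores per SM
tensor_cores = sms * 4               # 4 tensor cores per SM
print(sms, fp32_cores, tensor_cores)  # -> 144 18432 576
```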

H200

H200 GPUs use the same GH100 silicon as the H100 SXM5, just with 141 GB of faster HBM3e (all six stacks enabled) for 4.8 TB/s of memory bandwidth.

Performance

The following are theoretical maximum throughputs in TFLOPS (TOPS for the integer types):2

Data Type   VFMA    Matrix    Sparse
FP64        33.5    66.9      —
FP32        66.9    —         —
TF32        —       494.7     989.4
FP16        133.8   989.4     1978.9
BF16        133.8   989.4     1978.9
FP8         —       1978.9    3957.8
INT32       33.5    —         —
INT8        —       1978.9    3957.8
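A sanity check on the table's internal ratios (my reading of the datasheet numbers, not an independent measurement): structured sparsity doubles matrix throughput, and halving precision from TF32 to FP16 to FP8 doubles it again.

```python
# Dense (Matrix) and sparse tensor-core throughput in TFLOPS,
# taken from the table above.
matrix = {"TF32": 494.7, "FP16": 989.4, "FP8": 1978.9}
sparse = {"TF32": 989.4, "FP16": 1978.9, "FP8": 3957.8}

for dt in matrix:
    # 2:4 structured sparsity doubles the dense figure (modulo rounding).
    assert abs(sparse[dt] - 2 * matrix[dt]) < 0.2

# Each halving of precision doubles matrix throughput.
assert abs(matrix["FP16"] - 2 * matrix["TF32"]) < 0.2
assert abs(matrix["FP8"] - 2 * matrix["FP16"]) < 0.2
```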

You can also project the HPL performance of any H100-based supercomputer by taking the TFLOPS/node or TFLOPS/GPU of an existing H100 system on the Top500 list and extrapolating linearly.

Interestingly, the measured HPL performance per H100 or H200 varies between 55.0 and 59.0 TF/GPU, which falls between the FP64 VFMA and Matrix figures above.
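A minimal sketch of that extrapolation. The 57 TF/GPU is a midpoint of the observed range above; the GPUs-per-node and node count are made-up numbers for illustration only:

```python
# Linear HPL projection from a per-GPU figure (assumed system shape).
hpl_tf_per_gpu = 57.0    # midpoint of the observed 55.0-59.0 TF/GPU range
gpus_per_node = 4        # hypothetical node configuration
nodes = 1000             # hypothetical system size
projected_rmax_pf = hpl_tf_per_gpu * gpus_per_node * nodes / 1000  # TF -> PF
print(f"~{projected_rmax_pf:.0f} PFLOPS")  # -> ~228 PFLOPS
```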

Footnotes

  1. NVIDIA Hopper Architecture In-Depth | NVIDIA Technical Blog

  2. NVIDIA H100 Tensor Core GPU Datasheet