NVIDIA H100

H100 is NVIDIA’s first Hopper-based GPU.

Specifications

H100 SXM5

Each H100 SXM5 GPU has:¹

- 8 GPCs
- 66 TPCs (66 is not divisible by 8, so the TPC count per GPC must be uneven)
  - 132 SMs (2 per TPC)
  - 16,896 FP32 cores (128 per SM)
  - 528 4th generation tensor cores (4 per SM)
  - 1.83 GHz (low precision), 1.98 GHz (high precision)
  - 2:4 structured sparsity
- 80 GB HBM3 (5 stacks)
  - 10 512-bit memory controllers
  - 3.35 TB/s (max)?
- 900 GB/s NVLink (D2D)
- 128 GB/s PCIe Gen5 (H2D)
- 700 W maximum
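
The SM and core counts above follow directly from the TPC count and the per-TPC and per-SM multipliers. A minimal sketch of that arithmetic, using only numbers from the list; the function name `derive_counts` is made up for illustration:

```python
def derive_counts(tpcs, sms_per_tpc=2, fp32_per_sm=128, tensor_per_sm=4):
    """Derive SM and core counts from a TPC count (multipliers from the list above)."""
    sms = tpcs * sms_per_tpc
    return {
        "sms": sms,
        "fp32_cores": sms * fp32_per_sm,
        "tensor_cores": sms * tensor_per_sm,
    }


h100_sxm5 = derive_counts(66)
print(h100_sxm5)

# 66 TPCs cannot be split evenly across 8 GPCs.
full, extra = divmod(66, 8)
print(f"{full} TPCs per GPC with {extra} left over")
```

The remainder only shows that the split is uneven; it says nothing about which GPCs carry the extra TPCs.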

GH100

The full GH100 die is a higher-spec part than the H100 SXM5 found in DGX-style nodes. Each GH100 GPU has:¹

- 8 GPCs
- 72 TPCs (9 TPCs/GPC)
  - 144 SMs (2 per TPC)
  - 18,432 FP32 cores (128 per SM)
  - 576 4th generation tensor cores (4 per SM)
- 6 HBM3 or HBM2e stacks (unclear how much capacity per stack)¹
  - 12 512-bit memory controllers
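
The memory controller counts give the total memory bus width, and for the H100 SXM5 the quoted 3.35 TB/s then implies a per-pin data rate. A rough sketch of that arithmetic, assuming 1 TB/s means 10^12 bytes per second; the helper names are illustrative:

```python
CONTROLLER_WIDTH_BITS = 512  # each memory controller is 512 bits wide


def bus_width_bits(controllers):
    """Total memory bus width for a given number of 512-bit controllers."""
    return controllers * CONTROLLER_WIDTH_BITS


def per_pin_gbps(bandwidth_tb_s, controllers):
    """Per-pin data rate in Gbit/s implied by an aggregate bandwidth.

    Assumes 1 TB/s = 1e12 bytes/s.
    """
    bits_per_s = bandwidth_tb_s * 1e12 * 8
    return bits_per_s / bus_width_bits(controllers) / 1e9


print(bus_width_bits(10))  # H100 SXM5: 10 controllers
print(bus_width_bits(12))  # GH100: 12 controllers
print(f"{per_pin_gbps(3.35, 10):.2f} Gbit/s per pin on H100 SXM5")
```

Both parts work out to two controllers per HBM stack (10 over 5 stacks, 12 over 6).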

H200

H200 GPUs are the same as H100 GPUs (SXM5 or GH variant?), just with 141 GB of HBM3e (more and faster HBM).

Performance

The following are the theoretical maximum performance figures in TFLOPS:²

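| Data Type | VFMA | Matrix | Sparse |
|-----------|-------|--------|--------|
| FP64 | 33.5 | 66.9 | |
| FP32 | 66.9 | | |
| TF32 | | 494.7 | 989.4 |
| FP16 | 133.8 | 989.4 | 1978.9 |
| BF16 | 133.8 | 989.4 | 1978.9 |
| FP8 | | 1978.9 | 3957.8 |
| INT32 | 33.5 | | |
| INT8 | | 1978.9 | 3957.8 |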
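
Most of these figures can be reproduced from the unit counts and clocks in the H100 SXM5 list. A rough sketch follows. The 64 FP64 cores per SM and the FLOPs-per-clock values per tensor core are assumptions not stated in this note, chosen because they reproduce the table. All names are illustrative.

```python
SMS = 132
TENSOR_CORES_PER_SM = 4
HIGH_CLOCK_GHZ = 1.98  # "high precision" clock from the spec list
LOW_CLOCK_GHZ = 1.83   # "low precision" clock from the spec list


def vector_tflops(cores_per_sm, clock_ghz):
    """Peak vector rate: one FMA (2 FLOPs) per core per clock."""
    return SMS * cores_per_sm * 2 * clock_ghz / 1e3


def matrix_tflops(flops_per_tc_per_clock, clock_ghz):
    """Peak dense tensor-core rate for an assumed per-core FLOPs/clock."""
    return SMS * TENSOR_CORES_PER_SM * flops_per_tc_per_clock * clock_ghz / 1e3


peaks = {
    "FP32 VFMA": vector_tflops(128, HIGH_CLOCK_GHZ),
    "FP64 VFMA": vector_tflops(64, HIGH_CLOCK_GHZ),     # assumed 64 FP64 cores/SM
    "FP64 Matrix": matrix_tflops(64, HIGH_CLOCK_GHZ),   # assumed 64 FLOPs/clock
    "TF32 Matrix": matrix_tflops(512, LOW_CLOCK_GHZ),   # assumed 512 FLOPs/clock
    "FP16 Matrix": matrix_tflops(1024, LOW_CLOCK_GHZ),  # assumed 1024 FLOPs/clock
    "FP8 Matrix": matrix_tflops(2048, LOW_CLOCK_GHZ),   # assumed 2048 FLOPs/clock
}

for name, tflops in peaks.items():
    print(f"{name}: {tflops:.1f} TFLOPS")
```

Under these assumptions the vector and FP64 matrix rows line up with the 1.98 GHz clock and the lower-precision matrix rows with 1.83 GHz. The Sparse column is twice the dense Matrix value.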

You can also project the HPL performance of any H100-based supercomputer by taking the TFLOPS/node or TFLOPS/GPU of a comparable system on the Top500 list and linearly extrapolating.
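
A minimal sketch of that linear extrapolation, scaling a reference system's HPL result by GPU count. The function name and the reference numbers are hypothetical placeholders, not a real Top500 entry:

```python
def project_hpl_pflops(ref_rmax_pflops, ref_gpus, target_gpus):
    """Scale a reference system's HPL Rmax linearly by GPU count."""
    per_gpu_tflops = ref_rmax_pflops * 1e3 / ref_gpus
    return per_gpu_tflops * target_gpus / 1e3


# Hypothetical reference system: 100 PFLOPS Rmax on 2,500 GPUs.
ref_rmax_pflops = 100.0
ref_gpus = 2_500

projected = project_hpl_pflops(ref_rmax_pflops, ref_gpus, target_gpus=10_000)
print(f"Projected Rmax: {projected:.1f} PFLOPS")
```

This treats per-GPU HPL throughput as constant, so it is best read as an upper estimate when projecting to a much larger system.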

Interestingly, the $R_{peak}$ per H100 and H200 varies between 55.0 and 59.0 TF/GPU, which falls between the VFMA and Matrix FP64 figures above (33.5 and 66.9 TFLOPS).

Glenn's Digital Garden
