There were two versions of this GPU: a 40 GB HBM2 model at initial launch, followed by an 80 GB HBM2e model that eventually supplanted the launch version across the market.
Specifications
A100 40G SXM4
- 7 GPCs
- 54 TPCs (7 or 8 TPCs per GPC, depending on yield harvesting)
- 108 SMs (2 per TPC)
- 6,912 FP32 cores (64 per SM)
- 432 3rd generation tensor cores (4 per SM)
- ? GHz (low precision), ? GHz (high precision)
- 2:4 structured sparsity
- 40 GB HBM2 (5 stacks)
- 10 512-bit memory controllers
- 1.555 TB/s memory bandwidth
- 300+300 GB/s NVLink (D2D)
- 32+32 GB/s PCIe Gen4 (H2D)
- 400 W maximum
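The 2:4 structured sparsity listed above means that in every group of four weights, at most two may be nonzero; the tensor cores skip the zeros for the 2x peak-throughput gain shown later. A minimal magnitude-based pruning sketch (illustrative only, not NVIDIA's tooling):

```python
# Prune a flat weight list to the 2:4 pattern: keep the two
# largest-magnitude values in each group of four, zero the rest.
def prune_2_4(weights):
    out = list(weights)
    for i in range(0, len(out), 4):
        group = out[i:i + 4]
        # Indices of the two largest-magnitude entries in this group.
        keep = sorted(range(len(group)), key=lambda j: abs(group[j]))[-2:]
        for j in range(len(group)):
            if j not in keep:
                out[i + j] = 0.0
    return out

print(prune_2_4([0.9, -0.1, 0.4, 0.05, 1.2, 0.3, -0.7, 0.2]))
# -> [0.9, 0.0, 0.4, 0.0, 1.2, 0.0, -0.7, 0.0]
```

In practice the pruned matrix is stored compressed with 2-bit metadata per group so the hardware knows which positions survived.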
The full GA100 die has 8 GPCs, 8 TPCs per GPC, 6 HBM stacks, and 12 memory controllers. Some published specs refer to this full configuration, but I don't know whether NVIDIA ever shipped it.
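The unit counts above follow mechanically from the TPC count, so a quick sanity check comparing the shipped part against the full die:

```python
# Derive SM / core counts from a TPC count, per the ratios above:
# 2 SMs per TPC, 64 FP32 cores and 4 tensor cores per SM.
def counts(tpcs):
    sms = tpcs * 2
    return sms, sms * 64, sms * 4  # (SMs, FP32 cores, tensor cores)

print(counts(54))     # shipped A100: (108, 6912, 432)
print(counts(8 * 8))  # full GA100:   (128, 8192, 512)
```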
A100 80G SXM4
NVIDIA released a kicker with upgraded HBM shortly after the A100 40G part shipped. Its FLOPS were the same, but its memory was different:
- 80 GB HBM2e (5 stacks)
- 10 512-bit memory controllers
- 2.039 TB/s memory bandwidth
Everything else (power, etc.) was the same, and it was drop-in compatible with the earlier 40G part.
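Since both parts use the same 10 x 512-bit bus, the bandwidth jump comes entirely from the faster HBM2e pins. The per-pin data rates implied by the quoted bandwidths can be back-computed:

```python
# Per-pin data rate implied by aggregate bandwidth over a
# 10-controller x 512-bit = 5120-bit bus.
BUS_BITS = 10 * 512

def pin_rate_gbps(bandwidth_gbs):
    """Gbit/s per pin given aggregate bandwidth in GB/s."""
    return bandwidth_gbs * 8 / BUS_BITS

print(round(pin_rate_gbps(1555), 2))  # ~2.43 Gbps (HBM2, 40G part)
print(round(pin_rate_gbps(2039), 2))  # ~3.19 Gbps (HBM2e, 80G part)
```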
Performance
The following are theoretical maximum throughputs in TFLOPS (TOPS for the integer types):1
Data Type | Vector | Matrix | Sparse |
---|---|---|---|
FP64 | 9.7 | 19.5 | |
FP32 | 19.5 | | |
TF32 | | 156 | 312 |
FP16 | | 312 | 624 |
BF16 | | 312 | 624 |
FP8 | | | |
INT32 | | | |
INT8 | | 624 | 1248 |
INT4 | | 1248 | 2496 |
INT1 | | 4992 | |