There were two versions of this GPU: a 40 GB HBM2 model at initial launch, followed by an 80 GB HBM2e model that eventually supplanted the launch version across the market.
Specifications
A100 40G SXM4
- 7 GPCs
- 54 TPCs (7 or 8 TPCs per GPC, depending on yield harvesting)
- 108 SMs (2 per TPC)
- 6,912 FP32 cores (64 per SM)
- 432 3rd generation tensor cores (4 per SM)
- ? GHz (low precision), ? GHz (high precision)
- 2:4 structured sparsity
- 40 GB HBM2 (5 stacks)
- 10 512-bit memory controllers
- 1.555 TB/s memory bandwidth
- 300+300 GB/s NVLink (D2D)
- 32+32 GB/s PCIe Gen4 (H2D)
- 400 W maximum
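The 2:4 structured sparsity listed above means that in every group of four weights, at most two may be nonzero; the tensor cores skip the zeros for the 2x peak-throughput gain shown later. A minimal magnitude-based pruning sketch (illustrative only, not NVIDIA's tooling):

```python
# Prune a flat weight list to the 2:4 pattern: keep the two
# largest-magnitude values in each group of four, zero the rest.
def prune_2_4(weights):
    out = list(weights)
    for i in range(0, len(out), 4):
        group = out[i:i + 4]
        # Indices of the two largest-magnitude entries in this group.
        keep = sorted(range(len(group)), key=lambda j: abs(group[j]))[-2:]
        for j in range(len(group)):
            if j not in keep:
                out[i + j] = 0.0
    return out

print(prune_2_4([0.9, -0.1, 0.4, 0.05, 1.2, 0.3, -0.7, 0.2]))
# -> [0.9, 0.0, 0.4, 0.0, 1.2, 0.0, -0.7, 0.0]
```

In practice the pruned matrix is stored compressed with 2-bit metadata per group so the hardware knows which positions survived.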
The full GA100 die has 8 GPCs, 8 TPCs per GPC, 6 HBM stacks, and 12 memory controllers. Some published specs refer to this full configuration, but I don't know whether NVIDIA ever shipped it.
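The unit counts above follow mechanically from the TPC count, so a quick sanity check comparing the shipped part against the full die:

```python
# Derive SM / core counts from a TPC count, per the ratios above:
# 2 SMs per TPC, 64 FP32 cores and 4 tensor cores per SM.
def counts(tpcs):
    sms = tpcs * 2
    return sms, sms * 64, sms * 4  # (SMs, FP32 cores, tensor cores)

print(counts(54))     # shipped A100: (108, 6912, 432)
print(counts(8 * 8))  # full GA100:   (128, 8192, 512)
```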
A100 80G SXM4
NVIDIA released a kicker with upgraded HBM shortly after the A100 40G part shipped. Its FLOPS were the same, but its memory was different:
- 80 GB HBM2e (5 stacks)
- 10 512-bit memory controllers
- 2.039 TB/s memory bandwidth
Everything else (power, etc.) was the same, and it was drop-in compatible with the earlier 40G part.
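Since both parts use the same 10 x 512-bit bus, the bandwidth jump comes entirely from the faster HBM2e pins. The per-pin data rates implied by the quoted bandwidths can be back-computed:

```python
# Per-pin data rate implied by aggregate bandwidth over a
# 10-controller x 512-bit = 5120-bit bus.
BUS_BITS = 10 * 512

def pin_rate_gbps(bandwidth_gbs):
    """Gbit/s per pin given aggregate bandwidth in GB/s."""
    return bandwidth_gbs * 8 / BUS_BITS

print(round(pin_rate_gbps(1555), 2))  # ~2.43 Gbps (HBM2, 40G part)
print(round(pin_rate_gbps(2039), 2))  # ~3.19 Gbps (HBM2e, 80G part)
```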
Performance
The following are theoretical maximum throughputs in TFLOPS (TOPS for the integer types):1
Data Type | Vector | Matrix | Sparse |
---|---|---|---|
FP64 | 9.7 | 19.5 | |
FP32 | 19.5 | | |
TF32 | | 156 | 312 |
FP16 | | 312 | 624 |
BF16 | | 312 | 624 |
FP8 | | | |
INT32 | | | |
INT8 | | 624 | 1248 |
INT4 | | 1248 | 2496 |
INT1 | | 4992 | |