There were two versions of this GPU: a 40 GB HBM2 model at initial launch, followed by an 80 GB HBM2e model that eventually replaced the launch version across the market.

Specifications

A100 40G SXM4

Each A100 SXM4 GPU has:1,2

  • 7 GPCs
  • 54 TPCs (7 or 8 TPCs/GPC, depending on die yield)
  • 40 GB HBM2 (5 stacks)
    • 10 512-bit memory controllers
    • 1.555 TB/s
  • 300+300 GB/s NVLink (D2D)
  • 32+32 GB/s PCIe Gen4 (H2D)
  • 400 W maximum
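The 1.555 TB/s figure follows directly from the bus width. A quick back-of-the-envelope check, assuming (not stated above) an HBM2 per-pin data rate of roughly 2.43 Gbit/s:

```python
# Sanity check of the 40G part's memory bandwidth.
# Assumption (not from the text): HBM2 per-pin data rate of ~2.43 Gbit/s.
controllers = 10
bits_per_controller = 512
pin_rate_gbps = 2.43                                 # assumed per-pin rate, Gbit/s

bus_width_bits = controllers * bits_per_controller   # 5120-bit aggregate bus
bandwidth_gbs = bus_width_bits / 8 * pin_rate_gbps   # GB/s

print(f"{bus_width_bits}-bit bus -> {bandwidth_gbs / 1000:.3f} TB/s")
# -> 5120-bit bus -> 1.555 TB/s
```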

The full GA100 die has 8 GPCs, 8 TPCs per GPC, 6 HBM stacks, and 12 memory controllers. Some spec sheets describe this configuration, but I don’t know whether NVIDIA ever shipped it.

A100 80G SXM4

NVIDIA released a kicker with upgraded HBM shortly after the A100 40G part shipped. Its FLOPS were unchanged, but its memory was different:

  • 80 GB HBM2e (5 stacks)
    • 10 512-bit memory controllers
    • 2.039 TB/s

Everything else was the same (power, etc) and it was drop-in compatible with the earlier 40G part.
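The 2.039 TB/s figure comes from the same 5120-bit bus running at a higher pin rate. A quick check, assuming (not stated above) an HBM2e per-pin data rate of roughly 3.186 Gbit/s:

```python
# Sanity check of the 80G part's memory bandwidth.
# Assumption (not from the text): HBM2e per-pin data rate of ~3.186 Gbit/s.
bus_width_bits = 10 * 512            # 10 controllers x 512 bits, same as the 40G part
pin_rate_gbps = 3.186                # assumed HBM2e per-pin rate, Gbit/s

bandwidth_gbs = bus_width_bits / 8 * pin_rate_gbps
print(f"{bandwidth_gbs / 1000:.3f} TB/s")
# -> 2.039 TB/s
```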

Performance

The following are theoretical peak throughput figures, in TFLOPS for floating-point types and TOPS for integer types:1

Data Type   Vector   Matrix   Sparse
FP64        9.7      19.5     —
FP32        19.5     —        —
TF32        —        156      312
FP16        —        312      624
BF16        —        312      624
FP8         —        —        —
INT32       —        —        —
INT8        —        624      1248
INT4        —        1248     2496
INT1        —        4992     —
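The matrix figures can be sanity-checked against the SM configuration. A rough derivation of the 312 TFLOPS FP16 Tensor Core number, assuming (none of these are stated above) 108 SMs, 4 Tensor Cores per SM, 256 FP16 FMAs per Tensor Core per clock, and a boost clock of about 1.41 GHz:

```python
# Rough derivation of the FP16 Tensor Core figure.
# Assumptions (not from the text): 108 SMs, 4 Tensor Cores/SM,
# 256 FP16 FMAs per Tensor Core per clock, ~1.41 GHz boost clock.
sms = 108
tensor_cores_per_sm = 4
fmas_per_tc_per_clk = 256
boost_clock_hz = 1.41e9

# Each FMA counts as two FLOPs (a multiply and an add).
ops = sms * tensor_cores_per_sm * fmas_per_tc_per_clk * 2 * boost_clock_hz
print(f"{ops / 1e12:.0f} TFLOPS")
# -> 312 TFLOPS
```

Structured sparsity doubles this to the 624 TFLOPS shown in the Sparse column.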

Footnotes

  1. https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/a100/pdf/nvidia-a100-datasheet-us-nvidia-1758950-r4-web.pdf

  2. https://images.nvidia.com/aem-dam/en-zz/Solutions/data-center/nvidia-ampere-architecture-whitepaper.pdf