The NVIDIA B200 is a Blackwell-generation GPU. Each GPU has:
- 10 GPCs[1]
- ? TPCs
- ? SMs (? per TPC)
- ? FP32 cores (128 per SM)
- ? tensor cores (4 per SM)
- ? GHz (low precision), ? GHz (high precision)
- 2:4 structured sparsity
- 192 GB HBM3e (8 stacks)
- 8 TB/s memory bandwidth (max)
- 2x 900 GB/s NVLink v5 (D2D)[2]
- 2x 128 GB/s PCIe Gen6 x16 (H2D)[2]
- 1000 W maximum power
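
The per-SM counts in the list above imply a simple formula for peak vector throughput: FP32 cores times 2 FLOPs per fused multiply-add times clock frequency. The following is a minimal sketch, assuming one FMA per FP32 core per cycle. The function name `peak_fp32_tflops` is made up for this illustration, and the SM count and clock in the example are placeholder inputs, not B200's actual values, which remain unknown above.

```python
def peak_fp32_tflops(num_sms: int, clock_ghz: float, cores_per_sm: int = 128) -> float:
    """Peak FP32 TFLOPS, assuming one FMA (2 FLOPs) per core per cycle."""
    flops_per_cycle = num_sms * cores_per_sm * 2
    # flops_per_cycle * GHz gives GFLOPS; divide by 1000 for TFLOPS.
    return flops_per_cycle * clock_ghz / 1000.0


# Hypothetical inputs for illustration only; NOT B200's real SM count or clock.
example_tflops = peak_fp32_tflops(num_sms=100, clock_ghz=1.5)
print(f"{example_tflops:.1f} TFLOPS")
```

Once the real SM count and clock are known, the same function gives a quick sanity check against any quoted FP32 figure.
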
B100 is a lower-power (700 W) variant of B200 that is meant to be a “drop-in replacement” for HGX H100 platforms.[3] That is, a vendor can take a server platform built for 8-way H100 baseboards, swap in B100 baseboards, and sell the result without having to re-engineer power or thermals.
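
The drop-in claim comes down to simple arithmetic on the GPU power budget of an 8-way baseboard. The sketch below assumes 700 W per H100 SXM GPU (a figure not stated in this page) and counts GPU power only, ignoring switches, CPUs, and cooling overhead. The helper name `baseboard_gpu_power_w` is hypothetical.

```python
GPUS_PER_BASEBOARD = 8  # 8-way HGX baseboard, as described above


def baseboard_gpu_power_w(per_gpu_w: int, gpus: int = GPUS_PER_BASEBOARD) -> int:
    """Total GPU power for one baseboard, in watts (GPUs only)."""
    return per_gpu_w * gpus


h100_w = baseboard_gpu_power_w(700)   # assumption: 700 W per H100 SXM GPU
b100_w = baseboard_gpu_power_w(700)   # B100: 700 W per GPU
b200_w = baseboard_gpu_power_w(1000)  # B200: 1000 W maximum per GPU

print(f"B100 vs H100 baseboard: {b100_w - h100_w:+d} W")
print(f"B200 vs H100 baseboard: {b200_w - h100_w:+d} W")
```

Under these assumptions the B100 baseboard fits the existing power envelope, while a B200 baseboard needs 2400 W more for the GPUs alone.
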
Performance
The following are theoretical maximum performance figures in TFLOPS (TOPS for INT8).[3] (This is wrong.)
Data Type | VFMA | Matrix | Sparse |
---|---|---|---|
FP64 | 90 | 90 | |
FP32 | 180 | | |
TF32 | 1250 | 2500 | |
FP16 | 5000 | 10000 | |
BF16 | 2500 | 5000 | |
FP8 | 5000 | 10000 | |
FP6 | 5000 | 10000 | |
FP4 | 10000 | 20000 | |
INT8 | 5000 | 10000 | |
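
The Sparse column depends on the 2:4 structured sparsity listed in the specifications: in every group of four consecutive values, at most two are nonzero. The following is a minimal sketch of producing that pattern by magnitude-based pruning, which is one common approach and not necessarily what NVIDIA's tooling does. The function name `prune_2_4` and the example weights are made up for illustration.

```python
def prune_2_4(row):
    """Zero the two smallest-magnitude values in every group of four."""
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    pruned = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Indices of the two largest-magnitude entries in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


# Hypothetical weights, two groups of four.
weights = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.0]
sparse = prune_2_4(weights)
print(sparse)
```

Because half of each group is known to be zero, the hardware can skip those multiplications, which is why sparse peak figures are typically quoted at twice the dense rate.
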