MI300X is AMD’s first GPU to use CDNA 3.
Specifications
Each MI300X GPU has:1
- 8 XCDs
- 304 CUs (38 per XCD)
- 19,456 Stream Processors (64 per CU)
- 1,216 Matrix Cores (4 per CU)
- 2.1 GHz peak engine clock
- 2:4 structured sparsity
- 192 GB HBM3 (8 stacks)
- 5.3 TB/s peak memory bandwidth
- 7x16 AMD Infinity Fabric (D2D)
- 1x16 PCIe Gen5 (H2D)
- 750 W maximum (TBP)2
It also has a complex memory hierarchy that I don’t yet understand.3
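The aggregate counts in the list above follow from the per-XCD and per-CU figures. A minimal sketch of that arithmetic, using only numbers from the list (the variable names are mine, not AMD's):

```python
# Per-unit figures taken from the specification list above.
XCDS = 8
CUS_PER_XCD = 38
STREAM_PROCESSORS_PER_CU = 64
MATRIX_CORES_PER_CU = 4
HBM_TOTAL_GB = 192
HBM_STACKS = 8

# Aggregate counts derived from the per-unit figures.
compute_units = XCDS * CUS_PER_XCD
stream_processors = compute_units * STREAM_PROCESSORS_PER_CU
matrix_cores = compute_units * MATRIX_CORES_PER_CU

# Capacity per HBM3 stack, assuming the total is split evenly across stacks.
hbm_per_stack_gb = HBM_TOTAL_GB // HBM_STACKS

print(f"CUs: {compute_units}")
print(f"Stream Processors: {stream_processors}")
print(f"Matrix Cores: {matrix_cores}")
print(f"HBM3 per stack: {hbm_per_stack_gb} GB")
```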
Performance
The following are theoretical peak throughput figures in TFLOPS (TOPS for the integer types):1
Data Type | VFMA | Matrix | Sparse |
---|---|---|---|
FP64 | 81.7 | 163.4 | |
FP32 | 163.4 | 163.4 | |
TF32 | | 653.7 | 1307.4 |
FP16 | | 1307.4 | 2614.9 |
BF16 | | 1307.4 | 2614.9 |
FP8 | | 2614.9 | 5229.8 |
INT32 | | | |
INT8 | | 2614.9 | 5229.8 |
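Peak figures like these are normally the product of CU count, peak clock, and operations per clock per CU. The sketch below works through that formula for the 304 CUs and 2.1 GHz listed under Specifications. The per-CU rates are an assumption on my part, based on my reading of AMD's CDNA 3 material, and the helper names are hypothetical.

```python
CUS = 304              # from the specification list
PEAK_CLOCK_GHZ = 2.1   # from the specification list

# ASSUMPTION: operations per clock per CU, as I understand the CDNA 3 figures.
VECTOR_OPS_PER_CLK_PER_CU = {"FP64": 128, "FP32": 256}
MATRIX_OPS_PER_CLK_PER_CU = {
    "FP64": 256, "FP32": 256, "TF32": 1024,
    "FP16": 2048, "BF16": 2048, "FP8": 4096, "INT8": 4096,
}
# Types for which 2:4 structured sparsity doubles the dense matrix rate.
SPARSE_TYPES = ("TF32", "FP16", "BF16", "FP8", "INT8")


def peak_tflops(ops_per_clk_per_cu, cus=CUS, clock_ghz=PEAK_CLOCK_GHZ):
    """Theoretical peak in TFLOPS (or TOPS): CUs x ops/clk/CU x GHz / 1000."""
    return cus * ops_per_clk_per_cu * clock_ghz / 1000.0


def sparse_peak_tflops(ops_per_clk_per_cu):
    """2:4 sparsity skips half the operands, doubling the dense peak."""
    return 2 * peak_tflops(ops_per_clk_per_cu)


for dtype, ops in VECTOR_OPS_PER_CLK_PER_CU.items():
    print(f"{dtype} vector: {peak_tflops(ops):.1f}")
for dtype, ops in MATRIX_OPS_PER_CLK_PER_CU.items():
    line = f"{dtype} matrix: {peak_tflops(ops):.1f}"
    if dtype in SPARSE_TYPES:
        line += f", sparse: {sparse_peak_tflops(ops):.1f}"
    print(line)
```

These are ceilings at the peak clock. Sustained throughput depends on the clocks actually held under load and on memory bandwidth.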
A single 8-way OAM MI300X UBB is capable of hosting a copy of Llama 3.1 405B in FP16.4
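A rough capacity check for that claim, counting weights only. It assumes 2 bytes per FP16 parameter and decimal gigabytes, and it ignores KV cache, activations, and framework overhead, so the real headroom is smaller than the number printed here.

```python
GPUS_PER_UBB = 8            # 8-way OAM board
HBM_PER_GPU_GB = 192        # from the specification list
PARAMS_BILLIONS = 405       # Llama 3.1 405B
BYTES_PER_PARAM_FP16 = 2    # ASSUMPTION: plain FP16 weights, no quantization

# 1 billion parameters x 2 bytes = 2 GB (decimal), so the units line up.
weights_gb = PARAMS_BILLIONS * BYTES_PER_PARAM_FP16
total_hbm_gb = GPUS_PER_UBB * HBM_PER_GPU_GB
headroom_gb = total_hbm_gb - weights_gb

print(f"Weights:  {weights_gb} GB")
print(f"HBM3:     {total_hbm_gb} GB")
print(f"Headroom: {headroom_gb} GB (before KV cache and activations)")
```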
Platforms
The following platforms support MI300X GPUs:3
- Azure ND MI300X v5
- Dell PowerEdge XE9680
- HPE Cray XD675
- Lenovo SR685a V3
- Supermicro AS-8125GS-TNMR2
The following cloud providers sell MI300X:
- Vultr5