MI355X is an AMD GPU built on CDNA 4 that was released in June 2025.[^launchtweet] It has lower FP64 and FP32 performance than its predecessor MI300X but has significantly higher 16-bit and 8-bit performance, and it adds support for FP6 and FP4.
It is optimized for inference. Its low-precision performance is comparable to that of NVIDIA's B200, its FP64 performance is much higher than B200's, and it draws 200 W more power than B200. It ships in an 8-way OAM UBB form factor.
Specifications
Each MI355X GPU has:[^1]
- 8 XCDs
- 256 CUs (32 per XCD)
- 16,384 Stream Processors (64 per CU)
- 1,024 Matrix Cores (4 per CU)
- 2.4 GHz peak engine clock
- 2:4 structured sparsity
- 288 GB HBM3e (8 stacks of 36 GB)
- 8 TB/s peak memory bandwidth
- 7x 153.6 GB/s AMD Infinity Fabric (D2D)
- 1x PCIe Gen5 x16 (H2D)
- 1400 W maximum
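
The counts in this list are mutually consistent, and the peak FP64 vector rate can be approximated from them. The following is a back-of-the-envelope sketch, not an official formula: it assumes each stream processor retires one fused multiply-add (counted as 2 FLOPs) per clock at FP64, and that all seven Infinity Fabric links can be driven at once. The constant and function names are illustrative.

```python
# Figures taken from the specification list above.
XCDS = 8
CUS_PER_XCD = 32
STREAM_PROCESSORS_PER_CU = 64
MATRIX_CORES_PER_CU = 4
PEAK_CLOCK_GHZ = 2.4
IF_LINKS = 7
IF_LINK_GBS = 153.6

# Assumption: one FP64 fused multiply-add per stream processor per clock,
# counted as two floating-point operations.
FLOPS_PER_FMA = 2


def derived_specs():
    """Recompute the aggregate figures from the per-unit figures."""
    cus = XCDS * CUS_PER_XCD
    stream_processors = cus * STREAM_PROCESSORS_PER_CU
    matrix_cores = cus * MATRIX_CORES_PER_CU
    # lanes * GHz * FLOPs-per-clock gives GFLOPS; divide by 1000 for TFLOPS.
    fp64_vector_tflops = stream_processors * PEAK_CLOCK_GHZ * FLOPS_PER_FMA / 1000
    # Assumption: all links active simultaneously.
    if_total_gbs = IF_LINKS * IF_LINK_GBS
    return {
        "cus": cus,
        "stream_processors": stream_processors,
        "matrix_cores": matrix_cores,
        "fp64_vector_tflops": fp64_vector_tflops,
        "if_total_gbs": if_total_gbs,
    }


if __name__ == "__main__":
    for name, value in derived_specs().items():
        print(f"{name}: {value:,.1f}")
```

Under these assumptions the FP64 vector estimate comes to about 78.6 TFLOPS, and the seven links add up to roughly 1,075 GB/s of aggregate GPU-to-GPU bandwidth.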
Performance
The following are theoretical peak rates in TFLOPS (TOPS for the integer types):[^1]
Data Type | VFMA | Matrix | Sparse |
---|---|---|---|
FP64 | 78.6 | 78.6 | |
FP32 | 157.3 | 157.3 | |
TF32 | | | |
FP16 | | 2516.6 | 5033.2 |
BF16 | | 2516.6 | 5033.2 |
FP8 | | 5033.2 | 10066.4 |
FP6 | | 10066.3 | 20132.6 |
FP4 | | 10066.3 | 20132.6 |
INT8 | | 5033.2 | 10066.4 |
INT4 | | 5033.2 | 10066.4 |
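
Two quantities are useful when reading these figures: how the peak rate scales as precision drops, and how many operations a kernel must perform per byte of HBM traffic before it stops being memory-bound. The sketch below is a rough roofline-style estimate. It uses the first (lower) figure listed for each data type and the 8 TB/s peak memory bandwidth from the specifications; the dictionary and function names are illustrative, and real kernels will not reach these peaks.

```python
# First (lower) peak figure listed for each data type, in TFLOPS.
PEAK_TFLOPS = {
    "FP64": 78.6,
    "FP32": 157.3,
    "FP16": 2516.6,
    "FP8": 5033.2,
    "FP4": 10066.3,
}

# Peak HBM3e bandwidth from the specifications, in TB/s.
HBM_TBS = 8.0


def ridge_point(peak_tflops, bandwidth_tbs=HBM_TBS):
    """Operations per byte of HBM traffic needed to become compute-bound."""
    return peak_tflops / bandwidth_tbs


def speedup(base, dtype):
    """Ratio of the peak rate of `dtype` to the peak rate of `base`."""
    return PEAK_TFLOPS[dtype] / PEAK_TFLOPS[base]


if __name__ == "__main__":
    for dtype, peak in PEAK_TFLOPS.items():
        print(
            f"{dtype}: {ridge_point(peak):8.1f} ops/byte, "
            f"{speedup('FP64', dtype):6.1f}x the FP64 rate"
        )
```

By this estimate an FP64 kernel needs roughly 10 operations per byte to saturate the compute units, while an FP4 kernel needs more than 1,200, which is why low-precision workloads depend so heavily on data reuse in caches and registers.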