There are two SKUs: [1]

  • Intel Data Center GPU Max 1100 (56 Xe Cores)
  • Intel Data Center GPU Max 1550 (128 Xe Cores)

Specifications

Each Intel Data Center GPU Max 1550 has:

  • 2 Xe Stacks
    • 128 Xe Cores (64 per Stack)
      • 1024 Xe Vector Engines (8 per Xe Core)
      • 1024 Xe Matrix Engines (8 per Xe Core)
    • 900 MHz (base), 1.6 GHz (peak)
    • No sparsity
  • 128 GB HBM2e
    • 3.2768 TB/s [2]
  • 16 Xe Links (D2D)
  • 1x16 PCIe Gen5 or CXL 1.1 (H2D)
  • 600 W maximum
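
The listed HBM bandwidth is consistent with eight HBM2e stacks (two Xe Stacks × four HBM2e controllers each, per the Nomenclature section below), assuming the standard 1024-bit HBM2e interface per stack running at a 3.2 GT/s pin rate:

  8 stacks × 1024 bit × 3.2 GT/s ÷ 8 bit/byte = 3276.8 GB/s ≈ 3.2768 TB/s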

Performance

The following are measured values from preproduction Aurora, [3] obtained by running GEMM for each data type. Although it seems like they should make use of the Xe Matrix Engines (since the benchmark simply calls into oneapi::mkl::blas::column_major::gemm [4]), it is unclear how much of the work is vector FMA versus matrix operations. For example, assuming each 4096-bit matrix engine retires one FP64 FMA per 64-bit lane per cycle, the FP64 matrix peak performance should be somewhere between

  1024 matrix engines × (4096 bit / 64 bit) × 2 FLOP/FMA × 0.9 GHz ≈ 118 TFLOPS

and

  1024 matrix engines × (4096 bit / 64 bit) × 2 FLOP/FMA × 1.6 GHz ≈ 210 TFLOPS,

whereas the corresponding VFMA peak is 1024 vector engines × (512 bit / 64 bit) × 2 FLOP/FMA × 0.9 GHz ≈ 14.7 TFLOPS at the base clock (≈ 26.2 TFLOPS at 1.6 GHz). The measured 17 TFLOPS FP64 is far closer to the VFMA peak than to the Matrix peak, yet it exceeds the base-clock VFMA peak. Perhaps this indicates the work is all VFMA, but running at a higher-than-base frequency?
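
For context, the shape of such a measurement is roughly the following. This is a minimal, hypothetical sketch rather than the ALCF code linked in the footnote: the problem size, iteration count, and timing scheme are arbitrary choices made here, and only the oneapi::mkl::blas::column_major::gemm call reflects what the cited benchmark invokes.

  // Minimal sketch of a DGEMM throughput measurement through oneMKL's SYCL
  // interface. NOT the ALCF benchmark cited above: matrix size, iteration
  // count, and timing scheme are illustrative choices only.
  #include <sycl/sycl.hpp>
  #include <oneapi/mkl.hpp>
  #include <chrono>
  #include <cstdint>
  #include <cstdio>

  int main() {
    const std::int64_t n = 8192;   // square matrices; assumed problem size
    const int iters = 10;

    // In-order queue so the enqueued GEMMs execute back to back.
    sycl::queue q{sycl::gpu_selector_v, sycl::property::queue::in_order{}};

    double *A = sycl::malloc_device<double>(n * n, q);
    double *B = sycl::malloc_device<double>(n * n, q);
    double *C = sycl::malloc_device<double>(n * n, q);
    q.fill(A, 1.0, n * n);
    q.fill(B, 1.0, n * n);
    q.fill(C, 0.0, n * n);
    q.wait();

    auto run_gemm = [&] {
      // C = 1.0 * A * B + 0.0 * C, column-major, no transposes.
      return oneapi::mkl::blas::column_major::gemm(
          q, oneapi::mkl::transpose::nontrans, oneapi::mkl::transpose::nontrans,
          n, n, n, /*alpha=*/1.0, A, n, B, n, /*beta=*/0.0, C, n);
    };

    run_gemm().wait();             // warm-up, excluded from timing

    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i) run_gemm();
    q.wait();
    auto t1 = std::chrono::steady_clock::now();

    double sec = std::chrono::duration<double>(t1 - t0).count() / iters;
    double tflops = 2.0 * n * n * n / sec / 1e12;   // 2*n^3 FLOPs per GEMM
    std::printf("DGEMM n=%lld: %.1f TFLOP/s\n",
                static_cast<long long>(n), tflops);

    sycl::free(A, q);
    sycl::free(B, q);
    sycl::free(C, q);
    return 0;
  }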

  Data Type    VFMA    Matrix    Sparse
  FP64           17
  FP32           23
  TF32                    110
  FP16                    263
  BF16                    273
  FP8
  INT32
  INT8                    577

  (Values in TFLOP/s, TOP/s for INT8.)

Nomenclature

The Intel terminology is confusing. According to James Brodman (Intel), the breakdown is as follows (a quick arithmetic check against the Specifications above follows these lists): [5]

  • 1 GPU = 2 stacks
  • 1 stack = 4 slices + 4 HBM2e controllers + 8 Xe Links [1]
  • 1 slice = 16 cores
  • 1 core = 8 vector engine + 8 matrix engines
  • 1 vector engine = 512 bits
  • 1 matrix engine = 4096 bits

Also,

  • Stacks used to be called tiles
  • Vector Engines used to be called execution units (EUs)
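
Multiplying the hierarchy out reproduces the totals listed under Specifications:

  2 stacks × 4 slices × 16 cores   = 128 Xe Cores
  128 cores × 8 vector engines     = 1024 Xe Vector Engines
  128 cores × 8 matrix engines     = 1024 Xe Matrix Engines
  2 stacks × 8 Xe Links            = 16 Xe Links
  512-bit vector engine            = 8 FP64 lanes (16 FP32, 32 FP16/BF16)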

Footnotes

  1. GPU Optimization Thread Mapping Occupancy (anl.gov)

  2. https://ark.intel.com/content/www/us/en/ark/products/232873/intel-data-center-gpu-max-1550.html

  3. https://docs.alcf.anl.gov/aurora/node-performance-overview/node-performance-overview/

  4. https://github.com/argonne-lcf/user-guides/blob/main/docs/aurora/node-performance-overview/src/gemm.cpp#L65C5-L65C42

  5. Unified Tensor Interface in DPC++ (iwocl.org)