High-performance Linpack is the industry-standard benchmark for FP64 performance of supercomputers.1 It is the basis for the Top500 list.
What it does
It solves a linear system of equations using LU decomposition with partial pivoting. On paper, solving this problem requires a well-defined number of floating point operations:
So HPL is able to calculate the maximum FLOPS a supercomputer can process () using:
The Top500 list’s raw data includes both and for every submission, so using that information, you can tell how long it took for each HPL run to complete.
Predicting performance
In my experience on supercomputers with fat tree interconnect topologies, HPL scales linearly with node count. If you know how many FLOPS a specific combination of compute node and interconnect got at multiple scales, you can linearly extrapolate the peak FLOPS of any supercomputer with that same architecture.
The Top500 list often contains multiple supercomputers with the same GPUs and interconnects, so there should be enough information to predict HPL scores for any system of any size. You can see how to do this using a script I wrote to predict H100 cluster performance.
For example, the June 2024 Top500 list suggests the following:
Accelerator | Examples | Single-accelerator HPL TFLOPS |
---|---|---|
NVIDIA A100 | Perlmutter | 11-15 |
NVIDIA H100 | Eagle, Eos | 36-43 |
AMD MI250X | Frontier | 34-36 |
AMD MI300A | rzAdams, Tuolumne | 38.4 |