Model FLOPs Utilization (MFU) is a metric proposed by Google that describes how well utilized a GPU is during model training.[1] The PaLM paper defines it as:

"This is the ratio of the observed throughput (tokens-per-second) relative to the theoretical maximum throughput of a system operating at peak FLOPs. Crucially, the 'theoretical maximum' throughput only accounts for the required operations to compute the forward+backward passes, and not rematerialization."
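
In code, a minimal sketch of the estimate, assuming the common ~6N FLOPs-per-token approximation for a dense transformer's forward+backward pass (forward ≈ 2N, backward ≈ 4N; attention FLOPs ignored). The function and parameter names here are illustrative, not from any of the papers:

```python
def mfu(tokens_per_sec: float, n_params: float,
        n_gpus: int, peak_flops_per_gpu: float) -> float:
    """Estimate Model FLOPs Utilization (MFU).

    Assumes ~6N model FLOPs per token for a dense transformer
    (forward ~2N + backward ~4N) and, per the definition above,
    does not count rematerialization toward model FLOPs.
    """
    model_flops_per_token = 6 * n_params
    achieved_flops_per_sec = tokens_per_sec * model_flops_per_token
    peak_flops_per_sec = n_gpus * peak_flops_per_gpu
    return achieved_flops_per_sec / peak_flops_per_sec
```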

Meta defines it similarly, as "the number of FLOPs a model theoretically utilizes compared to hardware peak FLOPs."[2]

It is the AI version of arithmetic intensity, which is an essential component of the roofline model.
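
For comparison, the roofline model caps attainable performance by either compute or memory bandwidth. A sketch using the standard textbook formulation (not from the sources cited here):

```python
def roofline_attainable_flops(arithmetic_intensity: float,
                              peak_flops: float,
                              peak_bandwidth: float) -> float:
    """Roofline bound: attainable FLOP/s at a given arithmetic intensity.

    arithmetic_intensity: FLOPs performed per byte moved to/from memory.
    A kernel is memory-bound below the ridge point, compute-bound above it.
    """
    return min(peak_flops, arithmetic_intensity * peak_bandwidth)
```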

In practice

Llama-3.1 reported an MFU of 38-43% during training.[3]
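
As a rough sanity check with the `mfu()` sketch above: the 405B parameter count and ~16K-GPU scale are from the Llama-3 paper, and 989 TFLOP/s is the H100's published dense BF16 peak, but the observed token throughput below is hypothetical, chosen only to land in the reported MFU range:

```python
# Hypothetical throughput for illustration, not a reported number.
print(mfu(tokens_per_sec=2.7e6,
          n_params=405e9,
          n_gpus=16_384,
          peak_flops_per_gpu=989e12))  # ~0.40, i.e. ~40% MFU
```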

Footnotes

  1. PaLM: Scaling Language Modeling with Pathways (arXiv:2204.02311)

  2. Revisiting Reliability in Large-Scale Machine Learning Research Clusters (arXiv:2410.21680)

  3. The Llama 3 Herd of Models (arXiv:2407.21783)