Model FLOPs Utilization (MFU) is a metric proposed by Google that describes how well utilized a GPU is during model training.[1] The PaLM paper defines it as:

"This is the ratio of the observed throughput (tokens-per-second) relative to the theoretical maximum throughput of a system operating at peak FLOPs. Crucially, the 'theoretical maximum' throughput only accounts for the required operations to compute the forward+backward passes, and not rematerialization."
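
In code, a minimal sketch of the estimate, assuming the common ~6N FLOPs-per-token approximation for a dense transformer's forward+backward pass (forward ≈ 2N, backward ≈ 4N; attention FLOPs ignored). The function and parameter names here are illustrative, not from any of the papers:

```python
def mfu(tokens_per_sec: float, n_params: float,
        n_gpus: int, peak_flops_per_gpu: float) -> float:
    """Estimate Model FLOPs Utilization (MFU).

    Assumes ~6N model FLOPs per token for a dense transformer
    (forward ~2N + backward ~4N) and, per the definition above,
    does not count rematerialization toward model FLOPs.
    """
    model_flops_per_token = 6 * n_params
    achieved_flops_per_sec = tokens_per_sec * model_flops_per_token
    peak_flops_per_sec = n_gpus * peak_flops_per_gpu
    return achieved_flops_per_sec / peak_flops_per_sec
```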

Meta defines it similarly, as "the number of FLOPs a model theoretically utilizes compared to hardware peak FLOPs."[2]

It is the AI version of arithmetic intensity, which is an essential component of the roofline model.
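
For comparison, the roofline model caps attainable performance by either compute or memory bandwidth. A sketch using the standard textbook formulation (not from the sources cited here):

```python
def roofline_attainable_flops(arithmetic_intensity: float,
                              peak_flops: float,
                              peak_bandwidth: float) -> float:
    """Roofline bound: attainable FLOP/s at a given arithmetic intensity.

    arithmetic_intensity: FLOPs performed per byte moved to/from memory.
    A kernel is memory-bound below the ridge point, compute-bound above it.
    """
    return min(peak_flops, arithmetic_intensity * peak_bandwidth)
```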

In practice

Llama-3.1 reported an MFU of 38-43% during training.[3]
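
As a rough sanity check with the `mfu()` sketch above: the 405B parameter count and ~16K-GPU scale are from the Llama-3 paper, and 989 TFLOP/s is the H100's published dense BF16 peak, but the observed token throughput below is hypothetical, chosen only to land in the reported MFU range:

```python
# Hypothetical throughput for illustration, not a reported number.
print(mfu(tokens_per_sec=2.7e6,
          n_params=405e9,
          n_gpus=16_384,
          peak_flops_per_gpu=989e12))  # ~0.40, i.e. ~40% MFU
```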

Footnotes

  1. PaLM: Scaling Language Modeling with Pathways (arXiv:2204.02311)

  2. Revisiting Reliability in Large-Scale Machine Learning Research Clusters (arXiv:2410.21680)

  3. The Llama 3 Herd of Models (arXiv:2407.21783)