Model FLOPs Utilization (MFU) is a metric proposed by Google that describes how well utilized a GPU is during model training.1 From the PaLM paper, they define it as

$$\mathrm{MFU} = \frac{\text{observed throughput (tokens/s)}}{\text{theoretical maximum throughput at peak FLOPs}}$$
This is the ratio of the observed throughput (tokens per second) to the theoretical maximum throughput of a system operating at peak FLOPs. Crucially, the theoretical maximum counts only the FLOPs required for the forward and backward passes, not those spent on rematerialization (activation recomputation).
Meta defines it very similarly, as “the number of FLOPs a model theoretically utilizes compared to hardware peak FLOPs.”2
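Given these definitions, MFU can be estimated from training throughput and a FLOPs-per-token estimate. A common approximation for a dense decoder-only transformer is 6N FLOPs per token for the combined forward and backward passes, where N is the parameter count (attention adds a further sequence-length-dependent term). A minimal sketch, with the 6N approximation and the example numbers as illustrative assumptions:

```python
def model_flops_utilization(
    tokens_per_second: float,
    n_params: float,
    peak_flops_per_second: float,
    flops_per_token_per_param: float = 6.0,  # ~2N forward + ~4N backward
) -> float:
    """Estimate MFU: achieved model FLOPs divided by hardware peak FLOPs."""
    achieved_flops = tokens_per_second * flops_per_token_per_param * n_params
    return achieved_flops / peak_flops_per_second

# Illustrative: a 7B-parameter model sustaining 3,000 tokens/s on a GPU
# with a 312 TFLOP/s peak (A100 BF16, dense)
mfu = model_flops_utilization(3_000, 7e9, 312e12)
print(f"{mfu:.1%}")  # ~40.4%
```

Note that this counts only the model's required FLOPs, so a run using rematerialization does not get credit for the recomputed activations, which is exactly the behavior the definition above intends.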
It is closely related to the roofline model: where arithmetic intensity (FLOPs per byte moved) predicts the attainable fraction of peak compute, MFU measures the fraction actually achieved.
In practice
Llama-3.1 reported an MFU of 38-43% during training.3
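Inverting the definition shows what a reported MFU implies for throughput. The sketch below uses the same 6N FLOPs-per-token approximation as above; the parameter count and per-GPU peak are illustrative assumptions, not figures from the Llama report:

```python
n_params = 405e9   # assumed parameter count for the largest Llama 3.1 model
peak = 989e12      # assumed per-GPU peak FLOP/s (H100 BF16, dense)
mfu = 0.40         # mid-range of the reported 38-43%

# tokens/s = MFU * peak FLOP/s / (FLOPs per token)
tokens_per_sec_per_gpu = mfu * peak / (6 * n_params)
print(f"~{tokens_per_sec_per_gpu:.0f} tokens/s per GPU")  # ~163
```

Under these assumptions, 40% MFU corresponds to roughly 160 tokens per second per GPU; total cluster throughput scales with the number of GPUs.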