Prefill is the phase of LLM inference in which the prompt is run through the model. The products of prefill are:

  1. K and V vectors for every attention layer and every token in the prompt (the KV cache)
  2. The final hidden state of the model, from which the first output token is generated
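A minimal sketch of these two outputs, using a toy NumPy "model" with hypothetical shapes and random weights (no real model or library API is assumed):

```python
import numpy as np

# Toy multi-layer, single-head attention model to illustrate what prefill
# produces: per-layer K/V for every prompt token, plus the final hidden state.
rng = np.random.default_rng(0)
n_layers, seq_len, d_model = 2, 5, 8

# Causal mask: token i may attend only to tokens <= i.
causal_mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)

def prefill(x, weights):
    """Run all prompt tokens through the model at once (dense compute)."""
    kv_cache = []
    h = x
    for wq, wk, wv in weights:
        q, k, v = h @ wq, h @ wk, h @ wv
        kv_cache.append((k, v))          # saved for later decode steps
        scores = (q @ k.T) / np.sqrt(d_model)
        scores[causal_mask] = -np.inf
        attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
        attn /= attn.sum(axis=-1, keepdims=True)
        h = attn @ v
    return kv_cache, h[-1]               # K/V per layer + last hidden state

weights = [tuple(rng.standard_normal((d_model, d_model)) for _ in range(3))
           for _ in range(n_layers)]
prompt = rng.standard_normal((seq_len, d_model))
kv_cache, final_hidden = prefill(prompt, weights)
```

Here `kv_cache` holds one `(K, V)` pair per layer, each of shape `(seq_len, d_model)`, and `final_hidden` is the last token's hidden state.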

Because the whole input prompt is passed in at once, prefill is a dense matrix-matrix computation and is therefore compute-bound. This contrasts with the next phase, decode, which processes one token at a time and is typically memory-bandwidth-bound.
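A back-of-the-envelope way to see why prefill is compute-bound: compare FLOPs performed per byte of weights read for one linear layer. The sizes below are illustrative assumptions, not taken from any particular model.

```python
# Arithmetic intensity of one weight-matrix multiply (hypothetical sizes).
# Prefill multiplies a whole (seq_len x d) activation matrix against the
# weights; decode multiplies a single (1 x d) row, re-reading the same
# weights for every generated token.
d = 4096               # hidden size (assumed)
seq_len = 1024         # prompt length (assumed)
bytes_per_param = 2    # fp16 weights

weight_bytes  = d * d * bytes_per_param
prefill_flops = 2 * seq_len * d * d   # matrix-matrix multiply
decode_flops  = 2 * 1 * d * d         # matrix-vector multiply per token

prefill_intensity = prefill_flops / weight_bytes   # FLOPs per weight byte
decode_intensity  = decode_flops / weight_bytes
```

With these numbers, prefill performs roughly `seq_len` times more work per byte of weights loaded than decode, which is why prefill saturates the GPU's compute units while decode is limited by memory bandwidth.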