This is prefill:
Prefill is the phase of LLM inference in which the input prompt is run through the model. The products of prefill are:
- K and V vectors for every layer of attention and every token in the prompt
- The final hidden state of the last prompt token, from which the first output token is sampled
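The two products above can be sketched with a toy single-head, single-layer-stack transformer in NumPy. The weight matrices, sizes, and the simplified attention (no residuals, no MLP, no multi-head split) are all illustrative assumptions, not a real model:

```python
import numpy as np

rng = np.random.default_rng(0)

n, d, n_layers = 8, 16, 2  # toy sizes: 8 prompt tokens, hidden dim 16, 2 layers

# Hypothetical random projection weights for each layer
Wq = [rng.standard_normal((d, d)) for _ in range(n_layers)]
Wk = [rng.standard_normal((d, d)) for _ in range(n_layers)]
Wv = [rng.standard_normal((d, d)) for _ in range(n_layers)]

x = rng.standard_normal((n, d))    # embedded prompt, shape (n, d)

kv_cache = []
for layer in range(n_layers):
    # All prompt tokens are projected at once: (n, d) @ (d, d) is a GEMM
    Q = x @ Wq[layer]
    K = x @ Wk[layer]
    V = x @ Wv[layer]
    kv_cache.append((K, V))        # product 1: K/V for every layer and token

    # Simplified causal attention (residuals and MLP omitted for brevity)
    scores = Q @ K.T / np.sqrt(d)
    scores += np.triu(np.full((n, n), -np.inf), k=1)   # causal mask
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    x = weights @ V

final_hidden = x[-1]               # product 2: last token's hidden state
print(len(kv_cache), final_hidden.shape)  # → 2 (16,)
```

Decode then reuses `kv_cache` so that each new token only needs its own K/V computed, which is what makes the two phases so different in character.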
Because the whole input prompt is passed through the model at once, prefill is a dense computation and is therefore compute-bound. This contrasts with the next phase, decode, which is memory-bandwidth-bound.
Specifically, the input of prefill is an n × d matrix, where n is the number of input tokens and d is the hidden dimension. Multiplying it by each d × d weight matrix is a GEMM (matrix-matrix multiplication). As long as n is large enough to fill a tensor core tile, the GEMM takes longer than loading its operands from HBM, so it is compute-bound.
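A back-of-envelope arithmetic-intensity calculation makes the compute-bound claim concrete. The GPU numbers below are rough A100-class assumptions (312 TFLOP/s FP16 tensor-core peak, 2 TB/s HBM bandwidth), used only to place the ridge point:

```python
# Arithmetic intensity of one (n, d) @ (d, d) GEMM in FP16.
PEAK_FLOPS = 312e12          # assumed peak tensor-core FLOP/s
HBM_BW     = 2.0e12          # assumed HBM bandwidth, bytes/s
ridge = PEAK_FLOPS / HBM_BW  # FLOP/byte needed to be compute-bound (~156)

def intensity(n, d, bytes_per_elem=2):
    flops = 2 * n * d * d                              # one multiply-add per MAC
    bytes_moved = bytes_per_elem * (n*d + d*d + n*d)   # read X and W, write Y
    return flops / bytes_moved

for n in (1, 64, 1024):
    i = intensity(n, 4096)
    bound = "compute" if i > ridge else "memory"
    print(f"n={n:5d}: {i:7.1f} FLOP/byte ({bound}-bound)")
```

With d = 4096, n = 1 (one decode step) yields about 1 FLOP/byte, deep in memory-bound territory, while n = 1024 (a typical prefill) yields roughly 680 FLOP/byte, well past the ridge; this is the quantitative sense in which prefill is compute-bound and decode is not.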