Supervised fine-tuning (SFT) is a post-pretraining process whereby

  • The model is given input-response pairs which are good
  • The model is updated to make these desired responses more likely

Example input record

SFT inputs are (prompt, response) pairs:

  • Prompt: “Explain why the sky is blue in two sentences.”
  • Target response (the example to imitate): “Sunlight is scattered by molecules in the atmosphere, and shorter blue wavelengths scatter more strongly than longer red wavelengths. That’s why the sky looks blue to us most of the time.”