Reasoning models are a class of AI models that can adjust the amount of computation and time they spend during inference (test-time compute) to generate better answers. For example, 20 seconds of reasoning was found to deliver the same improvement as scaling a model to 100,000 times more parameters and training it for 100,000 times longer.1 These models are said to engage in “System 2” thinking.
OpenAI has good documentation on the strengths and weaknesses of reasoning models compared to traditional “System 1” transformers.
System 1 and System 2
Daniel Kahneman, a Nobel Prize-winning psychologist, first described the idea of System 1 and System 2 thinking in his book, “Thinking, Fast and Slow.”
- System 1 thinking is instinctive and reflexive; this is the thought required to know the answer to questions like, “What’s 1+1?”
- System 2 thinking is the kind that requires conscious effort. For example, “What are the pros and cons of fat tree topologies?” would require most people to engage in the sort of reasoning and structured thought characteristic of System 2 thinking.
Although originally conceived in the context of human cognition, this distinction is also a convenient way to describe the sort of reasoning and reflection (through chain-of-thought) that reasoning models perform before providing a final answer.
Examples
OpenAI o1
OpenAI’s o1 models are the successful combination of reinforcement learning and scaling unsupervised learning with transformers.2 o1 creates its own chains of thought in the form of reasoning tokens and follows them instead of training on a human-supplied chain of thought. Internally, these chains of thought involve the model questioning itself as it comes to a solution, and the model is aware of the constraints placed on its ability to think (e.g., limited time, as enforced through a budget of reasoning tokens).
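As a rough analogy for test-time compute (o1’s internals are not public, so this is purely illustrative), the sketch below spends a configurable “reasoning budget” iteratively refining an answer; a larger budget yields a better result without changing the “model” itself:

```python
# Toy analogy for test-time compute: refine an estimate of sqrt(target)
# with Newton's method, stopping when a "reasoning budget" of steps is
# exhausted. More budget -> a better answer from the same procedure.
# (Illustrative only; not how o1 actually works internally.)

def answer_with_budget(target: float, budget: int) -> float:
    estimate = target  # naive first guess (the quick "System 1" answer)
    for _ in range(budget):  # each refinement consumes one unit of budget
        estimate = 0.5 * (estimate + target / estimate)
    return estimate

low_effort = answer_with_budget(2.0, budget=1)
high_effort = answer_with_budget(2.0, budget=10)
# The larger budget lands much closer to the true value of sqrt(2).
assert abs(high_effort - 2**0.5) < abs(low_effort - 2**0.5)
```

The same trade-off appears in o1’s reasoning-token budget: capping the tokens spent on hidden chain-of-thought caps how much the model can deliberate before answering.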
DeepSeek-R1
DeepSeek-R1 is a 671-billion-parameter, open-source reasoning model. It introduced methods for fine-tuning any transformer into a reasoning model using modest resources.
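The R1 paper describes rule-based rewards used during RL fine-tuning: an accuracy reward for correct final answers and a format reward for wrapping reasoning in think tags. A minimal sketch of that idea (the tag names follow the paper; the binary 0/1 scoring here is my assumption, since exact weights are not published):

```python
import re

def format_reward(completion: str) -> float:
    """1.0 if the completion wraps its reasoning in <think>...</think>
    followed by <answer>...</answer>, else 0.0 (a sketch of DeepSeek-R1's
    rule-based format reward)."""
    pattern = r"^<think>.*?</think>\s*<answer>.*?</answer>$"
    return 1.0 if re.match(pattern, completion, re.DOTALL) else 0.0

def accuracy_reward(completion: str, gold: str) -> float:
    """1.0 if the text inside <answer> matches the reference answer."""
    m = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    return 1.0 if m and m.group(1).strip() == gold else 0.0

good = "<think>2 + 2 is 4.</think><answer>4</answer>"
assert format_reward(good) == 1.0
assert accuracy_reward(good, "4") == 1.0
assert format_reward("The answer is 4.") == 0.0
```

Because these rewards are cheap, deterministic rules rather than a learned reward model, they can be computed at scale during RL, which is part of what makes the R1 recipe affordable.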
Future
Current reasoning models can spend minutes reasoning, but future reasoning models will be allowed to think for “months or years” to improve the quality of their responses.2,3
Reasoning models will be able to plan and self-correct, enabling them to contribute to their own development and improvement.2 Because reasoning allows models to work around problems they encounter on the path to novel solutions, they form a strong foundation for superintelligence.
We are seeing the first examples of this in agentic systems like OpenAI’s Deep Research.
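The plan-and-self-correct pattern can be sketched as a propose/verify/revise loop: a “model” proposes an answer, a verifier checks it, and the feedback drives the next attempt (a toy stand-in for an agent revising its own reasoning, not any particular system’s implementation):

```python
# Toy propose/verify/revise loop: self-correct toward the smallest
# integer n with n*n >= target, narrowing the search with each failed
# verification instead of giving up after one guess.

def propose(low: int, high: int) -> int:
    return (low + high) // 2  # the "model's" next guess

def solve(target: int) -> int:
    low, high = 0, target
    while low < high:
        guess = propose(low, high)
        if guess * guess >= target:  # verifier: guess is big enough
            high = guess             # revise downward
        else:                        # verifier: guess is too small
            low = guess + 1          # revise upward
    return low

assert solve(16) == 4
assert solve(17) == 5  # smallest n with n*n >= 17
```

The key property is that each failed verification produces usable feedback, so the loop converges rather than repeating the same mistake.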
Seminal papers
I stole this reading list from No Hype DeepSeek-R1 Reading List.
- Self-Rewarding Language Models by Meta laid the groundwork for using one model both to generate content and to evaluate how well it did, establishing a way for models to reflect on what they’re doing.
- Thinking LLMs: General Instruction Following with Thought Generation is another Meta paper that used self-rewarding models to create chains of thought, which are the basis for reasoning models.
- DeepSeek-R1 explains how DeepSeek-R1 was fine-tuned into a reasoning model.
Footnotes
1. OpenAI scientist Noam Brown stuns TED AI Conference: ‘20 seconds of thinking worth 100,000x more data’
2. Get Ready for ‘Long Thinking,’ AI’s Next Leap Forward. Jensen Huang claimed, “In many cases, as you know, we’re now working on artificial intelligence applications that run for 100 days.”