Reasoning models are a class of AI models that can adjust the amount of computation and time they spend during inference (test-time compute) to generate better answers. For example, 20 seconds of reasoning was found to deliver the same improvement that would otherwise have required a model with 100,000 times more parameters trained for 100,000 times longer.1
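
The effect can be illustrated with a toy best-of-N sketch (this is not how any production reasoning model works internally; every function and value below is hypothetical): a fixed, weak "model" produces better final answers simply by sampling more candidates and verifying them, i.e., by spending more compute at inference time rather than adding parameters.

```python
# Toy illustration of test-time compute (hypothetical names throughout):
# a fixed, weak "model" gets more accurate when given a larger sampling budget.
import random

def noisy_solver(x: int, y: int) -> int:
    """A weak fixed 'model' that adds two numbers but is often off by a little."""
    return x + y + random.choice([-2, -1, 0, 0, 1, 2])

def verify(x: int, y: int, answer: int) -> float:
    """Score a candidate (exact here; real verifiers are learned or heuristic)."""
    return -abs((x + y) - answer)

def solve_with_budget(x: int, y: int, n_samples: int) -> int:
    """More samples means more test-time compute; keep the best-scoring candidate."""
    candidates = [noisy_solver(x, y) for _ in range(n_samples)]
    return max(candidates, key=lambda a: verify(x, y, a))

if __name__ == "__main__":
    random.seed(0)
    for budget in (1, 4, 16, 64):
        errors = [abs(solve_with_budget(7, 35, budget) - 42) for _ in range(200)]
        print(f"budget={budget:3d}  mean abs error={sum(errors) / len(errors):.2f}")
```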

OpenAI has very good documentation on the strengths and weaknesses of reasoning models compared to traditional System 1 transformers.

OpenAI o1

OpenAI’s o1 models are the successful combination of reinforcement learning with the scaling of unsupervised learning in transformers.2 o1 generates its own chains of thought in the form of reasoning tokens and follows them, rather than being trained to imitate human-supplied chains of thought. Internally, these chains of thought involve the model questioning itself as it works toward a solution, and the model is aware of the constraints placed on its ability to think (e.g., limited time, enforced through a budget of reasoning tokens).
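
The reasoning-token budget is also visible at the API level. A minimal sketch, assuming the OpenAI Python SDK (the model name, parameter, and usage fields shown here may differ from the current API reference; treat them as illustrative):

```python
# Minimal sketch (assumes the OpenAI Python SDK; check the current API reference
# for exact model names, parameters, and usage fields).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o1",  # a reasoning model
    messages=[{"role": "user", "content": "How many primes are there below 50?"}],
    # Caps the total tokens the model may spend, including its hidden reasoning
    # tokens, which is how a "thinking budget" is enforced in practice.
    max_completion_tokens=4096,
)

print(response.choices[0].message.content)
# Reasoning tokens are not returned, but usage reports how many were spent.
print(response.usage.completion_tokens_details.reasoning_tokens)
```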

DeepSeek-R1

DeepSeek-R1 is a 671-billion-parameter, open-source reasoning model. It introduced methods for fine-tuning any transformer into a reasoning model using modest resources.
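
A minimal sketch of that recipe, in the spirit of the R1 distilled models and assuming the Hugging Face trl library (the dataset name, base model, and hyperparameters are placeholders, not DeepSeek's actual configuration):

```python
# Sketch of the distillation-style recipe popularized by DeepSeek-R1: supervised
# fine-tuning of a small open model on chain-of-thought traces. Assumes the
# Hugging Face `datasets` and `trl` libraries; exact SFTConfig argument names
# vary by trl version, and the dataset/model names below are placeholders.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# A dataset of prompts paired with long reasoning traces and final answers,
# formatted as plain text examples (hypothetical name - substitute your own).
dataset = load_dataset("my-org/reasoning-traces", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-1.5B-Instruct",  # small base model to turn into a reasoner
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="r1-style-distilled",
        max_seq_length=4096,              # reasoning traces are long
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
    ),
)
trainer.train()
```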

Future

Current reasoning models can spend minutes reasoning, but future reasoning models will be allowed to think for “months or years” to improve the quality of their responses.2

Reasoning models will be able to plan and self-correct, enabling these models to contribute to their own development and improvement.2

Reasoning also allows a model to work around problems it encounters on the path to novel solutions, which forms a strong foundation for superintelligence.

Footnotes

  1. OpenAI scientist Noam Brown stuns TED AI Conference: ‘20 seconds of thinking worth 100,000x more data’

  2. Building OpenAI o1 (Extended Cut)