Ray is a framework developed by Ion Stoica’s group that provides abstractions for scaling up training and inference.

Prominent users

  • OpenAI famously used Ray Train to train GPT-3.5 and GPT-4.[1]
  • Thinking Machines Lab (TML) appears to use Ray for its data engineering.[2] Ray also sits behind TML’s Tinker API, where it spawns the RL trainer and sampler GPU clusters.[3]
  • Microsoft AI uses Ray for:[4]
    • training, which enables “fast recovery through in-job restarts”
    • reinforcement learning within their Rocket architecture

Footnotes

  1. See “What is Ray?”. Greg Brockman also stated at the 2022 Ray Summit that Ray was used to train OpenAI’s largest models; that clip can be viewed here: https://www.youtube.com/clip/UgkxBHfDgDA-IThBcmkOSUhDqF7FgLUdwB2V

  2. Thinking Machines Lab job posting: https://job-boards.greenhouse.io/thinkingmachines/jobs/5013919008

  3. Ray Summit 2025 Keynote: The Shift to LLM Fine-Tuning with Thinking Machines

  4. MAI-Thinking-1: Building a Hill-Climbing Machine