This page serves as a locus for everything related to OpenAI.

Software stack

OpenAI has disclosed the following about their training software stack:

  • They have used Ray for training GPT-3.5 and GPT-4.0.1 It is unclear whether they have used it for training since then, or whether they use it for inference at all.
  • They have used Kubernetes on their large training clusters.2 See Kubernetes.
  • They have used Apache Spark for data preprocessing. This was mentioned in the GPT-3 paper.
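Running training workloads on Kubernetes at this scale typically means scheduling GPU pods as batch Jobs. The manifest below is a minimal hypothetical sketch of that pattern; every name, image, and resource figure is an illustrative assumption, not OpenAI's actual configuration:

```yaml
# Hypothetical sketch of a GPU training Job on a Kubernetes cluster.
# All names, images, and resource figures are illustrative assumptions.
apiVersion: batch/v1
kind: Job
metadata:
  name: llm-training-run
spec:
  completions: 1
  parallelism: 1
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: trainer
          image: registry.example.com/trainer:latest  # placeholder image
          command: ["python", "train.py"]
          resources:
            limits:
              nvidia.com/gpu: 8  # e.g. one 8-GPU node per pod
```

At multi-thousand-node scale (per footnote 2, they have run clusters of 7,500 nodes), the interesting problems are in the scheduler, networking, and etcd tuning rather than in the Job spec itself.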

For inference, their stack appears to include:

  • Cosmos DB for conversation state (see ChatGPT)
  • Codex’s web UI uses Temporal to store workflow state3
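Conversation state in a document store like Cosmos DB is typically modeled as one JSON document per conversation, keyed by a conversation ID. The shape below is a hypothetical sketch of that pattern, not a disclosed schema:

```json
{
  "id": "conv-123",
  "userId": "user-456",
  "messages": [
    { "role": "user", "content": "Hello" },
    { "role": "assistant", "content": "Hi! How can I help?" }
  ],
  "updatedAt": "2024-01-01T00:00:00Z"
}
```

A per-user partition key would keep all of one user's conversations in the same partition, which is the usual design choice for this kind of read pattern.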

In addition, they have disclosed:

  • They use a private monorepo for their code. This was stated in a video they posted about testing with data they knew was not in the training dataset.
  • Their observability platform is built on ClickHouse,4 Fluent Bit, and Azure Blob.5
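A log-analytics pipeline of this shape usually lands Fluent Bit output in a ClickHouse MergeTree table, with older data tiered out to object storage such as Azure Blob. The schema below is a hypothetical sketch of that pattern; the table, columns, and retention period are illustrative assumptions, not OpenAI's actual schema:

```sql
-- Hypothetical log table for a Fluent Bit -> ClickHouse pipeline.
-- Column names and the 30-day TTL are illustrative assumptions.
CREATE TABLE logs
(
    timestamp DateTime64(3),
    service   LowCardinality(String),
    level     LowCardinality(String),
    message   String
)
ENGINE = MergeTree
PARTITION BY toDate(timestamp)
ORDER BY (service, timestamp)
TTL toDateTime(timestamp) + INTERVAL 30 DAY;
```

Ordering by (service, timestamp) makes the common query "logs for one service over a time range" a near-sequential scan, which is the usual reason to pick ClickHouse for petabyte-scale log volumes.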

Training techniques

Infrastructure

See Microsoft supercomputers and Stargate.

Business

See OpenAI x Microsoft.

Footnotes

  1. https://www.anyscale.com/glossary/what-is-ray

  2. https://openai.com/index/scaling-kubernetes-to-7500-nodes/

  3. Of course you can build dynamic AI agents with Temporal | Temporal

  4. What is observability in 2026? Why it’s an analytics problem and why your database matters. | Engineering | ClickHouse Resource Hub

  5. Why OpenAI chose ClickHouse for petabyte-scale observability