It’s hard to keep track of Microsoft’s public disclosures about its AI supercomputers, so here is what has been stated publicly.

2019 supercomputers

OpenAI’s supercomputer (V100)

Microsoft disclosed that it built a supercomputer with 10,000 NVIDIA V100 GPUs that would have ranked #5 on the Top500 list had an HPL score been submitted.1 This supercomputer was used to train GPT-3.2

Kevin Scott qualitatively described this system as “as big as a shark”:2

Kevin Scott, Microsoft Build 2024:

In 2020, we built our first AI supercomputer for OpenAI. It’s the supercomputing environment that trained GPT-3. And so, we’re going to just choose “marine wildlife” as our scale marker.

You can think of that system as about as big as a shark.

Mark Russinovich said the following about it:3

Mark Russinovich, Microsoft Build 2024:

The first AI supercomputer we built back in 2019 for OpenAI was used to train GPT-3 and the size of that infrastructure, if we’d submitted it to the TOP500 supercomputer benchmark, would have been in the top five supercomputers in the world on-premises or in the Cloud and the largest in the Cloud.

2021 supercomputers

Voyager (A100)

Microsoft debuted the Voyager supercomputer at #10 on the November 2021 Top500 list. HPL was run on 264 nodes of Azure ND A100 v4.

2022 supercomputers

OpenAI’s 2022 supercomputer

Very little has been said about this supercomputer other than that it is “as big as an orca”2 and that it was built to train GPT-4:3

Mark Russinovich, Microsoft Build 2024:

In November, we talked about the two generations later supercomputer, so that we built one to train GPT-4 after the GPT-3 one…

Kevin Scott also referred to the supercomputer that trained GPT-4 at Build 2024:2

Kevin Scott, Microsoft Build 2024:

The next system that we built, scale wise, is about as big as an orca. And that is the system that we delivered in 2022 that trains GPT-4.

Iowa Supercomputer (2022-2023)

In September 2023, Microsoft disclosed “the Iowa supercomputer,” which is “among the largest and most powerful in the world.”4 It is unclear from the article when this supercomputer was deployed.

2023 supercomputers

OpenAI’s 2023 supercomputer

Mark Russinovich said the following in his talk at Build 2024:3

Mark Russinovich, Microsoft Build 2024:

In November, we talked about the two generations later supercomputer, so that we built one to train GPT-4 after the GPT-3 one and this one we’re building out to train the next generation of OpenAI’s models this one with a slice of that infrastructure, just a piece of that supercomputer that we were bringing online.

Kevin Scott also referred to a supercomputer being built out at Build 2024:2

Kevin Scott, Microsoft Build 2024:

The system that we have just deployed is, scale wise, about as big as a whale relative to the shark-sized supercomputer and this orca-sized supercomputer. And it turns out you can build a whole hell of a lot of AI with a whale-sized supercomputer.

Explorer (MI200)

In June 2023, Microsoft debuted Explorer as the #11 supercomputer on Top500. It ran HPL on 480 nodes of Azure ND MI200 v4.

Eagle (H100)

In November 2023, Microsoft also debuted Eagle as the #3 supercomputer on Top500. Eagle ran HPL on 1,800 Azure ND H100 v5 nodes and is itself a “tiny fraction”3 of a much larger supercomputer. See Eagle for more information.
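As a sanity check, the node count lines up with the 14,400-GPU figure Russinovich gives in the timeline below, assuming the usual 8 H100 GPUs per ND H100 v5 node:

```python
# Cross-check: Eagle's HPL run used 1,800 ND H100 v5 nodes; at 8 H100 GPUs per
# node this matches the 14,400-GPU figure in the timeline below.
nodes = 1_800
gpus_per_node = 8  # each ND H100 v5 node exposes 8 H100 GPUs
print(nodes * gpus_per_node)  # 14400
```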

Global infrastructure in 2024

Mark Russinovich also provided the following timeline of growth:3

  • May 2020: 10,000 V100 GPUs, #5 supercomputer (not confirmed)
  • Nov 2023: 14,400 H100 GPUs, #3 supercomputer (confirmed)

30x supercomputers comment

We’ve built 30x the size of our AI infrastructure since November, the equivalent of five of those supercomputers I showed you every single month

The math: five Eagles per month × the six months between November 2023 and May 2024 = 30x.
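A minimal back-of-the-envelope sketch of that reading, assuming each of “those supercomputers” is Eagle-sized (14,400 H100s, per the timeline above); the GPU-equivalent figure is purely illustrative:

```python
# Back-of-the-envelope reading of the "30x" comment: five Eagle-sized systems
# per month over the six months from November 2023 to May 2024.
eagles_per_month = 5
months = 6
h100s_per_eagle = 14_400  # Eagle's GPU count from the timeline above

total_eagles = eagles_per_month * months
print(total_eagles)                    # 30 -> matches the "30x" figure
print(total_eagles * h100s_per_eagle)  # 432000 H100-equivalents, illustrative only
```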

InfiniBand domain size

If you take a look at the scale of those systems I was talking about, those AI supercomputers, there are tens of thousands of servers, and they all have to be connected together to make that efficient all-reduce happen. …the InfiniBand domain covers the entire supercomputer, which is tens of thousands of servers.

Standard cluster size

In the case of our public systems where we don’t have customers that are looking for training at that scale … the InfiniBand domains are 1,000 - 2,000 servers in size, which is still 10,000 - 20,000 GPUs.

The above statements about typical cluster sizes are incorrect; Mark probably meant to say that the InfiniBand domains are 1K-2K GPUs, not servers. Building GPU clusters with 10K-20K GPUs incurs the additional cost of a third InfiniBand switching layer, which only has value for training workloads; inference simply does not need that many interconnected GPUs. Microsoft does deploy training systems like Eagle with three-layer InfiniBand fabrics, but the standard Azure GPU cluster is not designed for training.
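For intuition on why the 1K-2K GPU figure is more plausible, here is a rough fat-tree sizing sketch. It assumes non-blocking fat trees built from 64-port switches (the radix of NVIDIA Quantum-2 NDR switches); Azure’s actual switch radix and oversubscription are not public, so treat the numbers as illustrative:

```python
# Illustrative fat-tree sizing, assuming non-blocking trees of radix-64 switches.
def max_endpoints(radix: int, tiers: int) -> int:
    """Maximum endpoints a non-blocking fat tree can connect."""
    if tiers == 2:
        return radix ** 2 // 2  # each leaf splits its ports: half down, half up
    if tiers == 3:
        return radix ** 3 // 4
    raise ValueError("only 2- or 3-tier trees handled here")

k = 64
print(max_endpoints(k, tiers=2))  # 2048  -> a 1K-2K GPU domain fits in two tiers
print(max_endpoints(k, tiers=3))  # 65536 -> 10K-20K GPUs needs the third tier
```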

2025 supercomputers

Fairwater (Wisconsin)

Fairwater is a datacenter facility and associated AI supercomputer developed by Microsoft. The first publicized Fairwater site is in Mount Pleasant, WI.

The supercomputer itself has5

  • “hundreds of thousands of interconnected NVIDIA GB200 GPUs”
  • “exabytes of storage and literally millions of CPU compute cores”

See Fairwater for more details.

Footnotes

  1. Microsoft announces new supercomputer, lays out vision for future AI work - Source

  2. Microsoft Build opening keynote video and Microsoft Build opening keynote transcript

  3. Inside Microsoft AI innovation with Mark Russinovich

  4. How a small city in Iowa became an epicenter for advancing AI - Source

  5. Scott Guthrie’s keynote at Microsoft Build 2025 - Unpacking the tech