Isambard-AI[1] is a 1,362-node, GH200-based Cray EX supercomputer sited at Bristol as the flagship system for the British sovereign AI effort.

It debuted on the Top500 list at #11 at ISC25, using 1,260 nodes (92.5% of the system). The run achieved 171.8 TF/node, or 43.0 TF per H200, and took 58 minutes.
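The per-node and per-GPU figures above are related by simple division. The following is a minimal sketch of that arithmetic, assuming four GPUs per node (as in the node description). The function and variable names are illustrative only, and the implied aggregate is approximate because 171.8 TF/node is itself a rounded value.

```python
# Figures quoted in the text; the names are illustrative.
NODES_TOTAL = 1362        # nodes in the full system
NODES_IN_RUN = 1260       # nodes used for the Top500 run
TFLOPS_PER_NODE = 171.8   # rounded per-node result
GPUS_PER_NODE = 4         # assumption: one GPU per GH200, four GH200 per node


def hpl_summary(nodes_total, nodes_in_run, tflops_per_node, gpus_per_node):
    """Derive the ratios quoted for the run from the raw figures."""
    return {
        # Share of the machine that took part in the run.
        "fraction_of_system": nodes_in_run / nodes_total,
        # Per-GPU throughput, ignoring any CPU contribution.
        "tflops_per_gpu": tflops_per_node / gpus_per_node,
        # Aggregate implied by the rounded per-node figure, in PFLOP/s.
        "implied_total_pflops": nodes_in_run * tflops_per_node / 1000.0,
    }


summary = hpl_summary(NODES_TOTAL, NODES_IN_RUN, TFLOPS_PER_NODE, GPUS_PER_NODE)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

The per-GPU value comes out just under 43.0 TF, which matches the rounded figure in the text.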

System overview

The system is built on the Cray EX platform with Cray EX254n blades. As such, it has two GH200 nodes per blade, and each node has:

  • 4x GH200 superchips, each with
    • 1 Grace CPU (72 cores, Arm Neoverse V2)
    • 1 Hopper H200 GPU
  • 128 GB LPDDR5X DRAM
  • 4x Slingshot-11 NICs

The full system comprises:[1]

  • 12x cabinets of Cray EX4000
    • 1,320 nodes (660 blades) total
    • 55 blades per cabinet (of 64 slots total)
  • 1x cabinet of Cray EX2500
    • 42 nodes (21 blades) total

The EX2500 cabinet was the phase 1 system.
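The cabinet, blade, and node counts above can be cross-checked against the 1,362-node total. This is a small sketch of that bookkeeping using only numbers from this page; the constant names are made up for illustration.

```python
# Counts taken from the lists above; the names are illustrative.
EX4000_CABINETS = 12
EX4000_BLADES_PER_CABINET = 55
EX4000_SLOTS_PER_CABINET = 64
EX2500_BLADES = 21
NODES_PER_BLADE = 2       # EX254n blade: two GH200 nodes
SUPERCHIPS_PER_NODE = 4   # four GH200 superchips per node

ex4000_blades = EX4000_CABINETS * EX4000_BLADES_PER_CABINET
ex4000_nodes = ex4000_blades * NODES_PER_BLADE
ex2500_nodes = EX2500_BLADES * NODES_PER_BLADE

total_nodes = ex4000_nodes + ex2500_nodes
total_superchips = total_nodes * SUPERCHIPS_PER_NODE

# Fraction of blade slots populated in each EX4000 cabinet.
slot_fill = EX4000_BLADES_PER_CABINET / EX4000_SLOTS_PER_CABINET

print(f"EX4000: {ex4000_blades} blades, {ex4000_nodes} nodes")
print(f"EX2500: {EX2500_BLADES} blades, {ex2500_nodes} nodes")
print(f"Total: {total_nodes} nodes, {total_superchips} GH200 superchips")
print(f"EX4000 slot fill: {slot_fill:.1%}")
```

The superchip total agrees with the 5,448 GPUs quoted in the History section.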

Network architecture

Being a Cray EX system, Isambard-AI uses 200G Slingshot as its interconnect.
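Combining the 200G link rate with the four NICs per node gives a nominal per-node injection bandwidth. The sketch below assumes every NIC runs at the full 200 Gb/s line rate and ignores protocol overhead, so it is an upper bound rather than a measured figure; the names are illustrative.

```python
# Nominal figures from this page; the names are illustrative.
NICS_PER_NODE = 4        # Slingshot-11 NICs per node
LINK_RATE_GBIT_S = 200   # assumed line rate per NIC, in Gb/s
TOTAL_NODES = 1362


def injection_bandwidth_gbyte_s(nics, link_rate_gbit_s):
    """Nominal per-node injection bandwidth in GB/s (8 bits per byte)."""
    return nics * link_rate_gbit_s / 8


per_node_gbyte_s = injection_bandwidth_gbyte_s(NICS_PER_NODE, LINK_RATE_GBIT_S)
# Sum of nominal injection bandwidth over all nodes, in TB/s.
aggregate_tbyte_s = per_node_gbyte_s * TOTAL_NODES / 1000

print(f"Per node: {per_node_gbyte_s:.0f} GB/s nominal")
print(f"All nodes: {aggregate_tbyte_s:.1f} TB/s nominal")
```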

Storage subsystem

Isambard-AI has two all-flash storage subsystems:[1]

  • a 20.3 PiB Cray ClusterStor E1000 (Lustre) file system
    • 44x E1000 enclosures (presumably 88 OSSes, assuming two per enclosure)
    • 24x 30.72 TB NVMe SSDs per E1000 enclosure
    • 1,980 GB/s writes, 2,500 GB/s reads
    • 35 MIOPS read, 3.7 MIOPS write
  • a 3.56 PiB VAST storage system
    • 4x CBoxes, 16 CNodes
    • 3x DBoxes
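The ClusterStor figures above allow a rough estimate of raw capacity and per-enclosure bandwidth. This is a back-of-the-envelope sketch, assuming the 30.72 TB drive size is in decimal terabytes and that all 44 enclosures contribute equally to the quoted bandwidth. The source does not say how the gap between raw and usable capacity is spent, so the ratio is only indicative.

```python
# Figures from the list above; the names are illustrative.
ENCLOSURES = 44
SSDS_PER_ENCLOSURE = 24
SSD_CAPACITY_TB = 30.72   # assumed decimal terabytes (10**12 bytes)
USABLE_PIB = 20.3         # quoted file system size
WRITE_GBYTE_S = 1980
READ_GBYTE_S = 2500

TB = 10**12
PIB = 2**50

raw_tb = ENCLOSURES * SSDS_PER_ENCLOSURE * SSD_CAPACITY_TB
raw_pib = raw_tb * TB / PIB

# Share of raw flash that appears as usable file system capacity.
usable_fraction = USABLE_PIB / raw_pib

# Even split of the aggregate bandwidth across enclosures (an assumption).
write_per_enclosure = WRITE_GBYTE_S / ENCLOSURES
read_per_enclosure = READ_GBYTE_S / ENCLOSURES

print(f"Raw flash: {raw_tb:.2f} TB ({raw_pib:.1f} PiB)")
print(f"Usable / raw: {usable_fraction:.0%}")
print(f"Per enclosure: {write_per_enclosure:.1f} GB/s write, "
      f"{read_per_enclosure:.1f} GB/s read")
```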

History

The history of the Isambard-AI system is detailed in "Isambard-AI: a leadership-class supercomputer optimised specifically for artificial intelligence",[1] presented at the 2024 Cray User Group.

From the abstract: "Based on the HPE Cray EX4000 system, and housed in a new, energy efficient Modular Data Centre in Bristol, UK, Isambard-AI employs 5,448 NVIDIA Grace-Hopper GPUs to deliver over 21 ExaFLOP/s of 8-bit floating point performance for LLM training, and over 250 PetaFLOP/s of 64-bit performance, for under 5MW. Isambard-AI integrates two, all-flash storage systems: a 20 PiByte Cray ClusterStor and a 3.5 PiByte VAST solution."

Footnotes

  1. [2410.11199] Isambard-AI: a leadership class supercomputer optimised specifically for Artificial Intelligence