Horizon is TACC’s leadership supercomputer, built from NVIDIA GB200 GPU nodes, NVIDIA Vera CPU nodes, 400G/800G InfiniBand, and VAST storage. It is being integrated by Dell on its IR7000 rack platform.

Horizon is sponsored by the NSF as the supercomputing foundation for the NSF Leadership-Class Computing Facility (LCCF), the follow-on to NSF’s Blue Waters program. Horizon is also TACC’s follow-on to the Frontera system (the LCCF phase 1 system) and will provide 10x Frontera’s performance.[1]

It will be sited at an 85 MW colocation facility in Round Rock, TX, operated by Sabey Data Centers.[2]

System overview

The following information is from a presentation given by Dan Stanzione at SC25; a quick check of the arithmetic follows the list:

  • 2,016 GB200 NVL4 GPU nodes
    • 4,032 B200 GPUs
    • 300 PF FP64 (Rmax)
  • probably ~320 PF FP64 (Rpeak)
  • 4,752 Vera CPU-only nodes
    • 836,352 CPU cores
  • “roughly” 400 PB usable capacity (VAST)
    • 8 TB/s write
    • 16 TB/s read
    • InfiniBand-connected
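
As a sanity check, here is the arithmetic implied by those figures, as a minimal Python sketch. The derived per-unit values (cores per node, per-GPU Rmax, time to fill the file system) are my own division of the quoted totals, not published specs.

```python
# Derived figures from the SC25 headline numbers (my arithmetic, not TACC specs).
gpus = 4_032
fp64_rmax_pf = 300            # PF FP64, quoted as Rmax
cpu_nodes = 4_752
cpu_cores = 836_352
capacity_pb = 400             # "roughly" 400 PB usable
write_tbps = 8                # TB/s aggregate write

print(cpu_cores / cpu_nodes)            # 176.0 cores/node -> consistent with
                                        # two 88-core Vera CPUs per node
print(fp64_rmax_pf * 1_000 / gpus)      # ~74.4 TF FP64 Rmax per B200
seconds_to_fill = capacity_pb * 1_000 / write_tbps
print(seconds_to_fill / 3_600)          # ~13.9 hours to write all 400 PB
```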

Physically, this is arranged as follows (the rack math is cross-checked after the list):

  • 13 MW total power
  • 28 GPU racks
    • 144 GPUs each (72 nodes each)
    • 800G InfiniBand (nonblocking)
    • 215 kW each
  • 66 CPU racks
    • 72 nodes per rack
    • 88-core Vera CPUs
    • 400G InfiniBand
    • > 100 kW each
  • 34 “storage, switching, and management” racks
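
The rack-level numbers line up with the system totals. Here is a minimal cross-check, again my own arithmetic from the slide figures; note I read “> 100 kW” as a 100 kW lower bound.

```python
# Cross-check racks x per-rack figures against the quoted system totals.
gpu_racks, gpu_nodes_per_rack, gpus_per_rack, gpu_rack_kw = 28, 72, 144, 215
cpu_racks, cpu_nodes_per_rack, cpu_rack_kw = 66, 72, 100  # ">100 kW" lower bound

print(gpu_racks * gpu_nodes_per_rack)   # 2,016 GPU nodes (matches)
print(gpu_racks * gpus_per_rack)        # 4,032 GPUs (matches)
print(cpu_racks * cpu_nodes_per_rack)   # 4,752 CPU nodes (matches)

compute_mw = (gpu_racks * gpu_rack_kw + cpu_racks * cpu_rack_kw) / 1_000
print(compute_mw)                       # ~12.6 MW for compute racks alone
```

Compute alone comes to roughly 12.6 MW, leaving only a few hundred kilowatts under the quoted 13 MW for the 34 storage, switching, and management racks, so these figures are presumably rounded.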

The presentation said that there will be 2,016 GPU nodes and 4,032 GPUs, i.e., two GPUs per node, while a GB200 NVL4 blade carries four B200 GPUs. This suggests that each NVL4 blade will be presented as a pair of two-GPU nodes.
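
The implied topology, as a sketch, assuming the standard GB200 NVL4 configuration of four B200 GPUs per blade (the talk did not spell this out):

```python
# Node-pair arithmetic implied by the presentation. Assumption: a GB200 NVL4
# blade carries four B200 GPUs, per NVIDIA's standard NVL4 configuration.
gpu_nodes, gpus, gpus_per_blade = 2_016, 4_032, 4

gpus_per_node = gpus / gpu_nodes        # 2.0 GPUs per node
blades = gpus // gpus_per_blade         # 1,008 physical blades
nodes_per_blade = gpu_nodes / blades    # 2.0 -> each blade is a node pair
print(gpus_per_node, blades, nodes_per_blade)
```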

Footnotes

  [1] LCCF (utexas.edu)

  [2] TACC Selects Sabey Data Centers as Colo for Horizon HPC System