FugakuNEXT is the codename for Japan's next flagship supercomputer, the follow-on to Fugaku. RIKEN is the development lead for the system, Fujitsu holds the design contract,1 and NVIDIA will supply the GPUs.2 It is slated to be deployed in Kobe in 2029 and enter operations in 2030.
- Over 3,400 nodes3 and 15K GPUs4
- 2.6 EFLOPS FP642
- 20x application speedup over Fugaku and 60x over K computer
- 600 EFLOPS FP8 with 2:1 sparsity2
- Less than 40 MW5 (target 30 MW)4
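As a back-of-the-envelope check using only the headline figures above (not any official per-device spec), the implied per-GPU FP64 rate and system-level power efficiency are roughly:

```python
# Back-of-the-envelope check using only the headline numbers above.
# These are not official per-device specs; the GPU count is approximate.
gpus = 15_000                        # ~15K GPUs
fp64_eflops = 2.6                    # system FP64 peak, EFLOPS
power_mw_lo, power_mw_hi = 30, 40    # target and ceiling, MW

fp64_per_gpu_tf = fp64_eflops * 1e6 / gpus   # EFLOPS -> TFLOPS, ~173 TF FP64 per GPU
eff_lo = fp64_eflops * 1e3 / power_mw_hi     # ~65 GFLOPS/W at 40 MW
eff_hi = fp64_eflops * 1e3 / power_mw_lo     # ~87 GFLOPS/W at 30 MW

print(f"~{fp64_per_gpu_tf:.0f} TF FP64 per GPU")
print(f"~{eff_lo:.0f}-{eff_hi:.0f} GFLOPS/W (FP64)")
```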
System architecture
Per the initial announcement,4
- CPU: Fujitsu Monaka-X (ARM with SME and a possible NPU)
- GPU: NVIDIA “next-gen” GPU
- Host-Device: TBD coherent interconnect
- Scale-up: NVLink within nodes and possibly between nodes
- Scale-out: Custom high-speed interconnect
It will also have a colocated IBM quantum system with a 156-qubit IBM Heron attached to the scale-out fabric.
Performance targets
Satoshi Matsuoka presented the following targets for simulation workloads at Salishan 2025:5
- Raw hardware performance gain: 10x - 20x
- Mixed precision or emulation: 2x - 8x
- Surrogates / PINN: 10x - 25x
- Total: 200x - 1000x or more over Fugaku (“Zettascale”)
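These factors compound multiplicatively; the lower bounds reproduce the 200x floor, and the upper bounds land well above 1,000x (hence “or more”). A minimal sketch of the arithmetic:

```python
# The three improvement factors compound multiplicatively.
hw        = (10, 20)   # raw hardware performance gain
precision = (2, 8)     # mixed precision / FP64 emulation
surrogate = (10, 25)   # surrogates / PINNs

low  = hw[0] * precision[0] * surrogate[0]   # 10 * 2 * 10 = 200
high = hw[1] * precision[1] * surrogate[1]   # 20 * 8 * 25 = 4000

print(f"{low}x to {high}x over Fugaku")      # consistent with "200x - 1000x or more"
```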
The system performance requirements from the RFP,3 reiterated in the initial NVIDIA announcement,4 are:
Metric | CPU | GPU |
---|---|---|
FP64 vector | 48 PF | 3,000 PF |
FP16/BF16 matrix | 1,500 PF | 150,000 PF |
FP8 matrix | 3,000 PF | 300,000 PF |
FP8 matrix, 2:1 sparse | | 600,000 PF |
Memory capacity | 10 PiB | 10 PiB |
Memory bandwidth | 8 PB/s | 800 PB/s |
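Dividing the aggregate GPU-side requirements by the ~15K GPU count gives a rough per-device picture and machine balance. This is only an estimate, since the RFP specifies system-level aggregates rather than per-device numbers:

```python
# Rough per-GPU memory figures and machine balance implied by the aggregate
# GPU-side requirements above, assuming ~15K GPUs.
gpus    = 15_000
bw_pb_s = 800      # GPU memory bandwidth requirement, PB/s
mem_pib = 10       # GPU memory capacity requirement, PiB
fp64_pf = 3_000    # GPU FP64 requirement, PF

bw_per_gpu_tb_s = bw_pb_s * 1e3 / gpus       # ~53 TB/s of memory bandwidth per GPU
mem_per_gpu_gib = mem_pib * 1024**2 / gpus   # ~700 GiB of memory per GPU
bytes_per_flop  = bw_pb_s / fp64_pf          # ~0.27 bytes/FLOP at FP64 (peta units cancel)

print(f"~{bw_per_gpu_tb_s:.0f} TB/s and ~{mem_per_gpu_gib:.0f} GiB per GPU, "
      f"~{bytes_per_flop:.2f} bytes/FLOP")
```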
The storage subsystem will have two tiers:3
Tier | Architecture | Implementation | Bandwidth | IOPS | Capacity |
---|---|---|---|---|---|
First | Near-node local | Something like CHFS, BeeOND | Write all of memory in less than 1 minute | Open/close/stat one file per process in under 1 second | 2x memory |
Second | Shared | Lustre, DAOS | 20% of first tier | 10% of first tier | 30x memory |
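Since the tier requirements are expressed relative to system memory, absolute figures can be estimated from the ~20 PiB of combined CPU and GPU memory in the requirements table above (the RFP itself states only the relative numbers):

```python
# Turn the relative storage requirements into absolute estimates, using the
# ~20 PiB of combined CPU + GPU memory from the requirements table above.
mem_pib = 10 + 10                       # CPU + GPU memory capacity, PiB
mem_pb  = mem_pib * 2**50 / 1e15        # ~22.5 PB

tier1_bw_pb_s = mem_pb / 60             # "write memory in < 1 minute" -> ~0.4 PB/s
tier1_cap_pib = 2 * mem_pib             # 2x memory -> ~40 PiB
tier2_bw_pb_s = 0.20 * tier1_bw_pb_s    # 20% of first tier -> ~75 TB/s
tier2_cap_pib = 30 * mem_pib            # 30x memory -> ~600 PiB

print(f"Tier 1: >{tier1_bw_pb_s:.2f} PB/s write, ~{tier1_cap_pib} PiB")
print(f"Tier 2: >{tier2_bw_pb_s:.3f} PB/s, ~{tier2_cap_pib} PiB")
```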
Timeline
The project timeline is laid out in the RFP.3 Following the announcement of NVIDIA as the GPU provider, the plan looks like this:4
- 2025: Phase-1 testbed (~200 GPUs)
- 2026: Phase-2 testbed (~2,000 GPUs)
  - “more than” 400 GB200 superchips (inferred from the claim of 40 TF FP64 per GPU; see the sketch after this list)
  - XDR InfiniBand (a nonstandard pairing with B200)
  - Liquid-cooled Supermicro servers
  - A photo of a GB200 NVL72 rack suggests the system is NVL72-based
  - The “Quantum HPC Collaboration Platform” will be merged in as well, providing ~400 more GPUs
- 2027: Phase-3 testbed with FugakuNEXT-like architecture (?? GPUs)
- 2030: Full FugakuNEXT with 15K GPUs
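A rough sketch of what the phase-2 figures imply, assuming NVIDIA's standard GB200 NVL72 packaging (2 GPUs per superchip, 36 superchips and 72 GPUs per rack); the packaging is an assumption here, not something stated in the announcement:

```python
import math

# What the phase-2 numbers imply, assuming standard GB200 NVL72 packaging.
superchips      = 400       # "more than" 400 GB200 superchips
gpus_per_chip   = 2         # each GB200 superchip pairs 1 Grace CPU with 2 Blackwell GPUs
fp64_tf_per_gpu = 40        # the per-GPU FP64 figure the count was inferred from
phase2_gpus     = 2_000     # ~2,000 GPUs targeted for the phase-2 testbed

gpus    = superchips * gpus_per_chip      # >= 800 GPUs from those superchips
fp64_pf = gpus * fp64_tf_per_gpu / 1e3    # >= 32 PF of FP64
racks   = math.ceil(phase2_gpus / 72)     # ~28 NVL72 racks at ~2,000 GPUs

print(f">={gpus} GPUs, >={fp64_pf:.0f} PF FP64, ~{racks} NVL72 racks")
```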
Early vision
The details of the project were summarized on a digital poster at SC24:
Satoshi Matsuoka has been talking about the vision for FugakuNEXT since around 2022, including a vision for its CPU.6
Themes that may be relevant to a processor or node include:6,3
- 3D stacking of memory and logic (as depicted above)
- Silicon photonics
- Large SRAMs, a la AMD 3D VCache
- Specialized tensor core-like data paths for scientific motifs such as stencils, convolutions, and FFTs (a plain stencil kernel is sketched at the end of this section)
- A CGRA (coarse-grained reconfigurable array) instead of, or in addition to, SIMD
- Processing-in-memory (PIM)
The CGRA is called out as a “strong scaling accelerator” candidate, so perhaps the CPU socket will have tiles of general-purpose CPU cores as well as CGRA tiles.
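For context on the “scientific motifs” above, the sketch below shows the kind of regular, memory-bound stencil kernel that specialized data paths or CGRA tiles would presumably target. It is purely illustrative and implies nothing about the actual hardware mapping:

```python
import numpy as np

def jacobi_step(u):
    """One sweep of a 2D 5-point (Jacobi) stencil: regular, memory-bound
    dataflow of the kind specialized data paths or CGRA tiles would target."""
    v = u.copy()
    v[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] +
                            u[1:-1, :-2] + u[1:-1, 2:])
    return v

u = np.zeros((256, 256))
u[0, :] = 1.0                    # fixed boundary value along one edge
for _ in range(100):             # relax toward the steady-state solution
    u = jacobi_step(u)
```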