FugakuNEXT is the codename for Japan's follow-on flagship supercomputer, the successor to Fugaku. It is slated for deployment in 2029 and operations in 2030, with RIKEN as the development lead. Its high-level goals include:
- 5x to 10x improvement in HPC application performance over Fugaku
- more than 50 EFLOPS for AI training (100-200 EFLOPS peak)
- 50x-100x application speedup when using AI surrogates
The system performance requirements in the RFP are:[1]
Metric | CPU | GPU |
---|---|---|
FP64 vector | 48 PF | 3,000 PF |
FP16/BF16 matrix | 1,500 PF | 150,000 PF |
FP8 matrix | 3,000 PF | 300,000 PF |
Memory capacity | 10 PiB | 10 PiB |
Memory bandwidth | 8 PB/s | 800 PB/s |
In addition, they expect the following (a rough per-node breakdown is sketched below):
- Over 3,400 nodes
- Support for 2:1 structured sparsity
- Less than 40 MW of power consumption
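Combining the aggregate targets with the node count gives a rough per-node budget. A minimal sketch of that arithmetic, assuming the ">3,400 nodes" figure is taken as exactly 3,400 and that the matrix FLOPS figures already include any sparsity benefit (my assumptions, not RFP text):

```python
# Rough per-node budget implied by the aggregate targets, assuming exactly
# 3,400 nodes. Illustrative arithmetic only, not RFP text.

NODES = 3400

aggregate = {
    "GPU FP64 vector (PF)":           3_000,
    "GPU FP8 matrix (PF)":            300_000,
    "GPU memory bandwidth (PB/s)":    800,
    "Memory capacity, CPU+GPU (PiB)": 20,    # 10 PiB CPU + 10 PiB GPU
    "System power (MW)":              40,    # includes non-node loads
}

for metric, total in aggregate.items():
    print(f"{metric:32s}: {total / NODES:10.4f} per node")

# Roughly 0.9 PF FP64, ~88 PF FP8, ~235 TB/s of GPU memory bandwidth,
# ~6 TiB of memory, and under ~12 kW per node.
```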
The storage subsystem will have two tiers (rough sizing is sketched after the table):[1]
Tier | Architecture | Implementation | Bandwidth | IOPS | Capacity |
---|---|---|---|---|---|
First | Near-node local | Something like CHFS, BeeOND | Sufficient to write the full system memory in under 1 minute | Each process can open/close/stat a file in under 1 second | 2x memory |
Second | Shared | Lustre, DAOS | 20% of first tier | 10% of first tier | 30x memory |
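Working those requirements out numerically, under the assumption that "memory" means the combined 20 PiB (10 PiB CPU + 10 PiB GPU), gives the following; this interpretation is mine, not something stated in the summary above:

```python
# Rough sizing implied by the storage table, assuming "memory" means the
# combined 20 PiB of CPU + GPU memory. Illustrative arithmetic only.

PIB = 2**50  # bytes per pebibyte

memory_bytes = 20 * PIB

# First tier: absorb a full memory dump in under 1 minute; capacity 2x memory.
tier1_bw  = memory_bytes / 60        # bytes/s
tier1_cap = 2 * memory_bytes

# Second tier: 20% of first-tier bandwidth; capacity 30x memory.
tier2_bw  = 0.20 * tier1_bw
tier2_cap = 30 * memory_bytes

print(f"Tier 1: >= {tier1_bw / 1e12:.0f} TB/s write, {tier1_cap / PIB:.0f} PiB")
print(f"Tier 2: >= {tier2_bw / 1e12:.0f} TB/s write, {tier2_cap / PIB:.0f} PiB")
# Roughly 375 TB/s and 40 PiB for the first tier, 75 TB/s and 600 PiB for the second.
```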
The project timeline is as follows:[1]
The details of the project were summarized on a digital poster at SC24:
Satoshi Matsuoka has been talking about RIKEN's vision for FugakuNEXT since around 2022. The vision for its CPU is:[2]
Themes that may be relevant to a processor or node include:[2][1]
- 3D stacking of memory and logic (as depicted above)
- Silicon photonics
- Large SRAMs, à la AMD's 3D V-Cache
- Specialized tensor core-like data paths for scientific motifs such as stencils, convolutions, and FFTs (see the sketch at the end of this section)
- A coarse-grained reconfigurable array (CGRA) instead of, or in addition to, SIMD
- Processing-in-memory (PIM)
The CGRA is called out as a “strong scaling accelerator” candidate, so perhaps the CPU socket will have tiles of general-purpose CPU cores as well as CGRA tiles.
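On the tensor core-like data paths bullet: dense matrix units can serve stencils and convolutions if the operands are first rearranged into small matrix products. A minimal NumPy sketch of that mapping, using the standard second-difference stencil as the example (my own illustration, not anything from the RFP or Matsuoka's slides):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Illustrative only: a 1D 3-point stencil recast as a small matrix product
# (the "im2col" trick), the kind of mapping a tensor core-like data path
# for stencils/convolutions could exploit.

u = np.random.rand(1024)
w = np.array([1.0, -2.0, 1.0])            # 3-point Laplacian weights

# Direct stencil evaluation on the interior points.
direct = w[0] * u[:-2] + w[1] * u[1:-1] + w[2] * u[2:]

# The same computation as a (N-2, 3) x (3,) matrix-vector product.
patches = sliding_window_view(u, 3)       # shape (1022, 3), zero-copy view
as_gemm = patches @ w

assert np.allclose(direct, as_gemm)
```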