LineShine is an all-CPU exascale supercomputer at the National Supercomputing Center in Shenzhen (NSCC-SZ), built entirely from domestically produced Chinese hardware with no reliance on foreign chips.
See Tadashi Ogawa’s thread for authoritative references.
System Overview
| Item | Value |
| --- | --- |
| Site | NSCC-SZ, Shenzhen, China |
| Peak performance | ~2 EFLOPS FP64 (claimed) [1] |
| Node count | 20,480 |
| Processor | LX2 (ARMv9) |
| Interconnect | LingQi (dual-plane multi-rail fat-tree) |
| Interconnect bandwidth per node | 1.6 Tb/s |
| Storage bandwidth | 10 TB/s |
| OS | Anolis OS 8.9 |
Compute Nodes
Each node has two LX2 sockets. The LX2 is an ARMv9 processor with an unusual memory topology: two compute dies per socket, each die with four NUMA domains and on-package HBM alongside off-package DDR.
Per LX2 socket:
- 2 compute dies × 152 cores = 304 cores total
- 8 HBM stacks (on-package): 32 GB, ~4 TB/s aggregate bandwidth
- Off-package DDR: 128 GB per die / 256 GB per socket
- Dedicated SDMA engine per die for DDR↔HBM movement
- Peak: 60.3 TFLOPS FP64 / 120.6 TFLOPS FP32 via SME and SVE units; FP16 and INT8 also supported
Per node (2 × LX2):
- 608 cores
- 64 GB HBM + 512 GB DDR
- ~120.6 TFLOPS FP64 peak
At 20,480 nodes this yields 20,480 × 120.6 TFLOPS ≈ 2.47 EFLOPS FP64 theoretical peak, consistent with the claimed figure of more than 2 EFLOPS.
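The peak figure can be reproduced from the per-socket numbers above. The following is a minimal sketch: the constant and function names are illustrative, and the inputs are the claimed figures from this page, not measured values.

```python
# Reproduce the theoretical FP64 peak from the per-socket figures quoted above.
# All inputs are claimed values from this page; the names are illustrative.

SOCKET_FP64_TFLOPS = 60.3   # per LX2 socket
SOCKETS_PER_NODE = 2
DIES_PER_SOCKET = 2
CORES_PER_DIE = 152
NODES = 20_480


def node_fp64_tflops() -> float:
    """Peak FP64 throughput of one two-socket node, in TFLOPS."""
    return SOCKET_FP64_TFLOPS * SOCKETS_PER_NODE


def system_fp64_eflops() -> float:
    """Peak FP64 throughput of the full system, in EFLOPS (1 EFLOPS = 1e6 TFLOPS)."""
    return node_fp64_tflops() * NODES / 1e6


def system_cores() -> int:
    """Total core count across all nodes."""
    return CORES_PER_DIE * DIES_PER_SOCKET * SOCKETS_PER_NODE * NODES


if __name__ == "__main__":
    print(f"node peak:   {node_fp64_tflops():.1f} TFLOPS FP64")
    print(f"system peak: {system_fp64_eflops():.2f} EFLOPS FP64")
    print(f"total cores: {system_cores():,}")
```

This is a theoretical peak only; sustained performance on real workloads would be lower.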
Interconnect
The LingQi network uses a dual-plane multi-rail fat-tree at 1.6 Tb/s per node. The full deployment targets 36 network cabinets.
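To put the link rate in context, the sketch below converts it to byte units and relates it to per-node compute. It assumes that 1.6 Tb/s is the total injection bandwidth per node across all rails and planes and that protocol overhead is ignored; neither is confirmed by the source, and the names are illustrative.

```python
# Convert the quoted per-node link rate into byte units and relate it to compute.
# Assumption: 1.6 Tb/s is the total per-node injection bandwidth (all rails and
# planes combined), with no protocol overhead.

NODE_LINK_TBITS_PER_S = 1.6   # terabits per second, per node
NODE_FP64_TFLOPS = 120.6      # per-node peak from the compute-node figures
NODES = 20_480


def node_injection_gbytes_per_s() -> float:
    """Per-node injection bandwidth in GB/s (Tb/s -> Gb/s -> GB/s)."""
    return NODE_LINK_TBITS_PER_S * 1000 / 8


def aggregate_injection_pbits_per_s() -> float:
    """Sum of all node injection bandwidths, in Pb/s."""
    return NODE_LINK_TBITS_PER_S * NODES / 1000


def network_bytes_per_flop() -> float:
    """Network bytes available per peak FP64 FLOP on one node."""
    return node_injection_gbytes_per_s() * 1e9 / (NODE_FP64_TFLOPS * 1e12)


if __name__ == "__main__":
    print(f"{node_injection_gbytes_per_s():.0f} GB/s per node")
    print(f"{aggregate_injection_pbits_per_s():.3f} Pb/s aggregate injection")
    print(f"{network_bytes_per_flop():.5f} network bytes per FP64 FLOP")
```

The aggregate figure is a sum of injection rates, not a bisection bandwidth, which depends on fat-tree details not given here.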
Storage
- 428 storage nodes across 67 cabinets
- 10 TB/s aggregate bandwidth
- Liquid-cooled; described as China’s largest liquid-cooled storage deployment
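A rough sense of scale for the storage figures above: the sketch below estimates the bandwidth share per storage node and the time to write all compute-node memory once. It assumes bandwidth is spread evenly across storage nodes, that the full 10 TB/s is achievable for writes, and that capacities are in decimal units; these are assumptions, not statements from the source.

```python
# Back-of-envelope storage estimates from the figures quoted on this page.
# Assumptions: even load across storage nodes, full aggregate rate sustained,
# decimal units (1 TB = 1000 GB).

STORAGE_TBYTES_PER_S = 10.0
STORAGE_NODES = 428
COMPUTE_NODES = 20_480
NODE_MEMORY_GB = 64 + 512     # HBM + DDR per compute node


def per_storage_node_gbytes_per_s() -> float:
    """Average bandwidth each storage node must deliver, in GB/s."""
    return STORAGE_TBYTES_PER_S * 1000 / STORAGE_NODES


def full_memory_dump_seconds() -> float:
    """Time to write every compute node's HBM and DDR once, in seconds."""
    total_gb = COMPUTE_NODES * NODE_MEMORY_GB
    return total_gb / (STORAGE_TBYTES_PER_S * 1000)


if __name__ == "__main__":
    print(f"{per_storage_node_gbytes_per_s():.2f} GB/s per storage node")
    print(f"{full_memory_dump_seconds() / 60:.1f} min for a full-memory dump")
```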
Software Stack
The system runs Anolis OS 8.9 (Alibaba’s RHEL-compatible distro) with a ROCm-compatible environment plus GCC 8.5.0, rocBLAS, and PyTorch 2.7.1. The application paper describes a software-defined asynchronous MPI runtime that compensates for the lack of CUDA stream semantics in PyTorch’s CPU backend. [2]
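The general idea of emulating stream semantics in software can be sketched as an ordered asynchronous queue. This is only an illustration of the concept and not the runtime described in the paper: `SoftwareStream`, `fake_allreduce`, and `train_step` are hypothetical names, and the collective is a stand-in that sleeps instead of calling MPI.

```python
# A minimal "software stream": tasks submitted to one stream run in FIFO order
# on a background thread while the caller keeps computing.
# Hypothetical illustration only; fake_allreduce stands in for a real collective.
import time
from concurrent.futures import Future, ThreadPoolExecutor


class SoftwareStream:
    """Ordered asynchronous queue, loosely analogous to a CUDA stream."""

    def __init__(self) -> None:
        # A single worker guarantees in-order execution within the stream.
        self._pool = ThreadPoolExecutor(max_workers=1)

    def submit(self, fn, *args) -> Future:
        """Enqueue fn(*args) and return immediately with a Future."""
        return self._pool.submit(fn, *args)

    def synchronize(self) -> None:
        """Block until everything queued so far has finished."""
        self._pool.submit(lambda: None).result()

    def close(self) -> None:
        self._pool.shutdown(wait=True)


def fake_allreduce(values):
    time.sleep(0.01)              # pretend network latency
    return sum(values)


def train_step(stream: SoftwareStream, grads):
    pending = stream.submit(fake_allreduce, grads)   # communication starts
    local = sum(g * g for g in grads)                # compute overlaps with it
    return pending.result(), local                   # wait only when needed


if __name__ == "__main__":
    stream = SoftwareStream()
    print(train_step(stream, [1.0, 2.0, 3.0]))
    stream.close()
```

The single-worker executor is the design choice that matters here: it preserves submission order within a stream, while separate streams (separate executors) could run independently.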
Notes
- CPU-only: Positioned as a deliberate alternative to GPU-dominated Western systems. Workloads highlighted include molecular simulation, CFD, materials design, and LLM training.
- Domestic stack: LX2 processor, LingQi network, and storage are all Chinese-designed; explicitly framed as a response to US export controls on advanced chips.
- HBM topology: Unusual. The per-NUMA-domain HBM (4 GB/domain, 16 GB/die) with SDMA-mediated DDR↔HBM movement resembles the MI300A APU design more than a conventional CPU+HBM scheme.
- Phasing: Phase 1 consisted of 100 Huawei Kunpeng servers (12,800 cores); the 20,480-node system described in the paper is a later phase. [3]
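The HBM figures in the topology note can be cross-checked against the per-socket and per-node capacities, and used to estimate how much HBM bandwidth is available per peak FLOP. The sketch below assumes cores are split evenly across NUMA domains and takes the "~4 TB/s" figure at face value; the names are illustrative.

```python
# Cross-check the HBM hierarchy and estimate the HBM machine balance per socket.
# Assumptions: cores divide evenly across NUMA domains; ~4 TB/s is per socket.

HBM_PER_NUMA_GB = 4
NUMA_PER_DIE = 4
DIES_PER_SOCKET = 2
SOCKETS_PER_NODE = 2
CORES_PER_DIE = 152
SOCKET_HBM_TBYTES_PER_S = 4.0
SOCKET_FP64_TFLOPS = 60.3


def hbm_capacity_gb() -> dict:
    """HBM capacity at each level of the hierarchy, in GB."""
    per_die = HBM_PER_NUMA_GB * NUMA_PER_DIE
    per_socket = per_die * DIES_PER_SOCKET
    per_node = per_socket * SOCKETS_PER_NODE
    return {"die": per_die, "socket": per_socket, "node": per_node}


def cores_per_numa_domain() -> int:
    """Cores sharing one NUMA domain, assuming an even split."""
    return CORES_PER_DIE // NUMA_PER_DIE


def hbm_bytes_per_flop() -> float:
    """HBM bytes per peak FP64 FLOP on one socket (TB/s divided by TFLOPS)."""
    return SOCKET_HBM_TBYTES_PER_S / SOCKET_FP64_TFLOPS


if __name__ == "__main__":
    print(hbm_capacity_gb())
    print(f"{cores_per_numa_domain()} cores per NUMA domain")
    print(f"{hbm_bytes_per_flop():.3f} HBM bytes per FP64 FLOP")
```

The capacities agree with the 32 GB per socket and 64 GB per node listed under Compute Nodes.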