Project Ceiba is the next flagship GPU cluster that AWS is building for NVIDIA using GB200.
Specifications
It will have:

- 20,736 NVIDIA Blackwell (B200) GPUs connected to 10,368 NVIDIA Grace CPUs[2]
- a claimed 414 exaflops of AI compute (see Performance below)
- a claimed 1,600 Gbps of network bandwidth per “superchip” (see Networking below)
Performance
The website also claims this supercomputer will provide “414 exaflops of AI.”[1] This claim holds only if each B200 GPU provides 20 PF, which is the sparse FP4 performance rating of B200. Thus, “414 exaflops of AI” is only true for inference at FP4 with a model that has been fine-tuned for structured sparsity.
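As a sanity check, the arithmetic behind the headline figure follows directly from the GPU count and the per-GPU sparse FP4 rating:

$$
20{,}736 \text{ GPUs} \times 20 \text{ PF/GPU} = 414{,}720 \text{ PF} \approx 414 \text{ EF}
$$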
Networking
The official website refers to each B200 accelerator as a “superchip” in some parts and implies that a 4-way GB200 board is a “superchip” in other parts.[2] Because of this confusing nomenclature, it is unclear what the claim of “1,600 Gbps per superchip” of network bandwidth actually means. Each GPU is likely to have 400G NICs, though, since GB200 reference designs are being paired with either 400G ConnectX-7 or 800G ConnectX-8 adapters.
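For a rough sense of what each reading implies: if “superchip” here means a single B200, each GPU gets the full 1,600 Gbps; if it means the 4-way GB200 board, the per-GPU share would be

$$
1{,}600 \text{ Gbps} \div 4 \text{ GPUs} = 400 \text{ Gbps per GPU},
$$

which lines up with the 400G ConnectX-7 pairing mentioned above.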
Footnotes
1. Project Ceiba – Largest AI Super Computer Co-Built with NVIDIA - AWS
2. “Project Ceiba’s configuration includes 20,736 NVIDIA GB200 Grace Blackwell Superchips” and “scales to 20,736 Blackwell GPUs connected to 10,368 NVIDIA Grace CPUs,” both stated on Project Ceiba – Largest AI Super Computer Co-Built with NVIDIA - AWS, are contradictory: at two Blackwell GPUs per GB200 superchip, 20,736 GPUs would amount to 10,368 superchips, not 20,736.