Rubin CPX was a GPU announced at the 2025 AI Infrastructure Summit as “specifically optimized for context processing,” and then cancelled by GTC26. Groq has since replaced it.

It swapped out HBM for GDDR7 and increased the NVFP4 performance over Rubin by 20%. From the announcement, it had:

  • 30 PF NVFP4 (assume sparse)
  • “3x Exponent Operations” - “attention acceleration cores” that speed up the exponentials in softmax attention
  • 128 GB of GDDR7 instead of HBM - because prefill is compute-bound, not memory-bandwidth-bound like decode
  • 4x NVENC encoders and 4x NVDEC decoders for processing and generating AI video
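The GDDR7 choice follows from arithmetic intensity: during prefill, one pass over the weights serves every prompt token at once, while each decode step re-reads all weights to emit a single token. A rough sketch, using illustrative model dimensions (not CPX specifications):

```python
# Why prefill tolerates GDDR7: rough arithmetic-intensity comparison.
# Hypothetical dense transformer; all numbers are illustrative, not from NVIDIA.
params = 70e9          # 70B parameters
bytes_per_param = 1    # FP8-class weights

flops_per_token = 2 * params  # forward-pass matmul FLOPs per token (rule of thumb)

# Prefill: one weight read serves the whole prompt at once.
prompt_tokens = 8192
prefill_intensity = flops_per_token * prompt_tokens / (params * bytes_per_param)

# Decode: every step re-reads all weights to produce one token.
decode_intensity = flops_per_token * 1 / (params * bytes_per_param)

print(f"prefill: ~{prefill_intensity:.0f} FLOPs/byte, decode: ~{decode_intensity:.0f} FLOPs/byte")
```

At thousands of FLOPs per byte, prefill saturates the math units long before memory bandwidth matters; at ~2 FLOPs per byte, decode is pinned to bandwidth, which is why decode stays on HBM parts.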

It was to be released at the end of 2026 as a fast follow-on to the R200 launch.

Platforms

The following slide from Ian Buck summarized the two ways in which NVIDIA was going to ship CPX:[1]

“Vera Rubin NVL144 CPX”

It was going to be incorporated into a new VR200 NVL144 tray (“VR NVL144”) that adds 8x CPX GPUs alongside the 8x Rubin GPUs in each tray:

| Feature | VR144-only | VR144 with CPX |
| --- | --- | --- |
| NVFP4 FLOPS | 3.6 EF | 8.0 EF |
| Memory bandwidth | 1.4 PB/s | 1.7 PB/s |
| “Fast memory” | 75 TB | 150 TB |
| Network | 8x ConnectX-9 | 8x ConnectX-9 |
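The FLOPS delta in the table can be sanity-checked against the per-CPX figure. Assuming 8 CPX GPUs per tray across 18 trays (144 CPX total, a count inferred from the rack layout, not stated by NVIDIA):

```python
# Sanity check: does the rack-level FLOPS delta match 144 CPX dies at 30 PF each?
# The 8-per-tray x 18-tray count is an assumption, not an NVIDIA-stated figure.
cpx_count = 8 * 18            # 144 CPX GPUs per rack
cpx_pf = 30                   # PF NVFP4 per CPX (sparse, per the announcement)

added_ef = cpx_count * cpx_pf / 1000   # convert PF to EF
total_ef = 3.6 + added_ef              # base VR144 FLOPS plus CPX contribution

print(f"+{added_ef:.2f} EF from CPX -> {total_ef:.2f} EF total")
```

The computed 7.92 EF is consistent with the table’s rounded 8.0 EF, lending support to the assumed 144-die count.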

“Vera Rubin CPX Dual Rack”

NVIDIA was also going to make a CPX-only tray (VR-CPX) with 8x CPX. From the slide above, the CPX trays did not appear to have scale-up NVLink connectivity, implying that non-CPX nodes would have to fetch KV caches from CPX nodes over InfiniBand.
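Crossing InfiniBand rather than NVLink makes KV-cache transfer time worth estimating. A back-of-the-envelope calculation, where the model dimensions (roughly a 70B-class model with grouped-query attention) and the per-NIC link rate are illustrative assumptions, not CPX specifications:

```python
# Back-of-the-envelope: time to move a long-context KV cache over the network.
# Model dimensions and link rate below are assumed for illustration only.
layers = 80
kv_heads = 8          # grouped-query attention
head_dim = 128
bytes_per_elem = 2    # FP16/BF16 KV cache

kv_bytes_per_token = layers * kv_heads * head_dim * 2 * bytes_per_elem  # K and V
tokens = 1_000_000    # a 1M-token context

total_gb = kv_bytes_per_token * tokens / 1e9
link_gbps = 800       # assumed per-NIC rate in Gb/s
seconds = total_gb * 8 / link_gbps

print(f"{kv_bytes_per_token} B/token, {total_gb:.0f} GB total, ~{seconds:.1f} s on one link")
```

Under these assumptions a 1M-token cache is ~330 GB and takes seconds on a single link, which is why the handoff happens once per request (after prefill) rather than continuously during decode.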

Target applications

Ian suggested the use case would be:

  1. Perform prefill on CPX nodes in one part of a datacenter.
  2. As soon as the first token is ready, ship all keys and values for the prompt to a non-CPX node (presumably via InfiniBand, since the CPX nodes are not on the same NVLink domain as the HBM nodes).
  3. HBM GPUs begin decode using the computed key and value vectors.
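The three steps above can be sketched as a toy disaggregated-serving loop. The classes and method names here are hypothetical stand-ins, not an NVIDIA API; the KV entries are strings rather than real tensors:

```python
# Minimal sketch of disaggregated serving: prefill on one pool, decode on another.
# PrefillNode, DecodeNode, and their methods are hypothetical, not an NVIDIA API.
from dataclasses import dataclass, field

@dataclass
class KVCache:
    keys: list = field(default_factory=list)    # one entry per prompt token
    values: list = field(default_factory=list)

class PrefillNode:                 # stands in for a CPX (GDDR7) node
    def prefill(self, prompt_tokens):
        cache = KVCache()
        for tok in prompt_tokens:              # compute-bound pass over the prompt
            cache.keys.append(f"K({tok})")
            cache.values.append(f"V({tok})")
        first_token = "t0"                     # step 1: first output token ready
        return first_token, cache

class DecodeNode:                  # stands in for an HBM Rubin node
    def __init__(self):
        self.cache = None
    def receive(self, cache):                  # step 2: KV cache arrives over the network
        self.cache = cache
    def decode(self, n):                       # step 3: bandwidth-bound token loop
        return [f"t{i}" for i in range(1, n + 1)]

prefill_node, decode_node = PrefillNode(), DecodeNode()
first, kv = prefill_node.prefill(["the", "cat", "sat"])
decode_node.receive(kv)                        # ship keys/values across racks
generated = [first] + decode_node.decode(3)
print(generated)   # ['t0', 't1', 't2', 't3']
```

The key design point is that the KV cache is the only state that crosses the network: weights live permanently on both pools, so the handoff cost scales with prompt length, not model size.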

Industries

NVIDIA touted a few AI companies as launch partners:

  • Code generation: Cursor, Magic
  • Inferencing-as-a-Service platforms: Fireworks AI and together.ai were trial customers, citing the need for huge context windows (1M-100M tokens) to ingest entire codebases for code generation applications.
  • Creative/media generation: Runway

Footnotes

  1. https://www.nvidia.com/en-us/events/ai-infra-summit/