Rubin CPX was a GPU announced at the 2025 AI Infrastructure Summit as “specifically optimized for context processing,” and then cancelled by GTC26. Groq has since replaced it.

It swapped out HBM for GDDR7 and increased the NVFP4 performance over Rubin by 20%. From the announcement, it had:

  • 30 PF NVFP4 (assume sparse)
  • “3x Exponent Operations” - “attention acceleration cores” that speed up the exponentials in softmax attention
  • 128 GB of GDDR7 instead of HBM - because prefill is compute-bound, not memory-bandwidth-bound like decode
  • 4x NVENC encoders and 4x NVDEC decoders for processing and generating AI video
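The GDDR7 choice follows from arithmetic intensity: during prefill, one pass over the weights serves every prompt token at once, while each decode step re-reads all weights to emit a single token. A rough sketch, using illustrative model dimensions (not CPX specifications):

```python
# Why prefill tolerates GDDR7: rough arithmetic-intensity comparison.
# Hypothetical dense transformer; all numbers are illustrative, not from NVIDIA.
params = 70e9          # 70B parameters
bytes_per_param = 1    # FP8-class weights

flops_per_token = 2 * params  # forward-pass matmul FLOPs per token (rule of thumb)

# Prefill: one weight read serves the whole prompt at once.
prompt_tokens = 8192
prefill_intensity = flops_per_token * prompt_tokens / (params * bytes_per_param)

# Decode: every step re-reads all weights to produce one token.
decode_intensity = flops_per_token * 1 / (params * bytes_per_param)

print(f"prefill: ~{prefill_intensity:.0f} FLOPs/byte, decode: ~{decode_intensity:.0f} FLOPs/byte")
```

At thousands of FLOPs per byte, prefill saturates the math units long before memory bandwidth matters; at ~2 FLOPs per byte, decode is pinned to bandwidth, which is why decode stays on HBM parts.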

It was to be released at the end of 2026 as a fast follow-on to the R200 launch.

Platforms

The following slide from Ian Buck summarized the two ways in which NVIDIA was going to ship CPX:[1]

“Vera Rubin NVL144 CPX”

It was going to be incorporated into a new VR200 NVL144 tray (“VR NVL144”) that adds 8x CPX GPUs alongside the 8x Rubin GPUs in each tray:

| Feature | VR144-only | VR144 with CPX |
| --- | --- | --- |
| NVFP4 FLOPS | 3.6 EF | 8.0 EF |
| Memory bandwidth | 1.4 PB/s | 1.7 PB/s |
| “Fast memory” | 75 TB | 150 TB |
| Network | 8x ConnectX-9 | 8x ConnectX-9 |
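The FLOPS delta in the table can be sanity-checked against the per-CPX figure. Assuming 8 CPX GPUs per tray across 18 trays (144 CPX total, a count inferred from the rack layout, not stated by NVIDIA):

```python
# Sanity check: does the rack-level FLOPS delta match 144 CPX dies at 30 PF each?
# The 8-per-tray x 18-tray count is an assumption, not an NVIDIA-stated figure.
cpx_count = 8 * 18            # 144 CPX GPUs per rack
cpx_pf = 30                   # PF NVFP4 per CPX (sparse, per the announcement)

added_ef = cpx_count * cpx_pf / 1000   # convert PF to EF
total_ef = 3.6 + added_ef              # base VR144 FLOPS plus CPX contribution

print(f"+{added_ef:.2f} EF from CPX -> {total_ef:.2f} EF total")
```

The computed 7.92 EF is consistent with the table’s rounded 8.0 EF, lending support to the assumed 144-die count.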

“Vera Rubin CPX Dual Rack”

NVIDIA was also going to make a CPX-only tray (VR-CPX) with 8x CPX. From the slide above, the CPX trays did not appear to have scale-up NVLink connectivity, implying that non-CPX nodes would have to fetch KV caches from CPX nodes over InfiniBand.
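Crossing InfiniBand rather than NVLink makes KV-cache transfer time worth estimating. A back-of-the-envelope calculation, where the model dimensions (roughly a 70B-class model with grouped-query attention) and the per-NIC link rate are illustrative assumptions, not CPX specifications:

```python
# Back-of-the-envelope: time to move a long-context KV cache over the network.
# Model dimensions and link rate below are assumed for illustration only.
layers = 80
kv_heads = 8          # grouped-query attention
head_dim = 128
bytes_per_elem = 2    # FP16/BF16 KV cache

kv_bytes_per_token = layers * kv_heads * head_dim * 2 * bytes_per_elem  # K and V
tokens = 1_000_000    # a 1M-token context

total_gb = kv_bytes_per_token * tokens / 1e9
link_gbps = 800       # assumed per-NIC rate in Gb/s
seconds = total_gb * 8 / link_gbps

print(f"{kv_bytes_per_token} B/token, {total_gb:.0f} GB total, ~{seconds:.1f} s on one link")
```

Under these assumptions a 1M-token cache is ~330 GB and takes seconds on a single link, which is why the handoff happens once per request (after prefill) rather than continuously during decode.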

Target applications

Ian suggested the use case would be:

  1. Perform prefill on CPX nodes in one part of a datacenter.
  2. As soon as the first token is ready, ship all keys and values for the prompt to a non-CPX node (presumably via InfiniBand, since the CPX nodes are not on the same NVLink domain as the HBM nodes).
  3. HBM GPUs begin decode using the computed key and value vectors.
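The three steps above can be sketched as a toy disaggregated-serving loop. The classes and method names here are hypothetical stand-ins, not an NVIDIA API; the KV entries are strings rather than real tensors:

```python
# Minimal sketch of disaggregated serving: prefill on one pool, decode on another.
# PrefillNode, DecodeNode, and their methods are hypothetical, not an NVIDIA API.
from dataclasses import dataclass, field

@dataclass
class KVCache:
    keys: list = field(default_factory=list)    # one entry per prompt token
    values: list = field(default_factory=list)

class PrefillNode:                 # stands in for a CPX (GDDR7) node
    def prefill(self, prompt_tokens):
        cache = KVCache()
        for tok in prompt_tokens:              # compute-bound pass over the prompt
            cache.keys.append(f"K({tok})")
            cache.values.append(f"V({tok})")
        first_token = "t0"                     # step 1: first output token ready
        return first_token, cache

class DecodeNode:                  # stands in for an HBM Rubin node
    def __init__(self):
        self.cache = None
    def receive(self, cache):                  # step 2: KV cache arrives over the network
        self.cache = cache
    def decode(self, n):                       # step 3: bandwidth-bound token loop
        return [f"t{i}" for i in range(1, n + 1)]

prefill_node, decode_node = PrefillNode(), DecodeNode()
first, kv = prefill_node.prefill(["the", "cat", "sat"])
decode_node.receive(kv)                        # ship keys/values across racks
generated = [first] + decode_node.decode(3)
print(generated)   # ['t0', 't1', 't2', 't3']
```

The key design point is that the KV cache is the only state that crosses the network: weights live permanently on both pools, so the handoff cost scales with prompt length, not model size.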

Industries

NVIDIA touted a few AI companies as launch partners:

  • Code generation: Cursor, Magic
  • Inferencing-as-a-Service platforms: Fireworks AI and together.ai were trial customers, citing the need for huge context windows (1M-100M tokens) to ingest entire codebases for code generation applications.
  • Creative/media generation: Runway

Footnotes

  1. https://www.nvidia.com/en-us/events/ai-infra-summit/