Groq makes dataflow accelerators that rely exclusively on on-chip SRAM, with no external HBM or DRAM. NVIDIA recently acqui-hired the company and, at GTC26, positioned the Groq accelerator for two specific workloads:
- FFN experts in MoE transformers, which, combined with expert parallelism, fit neatly into the LPUs' SRAM (see the sizing sketch after this list)
- Draft models for speculative decoding (a minimal sketch of the draft/verify loop follows below)
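
To make the first workload concrete, here is a back-of-the-envelope sizing sketch (mine, not from the article): with expert parallelism, each LPU only needs to hold a handful of experts' FFN weights, so the question is whether one expert fits in on-chip SRAM. The model shapes and the ~230 MB SRAM figure per chip are illustrative assumptions.

```python
# Back-of-the-envelope sketch: does one MoE expert's FFN fit in a single LPU's
# SRAM under expert parallelism? All numbers below are assumptions for
# illustration, not figures from the article.

def ffn_expert_bytes(d_model: int, d_ff: int, bytes_per_param: int = 1) -> int:
    """Weight bytes for one SwiGLU-style expert FFN (gate, up, down projections)."""
    return 3 * d_model * d_ff * bytes_per_param  # assumes 8-bit weights

def chips_needed(n_experts: int, d_model: int, d_ff: int,
                 sram_bytes_per_chip: int, experts_per_chip: int = 1) -> int:
    """Chips required if each chip keeps `experts_per_chip` experts entirely in SRAM."""
    per_expert = ffn_expert_bytes(d_model, d_ff)
    assert experts_per_chip * per_expert <= sram_bytes_per_chip, "expert does not fit"
    return -(-n_experts // experts_per_chip)  # ceiling division

if __name__ == "__main__":
    SRAM = 230 * 1024**2                        # assumed on-chip SRAM per LPU
    d_model, d_ff, n_experts = 4096, 14336, 64  # hypothetical MoE shapes
    print(f"{ffn_expert_bytes(d_model, d_ff) / 1e6:.0f} MB per expert")
    print(chips_needed(n_experts, d_model, d_ff, SRAM), "chips at 1 expert/chip")
```

With these assumed shapes, one 8-bit expert is roughly 176 MB, so it fits on a single chip and the expert count, not capacity, sets the number of LPUs.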
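For the second workload, the sketch below shows speculative decoding in its simplest greedy-acceptance form (production schemes use rejection sampling over the full token distributions). `draft_model` and `target_model` are hypothetical callables, not any particular library's API.

```python
# Minimal speculative-decoding sketch (illustrative only): a small, fast draft
# model proposes k tokens, the large target model verifies them in one pass,
# and the longest agreeing prefix is accepted.

from typing import Callable, List

def speculative_step(prefix: List[int],
                     draft_model: Callable[[List[int]], int],
                     target_model: Callable[[List[int]], List[int]],
                     k: int = 4) -> List[int]:
    # 1) Draft k tokens autoregressively with the cheap model.
    draft = list(prefix)
    for _ in range(k):
        draft.append(draft_model(draft))
    proposed = draft[len(prefix):]

    # 2) Verify all k proposals with one call to the target model, which
    #    returns its own greedy prediction at each proposed position.
    target_preds = target_model(draft)  # length k, aligned with `proposed`

    # 3) Accept the longest prefix where draft and target agree, then take
    #    the target's token at the first disagreement (so progress >= 1 token).
    accepted: List[int] = []
    for p, t in zip(proposed, target_preds):
        if p == t:
            accepted.append(p)
        else:
            accepted.append(t)
            break
    return prefix + accepted
```

The draft model runs k times per step while the target model runs once over the whole proposal, so a low-latency draft model, for example one resident entirely in LPU SRAM, directly shortens the wall-clock time per accepted token.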
According to the Wall Street Journal, Groq's success follows directly from OpenAI Codex latencies being too high, a consequence of NVIDIA GPUs' focus on throughput:1
> OpenAI’s coding tool, Codex, became one of the startup’s buzziest products. Yet engineers at the ChatGPT-maker were running into a problem: Nvidia’s chips weren’t powering the product quickly enough, frustrating users with long wait times.
>
> OpenAI was one of Nvidia’s largest customers. But by the time its president Greg Brockman arrived at the opera reception with his wife, Anna, he was already looking elsewhere. The startup was about to sign a deal to use chips designed by Cerebras, which are faster and more efficient than Nvidia GPUs in some instances.
The article went on to say that OpenAI will be using Groq LPUs:
> OpenAI is set to be one of the first customers of the chip, The Wall Street Journal reported.