High-Growth LPU Shipments on the Horizon? Nvidia Unveils Vera Rubin Platform with 256 Units per Cabinet


At the GTC 2026 keynote, NVIDIA officially unveiled a new chip, the Groq 3 LPU.

In the early hours of Tuesday Beijing time, NVIDIA announced the Vera Rubin platform, which comprises seven chips: the Groq 3 LPU, along with the Vera CPU, Rubin GPU, NVLink 6 switch, ConnectX-9 SuperNIC, BlueField-4 DPU, and Spectrum-6 Ethernet switch.

It is reported that NVIDIA will build a Groq 3 LPX rack containing 256 LPUs, offering 128 GB of SRAM in total (each LPU integrates 500 MB) and 40 PB/s of inference-acceleration bandwidth, connected via a dedicated scale-up interface delivering 640 TB/s per rack. This rack will form part of the complete Vera Rubin AI supercomputing platform, alongside four other racks, including the Vera Rubin NVL72 and Vera CPU racks.
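The rack-level figures quoted above are internally consistent, as a quick back-of-the-envelope check shows (the numbers come from the article; decimal units, i.e. 1 GB = 1000 MB, are an assumption):

```python
# Sanity-check the quoted rack totals against the per-LPU figures.
# All inputs are taken from the article; decimal units assumed.

LPUS_PER_RACK = 256
SRAM_PER_LPU_MB = 500          # per-LPU on-chip SRAM

total_sram_gb = LPUS_PER_RACK * SRAM_PER_LPU_MB / 1000
print(total_sram_gb)           # 128.0 -> matches the quoted 128 GB per rack
```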

NVIDIA stated that Groq 3 LPX is the inference accelerator for Vera Rubin, designed to meet the low-latency, large-context requirements of agentic systems. Vera Rubin and the LPX were co-designed, pairing the Rubin GPU's strong performance with the LPU to deliver extremely low latency and ultra-high throughput.

Jensen Huang said that, when paired with the Vera Rubin platform, LPX can improve inference throughput per watt by as much as 35 times. The LPU chips will be manufactured by Samsung, and the racks are expected to begin shipping in the second half of this year.

Yesterday, analyst Ming-Chi Kuo (郭明錤) said that following NVIDIA's investment in Groq, the LPU shipment forecast has been significantly upgraded: total shipments are expected to reach 4 to 5 million units across 2026-2027. Racks based on the new architecture are expected to enter mass production in Q4 this year, with shipments of roughly 300 to 500 racks in 2026 and 15,000 to 20,000 in 2027.

He believes the rapid growth in LPU demand is driven mainly by two factors. On one hand, the LPU is tightly integrated with the NVIDIA ecosystem (such as CUDA), greatly lowering the barriers to application development and deployment. On the other, industry demand for ultra-low-latency inference is rising rapidly, spanning AI agents, real-time processing, consumer-facing applications, and physical AI.

Notably, Jensen Huang also emphasized during the keynote that AI has completed a key transition from perception intelligence to generative intelligence, and now to physical and agent intelligence.

Caitong Securities pointed out that inference latency in large models is closely tied to user experience. The latency arises mainly in the decode stage, where the core bottleneck is memory bandwidth. With higher memory bandwidth, an LPU can reduce large-model inference latency. In addition, LPU-based large models not only infer faster but also offer more cost-effective pricing, further improving the user experience.
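Why the decode stage is bandwidth-bound can be sketched with a simple roofline estimate: generating each new token requires streaming roughly the full set of model weights from memory, so per-token latency is bounded below by weight bytes divided by memory bandwidth. The function and example numbers below are illustrative assumptions, not figures from the article:

```python
# Illustrative roofline model of decode-stage throughput: each generated
# token needs roughly one full read of the model weights, so memory
# bandwidth sets a hard ceiling on single-request tokens per second.

def decode_tokens_per_sec(params_billion: float,
                          bytes_per_param: float,
                          bandwidth_gb_s: float) -> float:
    """Upper bound on decode throughput for a single request."""
    weights_gb = params_billion * bytes_per_param  # total weight bytes, in GB
    return bandwidth_gb_s / weights_gb

# Hypothetical example: a 70B-parameter model in FP8 (1 byte/param)
# on 8 TB/s of memory bandwidth.
ceiling = decode_tokens_per_sec(70, 1.0, 8000)
print(round(ceiling, 1))  # ~114.3 tokens/s ceiling
```

This is why the article frames faster memory (such as large on-chip SRAM) as the lever for cutting decode latency: raising the bandwidth term raises the throughput ceiling directly.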

The firm stated that token consumption has risen sharply, driving strong growth in the inference-chip market, which the LPU is expected to gradually penetrate. It is optimistic about the LPU's growth prospects and the PCB opportunities brought by rack shipments, recommending attention to: Zhimi Zhineng (holds a stake in Yuan Chuanwei), Xingchen Technology (participated in multiple funding rounds of Yuan Chuanwei), Huidian Shares (NVIDIA PCB supplier), Shenghong Technology (NVIDIA PCB supplier), and Shennan Circuit.

(Source: Cailian Press)
