Jensen Huang’s GTC Speech, Full Text: The Age of Inference Has Arrived, at Least $1 Trillion in Demand by 2027, and Robotics Is the New Operating System

NVIDIA is developing and deploying space-based data center computers called “Vera Rubin Space-1,” opening the door to extending AI computing power beyond Earth.

Source: Wall Street Insights

On March 16, 2026, NVIDIA’s GTC 2026 conference officially opened, with founder and CEO Jensen Huang delivering the keynote speech.

At this event, regarded as the “AI industry’s annual pilgrimage,” Huang explained NVIDIA’s transformation from a “chip company” to an “AI infrastructure and factory company.” Confronted with market concerns about sustained performance and growth potential, Huang detailed the underlying business logic driving future expansion—“Token Factory Economics.”

Performance guidance is extremely optimistic: “Demand of at least $1 trillion by 2027.”

Over the past two years, global demand for AI computing has exploded exponentially. As large models evolve from “perception” and “generation” to “reasoning” and “action” (task execution), compute consumption has surged. Addressing market concerns about order and revenue ceilings, Huang offered very strong guidance.

In his speech, Huang openly stated:

Last year, I mentioned we saw a high-confidence demand of $500 billion, covering Blackwell and Rubin through 2026. Now, right here, I see at least $1 trillion in demand by 2027.

Huang’s trillion-dollar forecast briefly drove NVIDIA’s stock price up more than 4.3%.

Moreover, he added:

Is this reasonable? That’s what I’m about to discuss. In fact, we might even be undersupplied. I am certain that actual computing demand will be much higher than this.

Huang pointed out that NVIDIA’s current systems have proven themselves to be the world’s “lowest-cost infrastructure.” Because NVIDIA can run nearly all AI models across various fields, this versatility allows the $1 trillion investment from customers to be fully utilized and maintain a long lifecycle.

Currently, 60% of NVIDIA’s business comes from the top five hyperscale cloud providers, while the remaining 40% is widely distributed across sovereign clouds, enterprises, industrial sectors, robotics, and edge computing.

Token Factory Economics: Performance per Watt Determines Business Vitality

To justify this $1 trillion demand figure, Huang offered global CEOs a new way of thinking about the business. He argued that future data centers will no longer be file-storage warehouses but “token factories”: production lines for the fundamental units of AI output.

Huang emphasized:

Every data center, every factory is, by definition, limited by power. A 1GW (gigawatt) factory will never become a 2GW one; that’s physics. Under a fixed power budget, whoever achieves the highest token throughput per watt has the lowest production cost.

Huang divided future AI services into five business tiers:

  • Free Tier (high throughput, low speed)
  • Mid-tier (~$3 per million tokens)
  • Premium Tier (~$6 per million tokens)
  • High-Speed Tier (~$45 per million tokens)
  • Ultra-High-Speed Tier (~$150 per million tokens)
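The tier logic reduces to simple arithmetic: under a fixed power budget, revenue is throughput times price, so the architecture with the better tokens-per-watt earns more at every tier. A minimal sketch with made-up throughput figures (only the idea comes from the talk):

```python
# Illustrative token-factory arithmetic. All throughput numbers are invented
# for the example; only the idea (fixed power, revenue scaling with
# throughput) comes from the talk.

def hourly_revenue(tokens_per_sec: float, price_per_million: float) -> float:
    """Revenue per hour for one pricing tier at a given sustained throughput."""
    tokens_per_hour = tokens_per_sec * 3600
    return tokens_per_hour / 1_000_000 * price_per_million

# Two hypothetical architectures in the same 1 GW power envelope: the one
# with the higher tokens-per-watt simply produces, and therefore earns, more.
baseline = hourly_revenue(tokens_per_sec=100e6, price_per_million=3.0)
improved = hourly_revenue(tokens_per_sec=300e6, price_per_million=3.0)
print(improved / baseline)  # prints 3.0: the throughput ratio carries straight into revenue
```

The point of the sketch is only that, at fixed power and fixed prices, revenue scales linearly with tokens per second.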

He pointed out that as models grow larger and context lengths increase, AI becomes smarter, but token generation speed decreases. Huang stated:

In this token factory, your throughput and token generation speed will translate directly into your revenue next year.

Huang emphasized that NVIDIA’s architecture enables customers to achieve extremely high throughput at the free tier, while at the highest inference tier, performance can be improved by an astonishing 35 times.

Vera Rubin achieved 350x acceleration in two years, with Groq filling the gap for ultra-fast inference

Under these physical limits, NVIDIA introduced its most complex AI computing system ever, Vera Rubin. Huang said:

Last year, when I mentioned Hopper, I would hold up a chip. Very cute. But when I mention Vera Rubin, everyone thinks of the entire system. In this fully liquid-cooled system with traditional cabling eliminated, racks that once took two days to install now take only two hours.

Huang pointed out that through extreme end-to-end hardware-software co-design, Vera Rubin has created astonishing data leaps within the same 1GW data center:

In just two years, we increased the token generation rate from 22 million to 700 million, a 350-fold increase. Moore’s Law during the same period only provides about a 1.5x boost.

To address bandwidth bottlenecks under ultra-fast inference (e.g., 1,000 tokens/sec), NVIDIA offers a solution built on its Groq acquisition: asymmetric disaggregated inference. Huang explained:

These two processors have very different characteristics. Groq chips have 500MB of SRAM, while a Rubin chip has 288GB of memory.

Huang noted that NVIDIA’s Dynamo software system routes the compute- and memory-intensive “prefill” phase to Vera Rubin and the latency-sensitive “decode” phase to Groq. He also offered enterprises guidance on compute configuration:

If your workload is mainly high throughput, use 100% Vera Rubin; if you have substantial high-value programming-level token generation needs, allocate about 25% of your data center to Groq.

Huang also revealed that Groq LP30 chips, manufactured by Samsung, are already in mass production and expected to ship in Q3, and that the first Vera Rubin rack is already running on Microsoft Azure.

Additionally, Huang showcased the world’s first mass-produced co-packaged optics (CPO) switch, Spectrum X, addressing market concerns about the shift from copper to optics:

We need more copper cable capacity, more optical chip capacity, and more CPO capacity.

Agent Ends Traditional SaaS: “Annual Salary + Token” Becomes Silicon Valley Standard

Beyond hardware, Huang devoted much of his speech to the revolution in AI software and ecosystems, especially the explosion of AI agents.

He described the open-source project OpenClaw as “the most popular open-source project in human history,” claiming it surpassed what Linux achieved in 30 years within just a few weeks. Huang straightforwardly said that OpenClaw is essentially the “operating system” for agent computers.

Huang asserted:

Every SaaS (Software-as-a-Service) company will become an AaaS (Agent-as-a-Service) company. To ensure the safe deployment of these agents, which can access sensitive data and execute code, NVIDIA has launched the enterprise-grade NeMo Claw reference design, adding policy engines and privacy routers.

For ordinary workers, this transformation is also imminent. Huang depicted the future workplace:

In the future, every engineer in our company will have an annual token budget. Their base salary might be hundreds of thousands of dollars, and I will allocate about half that amount again as a token quota, enabling them to achieve 10x efficiency gains. This has become a new bargaining chip in Silicon Valley recruiting: how many tokens come with your offer?

He also “leaked” that the next-generation Feynman architecture will be the first to scale copper and CPO together. More intriguingly, NVIDIA is developing space-based data center computers called “Vera Rubin Space-1,” opening the door to extending AI compute beyond Earth.

The full text of Huang’s GTC 2026 speech (transcribed with AI tool assistance) follows:

Host: Welcome to the stage, NVIDIA founder and CEO Jensen Huang.

Jensen Huang, Founder and CEO:

Welcome to GTC. I want to remind everyone that this is a technology conference. Seeing so many people queuing early in the morning, and being here with all of you, makes me very happy.

At GTC, we focus on three main themes: technology, platform, and ecosystem. NVIDIA currently has three major platforms: the CUDA-X platform, system platform, and our latest AI factory platform.

Before we begin, I want to thank our pre-show hosts—Sarah Guo from Conviction, Alfred Lin from Sequoia Capital (NVIDIA’s first venture investor), and NVIDIA’s first major institutional investor Gavin Baker. These three have deep insights into technology and broad influence in the entire tech ecosystem. Of course, I also want to thank all the distinguished guests I personally invited to attend today. Thank you to this all-star team.

I also want to thank all the companies present today. NVIDIA is a platform company with technology, platforms, and a rich ecosystem. The companies here represent nearly all participants in the $100 trillion industry—450 companies sponsored this event, for which I am deeply grateful.

This conference features 1,000 technical forums and 2,000 speakers, covering every level of the AI “five-layer cake” architecture—from infrastructure like land, power, and data centers, to chips, platforms, models, and the various applications driving the industry forward.

CUDA: Twenty Years of Technological Accumulation

Everything starts here. This year marks the 20th anniversary of CUDA.

For twenty years, we have been committed to developing this architecture. CUDA is a revolutionary invention—SIMT (Single Instruction, Multiple Threads) technology allows developers to write scalar code and extend it to multi-threaded applications, with much lower programming difficulty than previous SIMD architectures. Recently, we added Tiles functionality to help developers more easily program Tensor Cores, and various mathematical structures essential for AI today. Currently, CUDA has thousands of tools, compilers, frameworks, and libraries, with hundreds of thousands of open projects in the open-source community, deeply integrated into every tech ecosystem.

This chart lays out the whole of NVIDIA’s strategic logic; it is a slide I have been showing from the beginning. The hardest-won and most central element is the “Installed Base” at the bottom of the chart. Over twenty years, we have accumulated hundreds of millions of GPUs and computing systems running CUDA worldwide.

Our GPUs cover all cloud platforms, serving nearly all computer manufacturers and industries. The vast installed base of CUDA is the fundamental reason this flywheel accelerates continuously. The installed base attracts developers, who create new algorithms and breakthroughs, which in turn spawn new markets, form new ecosystems, and attract more companies, further expanding the installed base—this flywheel keeps speeding up.

NVIDIA’s software downloads are growing at an astonishing rate, large in scale and increasing rapidly. This flywheel enables our computing platform to support massive applications and continuous breakthroughs.

More importantly, it also gives this infrastructure a very long lifespan. The reason is clear: the applications running on NVIDIA CUDA are extremely diverse, covering every stage of the AI lifecycle, a wide range of data-processing platforms, and scientific solvers. Once installed, NVIDIA GPUs retain high practical value. That’s why, even six years after the release of the Ampere architecture, its cloud rental prices have increased.

All this is driven by the enormous installed base, a powerful flywheel, and a broad developer ecosystem. When these factors work together, along with our ongoing software updates, computing costs keep decreasing. Accelerated computing not only boosts application performance significantly but also, through long-term software maintenance and iteration, allows users to enjoy ongoing performance gains and decreasing costs. We are committed to supporting every GPU globally for the long term because of their architecture compatibility.

We do this because of the huge installed base—every time we release an optimization, it benefits millions of users. This dynamic combination continuously expands our reach, accelerates our growth, and drives down costs, ultimately fueling new growth. CUDA is at the core of all this.

From GeForce to CUDA: Twenty-Five Years of Evolution

Our journey with CUDA actually began twenty-five years ago.

GeForce—many of you grew up with GeForce. GeForce is NVIDIA’s most successful marketing project. We started cultivating future customers when you couldn’t afford our products—your parents became NVIDIA’s earliest users, buying our products year after year, until one day you grew into excellent computer scientists and true customers and developers.

This foundation was laid by GeForce twenty-five years ago. We invented programmable shaders—an obvious yet profound invention that enabled accelerators to become programmable, and the world’s first programmable accelerator, the pixel shader. Five years later, we created CUDA—one of our most important investments ever. At that time, our financial resources were limited, but we bet most of our profits on it, aiming to extend CUDA from GeForce to every computer. Our conviction was deep because we believed in its potential. Despite initial hardships, we persisted through 13 generations over twenty years, and now CUDA is everywhere.

It was the pixel shader that drove the GeForce revolution. About eight years ago, we launched RTX—a comprehensive overhaul of architecture for modern computer graphics. GeForce brought CUDA to the world, and because of that, researchers like Alex Krizhevsky, Ilya Sutskever, Geoffrey Hinton, and Andrew Ng discovered that GPUs could be powerful accelerators for deep learning, igniting the AI explosion a decade ago.

Ten years ago, we decided to fuse programmable shading with two new ideas: first, hardware ray tracing, which was technically challenging; second, a forward-looking idea—about ten years ago, we foresaw that AI would fundamentally transform computer graphics. Just as GeForce brought AI to the world, AI now is reshaping how computer graphics are realized.

Today, I want to show you the future. It’s our next-generation graphics technology, called Neural Rendering—a deep fusion of 3D graphics and AI. This is DLSS 5, please watch.

Neural Rendering: The Fusion of Structured Data and Generative AI

Isn’t this breathtaking? Computer graphics are coming alive again.

What have we done? We combined controllable 3D graphics (the real foundation of virtual worlds) with structured data, then integrated generative AI and probabilistic computing. One is fully deterministic, the other probabilistic but highly realistic—we merge these two concepts, achieving precise control through structured data while generating in real-time. The result is content that is both stunning and fully controllable.

The idea of merging structured information with generative AI will repeatedly appear across industries. Structured data is the foundation of trustworthy AI.

Accelerating Platforms for Structured and Unstructured Data

Now I will show a technical architecture diagram.

Structured data—familiar platforms like SQL, Spark, Pandas, Velox, and major cloud platforms such as Snowflake, Databricks, Amazon EMR, Azure Fabric, Google BigQuery—all handle data frames. These data frames are like giant spreadsheets, carrying all the information of the business world, the fundamental facts (Ground Truth) for enterprise computing.

In the AI era, we need AI to use structured data and achieve extreme acceleration. In the past, accelerating structured data processing was to make enterprises more efficient. In the future, AI will use these data structures at speeds far beyond humans, and AI agents will heavily invoke structured databases.

As for unstructured data: vector databases, PDFs, videos, and audio make up most of the world’s data; about 90% of the data generated each year is unstructured. In the past, such data was almost unusable: we read it, stored it in file systems, and that was it. We couldn’t easily query or retrieve it, because unstructured data lacks simple indexing; you have to understand its meaning and context first. Now AI can do this: with multimodal perception and understanding, AI can read a PDF, grasp its meaning, and embed it into a larger, queryable structure.

NVIDIA has created two foundational libraries for this:

  • cuDF: for accelerated processing of data frames and structured data
  • cuVS: for vector storage, semantic data, and unstructured AI data processing

These two platforms will become some of the most important foundational platforms in the future.
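To make the cuVS side of this concrete, here is a toy, pure-Python illustration of what a vector index does: store embeddings, then answer queries by cosine similarity. This is not the cuVS API; the class, document names, and vectors are invented for the example (real systems use learned embeddings and approximate nearest-neighbor search at vastly larger scale).

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

class ToyVectorStore:
    """Minimal semantic index: documents live as (id, embedding) pairs."""

    def __init__(self):
        self.items = []

    def add(self, doc_id, embedding):
        self.items.append((doc_id, embedding))

    def query(self, embedding, k=1):
        # Rank every stored document by similarity to the query embedding.
        ranked = sorted(self.items, key=lambda it: cosine(it[1], embedding), reverse=True)
        return [doc_id for doc_id, _ in ranked[:k]]

store = ToyVectorStore()
store.add("pdf_invoice", [0.9, 0.1, 0.0])
store.add("pdf_contract", [0.1, 0.9, 0.1])
print(store.query([0.85, 0.2, 0.0]))  # prints ['pdf_invoice']
```

The "embed PDFs into a queryable structure" idea from the section above is exactly this pattern, industrialized.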

Today, we announce collaborations with multiple companies. IBM—creator of SQL—will use cuDF to accelerate its WatsonX Data platform. Dell has partnered with us to build the Dell AI Data Platform, integrating cuDF and cuVS, achieving significant performance improvements in real projects with NTT Data. Google Cloud is now accelerating not only Vertex AI but also BigQuery, and has partnered with Snapchat to reduce their computing costs by nearly 80%.

The benefits of accelerated computing are threefold: speed, scale, and cost. This aligns with Moore’s Law—achieving performance leaps through acceleration while continuously optimizing algorithms, allowing everyone to enjoy steadily decreasing costs.

NVIDIA has built an accelerated computing platform that consolidates many libraries: RTX, cuDF, cuVS, and more. These libraries are integrated into global cloud services and OEM systems, reaching users worldwide.

Deep Collaboration with Cloud Providers

Partnerships with major cloud providers

Google Cloud: We accelerate Vertex AI and BigQuery, deeply integrate with JAX/XLA, and perform excellently on PyTorch—NVIDIA is the only accelerator that performs well on both PyTorch and JAX/XLA. We have onboarded clients like Base10, CrowdStrike, Puma, and Salesforce into the Google Cloud ecosystem.

AWS: We accelerate EMR, SageMaker, and Bedrock, with deep integration. This year, I am especially excited that we will bring OpenAI into AWS, significantly boosting AWS cloud consumption and helping OpenAI expand regional deployment and compute scale.

Microsoft Azure: The first supercomputer we built, with 100 PFLOPS, was also the first supercomputer deployed on Azure, laying the foundation for the collaboration with OpenAI. We accelerate Azure cloud services and AI Foundry, jointly expand Azure regions, and work closely with Bing Search. Notably, NVIDIA GPUs were among the first to support Confidential Computing, which ensures that even operators cannot view user data and models, enabling secure deployment of OpenAI and Anthropic models across cloud regions. We also accelerate all Synopsys EDA and CAD workflows deployed on Microsoft Azure.

Oracle: We are Oracle’s first AI customer, proud to be the first to explain AI cloud concepts to Oracle. Since then, Oracle has grown rapidly, and we have introduced partners like Cohere, Fireworks, and OpenAI.

CoreWeave: The world’s first AI-native cloud, born for GPU hosting and AI cloud services, with an excellent customer base and strong growth momentum.

Palantir + Dell: A tripartite collaboration creating a new AI platform based on Palantir’s Ontology Platform and AI platform, capable of deploying AI fully locally in any country, any air-gapped environment—from data processing (vectorized or structured) to the entire AI acceleration stack.

NVIDIA has established this kind of mutually beneficial ecosystem with global cloud providers—bringing customers into the cloud.

Vertical Integration and Horizontal Openness: NVIDIA’s Core Strategy

NVIDIA is the world’s first vertically integrated, horizontally open company.

The necessity of this model is simple: accelerated computing is not just about chips or systems; it’s about application acceleration. CPUs can make computers run faster overall, but this approach has reached a bottleneck. In the future, only application- or domain-specific acceleration can continue to deliver performance leaps and cost reductions.

This is why NVIDIA must deeply develop one library after another, one industry after another, one vertical sector after another. We are a vertically integrated computing company with no other path. We must understand applications, understand domains, deeply grasp algorithms, and be able to deploy them in any scenario—data centers, cloud, on-premises, edge, and even robotics.

At the same time, NVIDIA remains horizontally open, willing to integrate its technology into any partner’s platform, so that the benefits of accelerated computing can be enjoyed worldwide.

The participant structure of this GTC fully reflects this. Among attendees, the highest proportion is from the financial services industry—developers, not traders. Our ecosystem covers upstream and downstream supply chains. Whether companies are 50, 70, or 150 years old, last year was their best year ever. We are at the beginning of something very, very significant.

CUDA-X: Accelerated Computing Engines for Every Industry

In every vertical sector, NVIDIA has deep deployment:

  • Autonomous Driving: broad scope, profound impact
  • Financial Services: quantitative investing shifting from manual feature engineering to deep learning driven by supercomputers, ushering in the “Transformer era”
  • Healthcare: entering its own “ChatGPT moment,” covering AI-assisted drug discovery, AI-powered diagnostics, medical customer service
  • Industrial: a global construction wave underway, with AI factories, chip fabs, and data centers landing
  • Entertainment & Gaming: real-time AI platforms supporting translation, live streaming, gaming interaction, and intelligent shopping agents
  • Robotics: over ten years of deep cultivation, with three major computer architectures (training, simulation, onboard); 110 robots showcased at this event
  • Telecommunications: a $2 trillion industry, where base stations evolve from simple communication nodes to AI infrastructure platforms, with platforms like Aerial collaborating with Nokia, T-Mobile, and others

All these core areas are supported by our CUDA-X libraries—fundamental to NVIDIA as an algorithm company. These libraries are our most vital assets, enabling our computing platform to deliver real value across industries.

One of the most important libraries is cuDNN (CUDA Deep Neural Network library), which revolutionized AI and triggered the modern AI explosion.

(Playing CUDA-X demo video)

What you just saw is all simulation—including physics-based solvers, AI agent physics models, and physics AI robot models. All are simulated; no manual animation or joint binding. This is NVIDIA’s core capability: unlocking these opportunities through deep understanding of algorithms and organic integration with the computing platform.

AI Native Enterprises and the New Computing Era

You saw industry giants like Walmart, L’Oréal, JPMorgan Chase, Roche, Toyota, and many others, as well as a large number of companies you’ve never heard of—we call them AI-native enterprises. The list is enormous, including OpenAI, Anthropic, and many emerging companies serving different verticals.

In the past two years, this industry has experienced a remarkable leap. Venture capital inflows into startups reached $150 billion, a record in human history. More importantly, the size of individual investments has jumped from millions to hundreds of millions or billions of dollars. The reason is clear: for the first time in history, every such company needs massive compute and enormous volumes of tokens. These companies are either generating tokens themselves or adding value to tokens from providers like Anthropic and OpenAI.

Just as the PC revolution, internet revolution, and mobile cloud revolution spawned epoch-making companies, this generation of computing platform transformation will also give rise to influential companies that will become key forces in the future world.

Three Historic Breakthroughs Driving It All

What exactly happened in the past two years? Three major events.

First: ChatGPT, ushering in the generative AI era (late 2022 to 2023)

It not only perceives and understands but also generates unique content. I showed the fusion of generative AI and computer graphics. Generative AI fundamentally changes the way we compute—shifting from retrieval-based to generation-based computing, profoundly impacting architecture, deployment, and overall significance.

Second: Reasoning AI, exemplified by o1

Reasoning enables AI to reflect, plan, and decompose problems—breaking down incomprehensible issues into manageable steps. o1 makes generative AI trustworthy, capable of reasoning based on real information. To do this, input context tokens and output tokens for reasoning increase dramatically, significantly boosting computational load.

Third: Claude Code, the first intelligent agent model

It can read files, write code, compile, test, evaluate, and iterate. Claude Code is revolutionizing software engineering: 100% of NVIDIA’s engineers use one or more of Claude Code, Codex, and Cursor. No software engineer works without AI assistance.

This is a new inflection point—you no longer ask AI “what, where, how,” but let it “create, execute, build,” actively using tools, reading files, decomposing problems, and taking action. AI now moves from perception to generation, reasoning, and now to actually completing work.

In the past two years, the computational demand for reasoning has increased about 10,000-fold, and usage has grown about 100-fold. I have always believed that compute demand has grown a combined 1,000,000-fold over these two years, a feeling shared by everyone, including OpenAI and Anthropic. More compute means more tokens generated, higher revenue, and smarter AI. The reasoning inflection point has arrived.

The Era of Trillion-Dollar AI Infrastructure

Last year at this time, I said we had high confidence in the demand and procurement orders for Blackwell and Rubin before 2026, totaling about $500 billion. Today, after a year at GTC, I stand here to tell you: looking to 2027, I see the number at least $1 trillion. And I am certain that actual compute demand will be far beyond that.

2025: NVIDIA’s Year of Inference

2025 is NVIDIA’s Year of Inference. We aim to ensure excellence at every stage of the AI lifecycle—beyond training and post-training—so that invested infrastructure can operate efficiently and have longer effective lifespans at lower unit costs.

Meanwhile, Anthropic and Meta have officially joined the NVIDIA platform, representing about one-third of global AI compute demand. Open-source models are approaching cutting-edge levels and are ubiquitous.

NVIDIA is currently the only platform capable of running all AI domains—language, biology, graphics, vision, speech, proteins, chemistry, robotics—across edge and cloud, in any language. Our architecture is universal for all these scenarios, making us the lowest-cost, highest-confidence platform.

Currently, 60% of NVIDIA’s business comes from the top five hyperscale cloud providers worldwide, with the remaining 40% spread across regional clouds, sovereign clouds, enterprises, industrial sectors, robotics, and edge computing. The breadth of AI coverage itself is its resilience—this is undoubtedly a new computing platform revolution.

Grace Blackwell and NVLink 72: Bold Architectural Innovation

While the Hopper architecture was still at its peak, we decided to completely redesign the system, expanding the NVLink domain from 8 GPUs to 72 and restructuring the entire compute system. Grace Blackwell NVLink 72 was a major technological gamble, and not an easy one for our partners; my sincere thanks to everyone involved.

At the same time, we launched NVFP4, a new type of tensor core and compute unit, not just ordinary FP4. We have demonstrated that NVFP4 can run inference without precision loss, delivering large gains in performance and energy efficiency, and it is suitable for training as well. New software such as Dynamo and TensorRT-LLM has also emerged, and we even built DGX Cloud, a supercomputer worth billions of dollars, to optimize kernels.
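The speech does not spell out NVFP4’s encoding, but the general mechanism behind 4-bit inference can be sketched generically: store one shared scale per block of values plus tiny signed integers, and reconstruct approximately on the way out. The sketch below is NOT the NVFP4 format, just the standard block-quantization idea it builds on.

```python
def quantize_block(values):
    """Map floats to signed 4-bit integers (-7..7) sharing one scale per block."""
    scale = max(abs(v) for v in values) / 7 or 1.0  # fall back to 1.0 for an all-zero block
    q = [round(v / scale) for v in values]
    return scale, q

def dequantize_block(scale, q):
    """Approximate reconstruction: each value costs 4 bits plus the shared scale."""
    return [scale * v for v in q]

scale, q = quantize_block([0.5, -1.0, 0.25, 0.875])
approx = dequantize_block(scale, q)
# Every q[i] fits in 4 signed bits, and each reconstructed value lands within
# half a quantization step (scale / 2) of its input.
```

Cutting weights and activations from 16 bits to 4 is what buys the memory-bandwidth and energy headroom the paragraph above describes; the engineering work is keeping accuracy while doing it.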

The results are impressive: according to SemiAnalysis, the most comprehensive AI inference performance evaluation to date, NVIDIA leads in both watts per token and cost per token. While Moore’s Law might have given the H200 a 1.5x boost, we achieved 35x. Dylan Patel of SemiAnalysis even said, “Jensen was conservative. It’s actually 50x.” He’s right.

I quote him: “Jensen sandbagged.”

NVIDIA’s cost per token is the lowest globally—no one can match it. The reason lies in extreme co-design.

Take Fireworks as an example: before NVIDIA’s software and algorithm updates, average token speed was about 700 per second; after updates, it approached 5,000 per second—about a 7x improvement. That’s the power of extreme co-design.

AI Factory: From Data Center to Token Factory

Data centers used to store files; now they are token-producing factories. Every cloud provider and AI company will soon use “token factory efficiency” as a core metric.

My core argument:

  • Vertical axis: Throughput—tokens generated per second at fixed power
  • Horizontal axis: Token speed—response time per inference; faster speed allows larger models, longer context, smarter AI

Tokens are the new commodity. Once the market matures, they will be priced in tiers:

  • Free Tier (high throughput, low speed)
  • Mid-tier (~$3 per million tokens)
  • Premium Tier (~$6 per million tokens)
  • High-Speed Tier (~$45 per million tokens)
  • Ultra-High-Speed Tier (~$150 per million tokens)

Compared to Hopper, Grace Blackwell has increased throughput at the highest value tier by 35 times and introduced a new tier. Simplified estimates suggest that allocating 25% of power to each tier, Grace Blackwell can generate 5 times more revenue than Hopper.
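The tier-mix estimate above can be reproduced as a power-weighted sum of throughput times price. Only the per-million-token prices come from the talk; the throughput figures and the allocation below are invented purely to show the arithmetic.

```python
# Prices per million tokens, from the tier list above. Throughputs (tokens/sec
# the factory could sustain if fully devoted to that tier) are hypothetical.
PRICES = {"free": 0.0, "mid": 3.0, "premium": 6.0, "fast": 45.0, "ultra": 150.0}

def revenue_per_sec(allocation, throughput):
    """allocation: tier -> share of power; throughput: tier -> tokens/sec at full power."""
    return sum(
        share * throughput[tier] / 1_000_000 * PRICES[tier]
        for tier, share in allocation.items()
    )

# Hypothetical trade-off: the faster (pricier) the tier, the lower the
# sustainable aggregate throughput.
throughput = {"free": 500e6, "mid": 300e6, "premium": 200e6, "fast": 40e6, "ultra": 10e6}
mix = {"mid": 0.25, "premium": 0.25, "fast": 0.25, "ultra": 0.25}
print(round(revenue_per_sec(mix, throughput), 2))  # prints 1350.0 ($/sec under these made-up numbers)
```

The takeaway is structural, not numerical: adding a high-price tier that a rival architecture cannot serve at all changes the revenue mix far more than a uniform speedup does.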

Vera Rubin: The Next-Generation AI Computing System

(Playing Vera Rubin system introduction video)

Vera Rubin is a complete, end-to-end optimized system designed specifically for agent workloads:

  • Large language model compute core: NVLink 72 GPU cluster, handling prefill and KV cache
  • New Vera CPU: optimized for extremely high single-thread performance, using LPDDR5 memory, with excellent energy efficiency; the only data center CPU using LPDDR5, suitable for AI agent tools
  • Storage system: BlueField 4 + CX 9, a new storage platform for the AI era, with 100% industry adoption
  • CPO Spectrum X switch: the world’s first mass-produced co-packaged optical Ethernet switch
  • Kyber rack: a new rack system supporting 144 GPUs in a single NVLink domain, with front-end compute and back-end NVLink switching, forming a giant computer
  • Rubin Ultra: next-generation supercomputing node, vertical-insert design, supporting larger NVLink interconnects with Kyber racks

Vera Rubin is fully liquid-cooled, reducing installation time from two days to two hours, using 45°C hot water cooling, greatly easing data center cooling pressure. Satya Nadella has confirmed that the first Vera Rubin rack is now running on Microsoft Azure, which excites me greatly.

Groq Integration: The Ultimate Extension of Inference Performance

We acquired Groq’s team and licensed their technology. Groq is a deterministic dataflow processor, using static compilation and compiler scheduling, with large SRAM, optimized for inference workloads—offering extremely low latency and very high token generation speed.

However, Groq’s memory capacity is limited (500MB on-chip SRAM), making it difficult to independently handle large model parameters and KV caches, restricting large-scale applications.

The solution is Dynamo—a suite of inference scheduling software. We disaggregate inference pipelines with Dynamo:

  • Prefill and attention decoding are handled on Vera Rubin (requiring massive compute and KV cache)
  • Feed-forward network decoding (token generation) is done on Groq (requiring ultra-high bandwidth and low latency)

Both are tightly coupled via Ethernet, with special modes to reduce latency by about half. Under Dynamo’s unified scheduling—an “AI factory operating system”—overall performance improves 35x, opening new inference performance levels previously unreachable with NVLink 72.

We recommend deploying Groq and Vera Rubin together:

  • For workloads focused on high throughput, use 100% Vera Rubin
  • For high-value token generation tasks like code creation, introduce Groq, with a suggested ratio of about 25% Groq + 75% Vera Rubin
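As a back-of-the-envelope illustration of that mix, a helper like the following (hypothetical, for capacity planning only) splits a rack budget at the suggested ratio:

```python
def capacity_mix(total_racks: int, groq_fraction: float = 0.25) -> dict:
    """Split an AI-factory rack budget at the suggested 25/75 mix.

    groq_fraction is the share dedicated to low-latency token generation;
    the remainder runs Vera Rubin for throughput-oriented work.
    """
    groq = round(total_racks * groq_fraction)
    return {"groq": groq, "vera_rubin": total_racks - groq}

capacity_mix(100)  # {'groq': 25, 'vera_rubin': 75}
```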

Groq LP30 chips, manufactured by Samsung, are already in mass production and are expected to ship in Q3. Our thanks to Samsung for its close cooperation.

Historical Leap in Inference Performance

Quantifying the progress: in two years, a 1 GW AI factory’s token generation rate will jump from 22 million tokens/sec to 700 million tokens/sec, roughly a 32x increase. That’s the power of extreme co-design.

Roadmap

  • Blackwell: in production, Oberon standard rack, copper links expanded to NVLink 72, optional optical expansion to NVLink 576
  • Vera Rubin (current): Kyber rack, NVLink 144 (copper); Oberon rack, NVLink 72 + optical, expansion to NVLink 576; Spectrum 6, the world’s first CPO switch
  • Vera Rubin Ultra (upcoming): new Rubin Ultra GPU, LP35 chip (first to integrate NVFP4), further performance boosts
  • Feynman (next-gen): new GPU, LP40 chip (co-designed with Groq team, integrating NVFP4); new CPU—Rosa (Rosalyn); BlueField 5; CX 10; supporting both copper and CPO expansion via Kyber rack

The roadmap clearly advances along three parallel paths: copper expansion, optical scale-up, and optical scale-out. We need all partners to continue expanding capacity in copper, fiber, and CPO.

NVIDIA DSX: Digital Twin Platform for AI Factories

AI factories are becoming more complex, yet their component suppliers have never collaborated during design; their products meet for the first time in the data center. That is no longer sufficient.

To address this, we created Omniverse and the NVIDIA DSX platform based on it—a platform for all partners to co-design and operate gigawatt-scale AI factories in virtual worlds. DSX offers:

  • Rack-level mechanical, thermal, electrical, and network simulation
  • Grid connection for coordinated energy-saving scheduling
  • Data center dynamic power and cooling optimization based on Max-Q

Conservatively, this system can improve energy utilization efficiency by about 2x, which is a significant gain at this scale. Omniverse, starting from the digital Earth, will host various digital twins, and we are building the largest computer in human history with global partners.

Furthermore, NVIDIA is venturing into space. Thor chips have passed radiation certification and are operating in satellites. We are developing Vera Rubin Space-1 for space-based data centers. In space, cooling relies solely on radiative heat dissipation, making thermal management the key challenge; our top engineers are working on solutions.

OpenClaw: The Operating System of the Agent Era

Peter Steinberger developed software called OpenClaw. It is the most popular open-source project in human history, surpassing Linux’s achievements in just a few weeks.

OpenClaw is essentially an agent system capable of:

  • Managing resources, accessing tools, file systems, and large language models
  • Scheduling and timing tasks
  • Decomposing problems step-by-step and invoking sub-agents
  • Supporting multimodal input/output (voice, video, text, email, etc.)

In OS terms, it’s truly an operating system—the operating system for agent computers. Windows made personal computing possible; OpenClaw makes personal agents possible.
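The capability list above is essentially a scheduler over tools and sub-agents. A toy loop in Python, with all names invented here (this is not OpenClaw's real interface), gives the flavor:

```python
# Toy agent loop, illustrative only -- OpenClaw's actual architecture
# and interfaces are not reproduced here.

def plan(task: str) -> list[str]:
    """Decompose a task into steps (a real agent would ask an LLM)."""
    return [f"{task}: step {i}" for i in (1, 2)]

def run_agent(task: str, tools: dict) -> list[str]:
    """Run each planned step through an available tool."""
    results = []
    for step in plan(task):            # step-by-step decomposition
        tool = tools["default"]        # resource/tool access
        results.append(tool(step))     # a real agent could spawn sub-agents here
    return results

out = run_agent("summarize inbox", {"default": lambda s: f"done: {s}"})
```

The operating-system analogy holds because the loop owns scheduling, resource access, and delegation, while the model itself is just one resource it calls.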

Every enterprise needs to develop its own OpenClaw strategy, just as we need Linux, HTML, and Kubernetes strategies.

Revolutionizing Enterprise IT

Before OpenClaw, enterprise IT meant data and files entering systems and flowing through tools and workflows that ultimately served human users. Software companies built the tools; system integrators and consultants helped enterprises use them.

After OpenClaw, every SaaS company will become an AaaS (Agentic as a Service) company—not just providing tools, but offering specialized AI agents.

But a key challenge is that enterprise agents can access sensitive data, execute code, and communicate externally—strict controls are necessary.

To this end, we partnered with Peter to embed security into enterprise-grade versions, launching:

  • NeMo Claw (reference design): an enterprise-grade framework based on OpenClaw, integrating NVIDIA’s full suite of agent AI tools
  • Open Shield (security layer): integrated into OpenClaw, providing policy engines, network firewalls, and privacy routing to ensure data security
  • NeMo Cloud: downloadable and integrable with all SaaS companies’ policy engines

This is a renaissance for enterprise IT—a $2 trillion industry poised to grow into a multi-trillion-dollar sector, shifting from tool provision to delivering specialized AI agent services.

I foresee that in the future, every engineer in a company will have an annual token budget. Their salary might be hundreds of thousands of dollars, and I will allocate roughly half of that as token quota, multiplying their productivity tenfold. “How many tokens are in your onboarding package?” has become a new hiring topic in Silicon Valley.
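The budget arithmetic behind that quota is simple. A sketch, where the per-million-token price is an assumed placeholder (real pricing varies widely by model and provider):

```python
def annual_token_budget(salary_usd: float, budget_share: float,
                        usd_per_million_tokens: float) -> float:
    """Tokens per year an engineer's quota buys.

    The token price is an assumption for illustration, not a quoted rate.
    """
    return salary_usd * budget_share / usd_per_million_tokens * 1_000_000

# e.g. a $200k salary with half allocated to tokens, at an assumed $5/million:
annual_token_budget(200_000, 0.5, 5.0)  # 20 billion tokens per year
```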

Every enterprise will be both a user of tokens (for engineers) and a producer (serving clients). The significance of OpenClaw is comparable to HTML and Linux—fundamental.

NVIDIA’s Open Model Initiative

For custom agents, we offer NVIDIA’s own cutting-edge models:

  • Nemotron models
  • Cosmos world foundation model
  • GROOT humanoid robot model
  • Alpamayo autonomous driving model
  • BioNeMo digital biology model
  • Phys-AI physics model

We are at the forefront in each field and committed to continuous iteration: Nemotron 4 after Nemotron 3, Cosmos 2 after Cosmos 1, a second generation of GROOT.

Nemotron 3 ranks among the top three models globally when running in OpenClaw, placing it at the frontier. Nemotron 3 Ultra will be the most powerful foundation model ever, supporting sovereign AI development.

Today, we announce the Nemotron Alliance, investing billions of dollars to advance AI foundational model R&D. Members include BlackForest Labs, Cursor, LangChain, Mistral, Perplexity, Reflection, Sarvam (India), Thinking Machines (Mira Murati’s lab), and others. Many enterprise software companies are joining, integrating NeMo Claw reference design and NVIDIA’s agent AI toolkit into their products.

Physical AI and Robotics

Digital agents act in the digital world, coding and analyzing data; physical AI means embodied agents, that is, robots.

At this GTC, 110 robots appeared, nearly covering all global robot R&D companies. NVIDIA provides three computers (training, simulation, onboard) and a complete software stack with AI models.

In autonomous driving, the “ChatGPT moment” has arrived. Today, we announce four new partners joining NVIDIA RoboTaxi Ready: BYD, Hyundai, Nissan, Geely, with a combined annual output of 18 million vehicles. Alongside Mercedes-Benz, Toyota, GM, the lineup grows stronger. We also announce a major partnership with Uber to deploy and connect RoboTaxi-ready vehicles in multiple cities.

In industrial robotics, companies like ABB, Universal Robots, KUKA are collaborating with us to integrate physical AI models with simulation systems, advancing robot deployment in manufacturing lines worldwide.

In telecommunications, Caterpillar and T-Mobile are also involved. Future wireless base stations will no longer be just communication nodes but NVIDIA Aerial AI RAN—real-time traffic sensing, beamforming adjustment, and intelligent edge computing for energy efficiency.

Special Segment: Olaf Robot Debuts

(Playing Disney Olaf robot demo video)

Huang: Snowman appears! Newton is running fine! Omniverse is working perfectly! Olaf, how are you?

Olaf: I’m so happy to see you.

Huang: Yes, because I gave you a computer—Jetson!

Olaf: What’s that?

Huang: Right inside your belly.

Olaf: Amazing.

Huang: You learned to walk in Omniverse.

Olaf: I like walking. It’s much better than riding a reindeer and gazing at the beautiful sky.

Huang: That’s thanks to physics simulation—based on NVIDIA Warp’s Newton solver, developed jointly with Disney and DeepMind, enabling you to adapt to the real physical world.

Olaf: I was just about to say that.

Huang: That’s your smart part.

Olaf: I am a snowman, not a snowball.

Huang: Can you imagine? The future Disney parks—where all these robot characters walk freely. Honestly, I thought you’d be taller. I’ve never seen such a short snowman.

Olaf: (shrugs)

Huang: Come help me finish today’s speech?

Olaf: Awesome!

Summary of the Keynote

Huang: Today, we discussed the following core themes:

  1. The arrival of the reasoning inflection point: reasoning is now the most critical AI workload, and tokens are the new commodity.