Building the First Domestic Thousand-Card AI Inference Cluster, YunTian LiFe Sets a Model for "Domestic Models and Domestic Chips"
YunTian LiFe (688343.SH) Secures a 420 Million Yuan Large Order
On March 12, according to public bidding information, YunTian LiFe won the bid for the Zhanjiang City AI Penetration Support New Production Capacity Infrastructure Construction Project (hereinafter referred to as “Zhanjiang Project”), with a bid amount of 420 million yuan. The project will be based on YunTian LiFe’s self-developed domestic AI inference acceleration cards to build a domestic AI inference cluster with thousands of cards. The cluster plans to incorporate domestic large models like DeepSeek to provide more convenient and cost-effective AI capabilities for government, industry, and related application scenarios.
YunTian LiFe’s 2025 performance forecast shows that over the past year the company achieved revenue of 1.308 billion yuan, up 42.57% year-over-year; net loss attributable to shareholders was 402 million yuan, narrowing by 30.50% year-over-year. Securing this 420 million yuan computing power infrastructure project at this stage not only strongly validates the commercial viability of its self-developed chips but also translates directly into growth momentum for the company’s future performance.
However, for YunTian LiFe, beyond financial contribution, the benchmark significance of the Zhanjiang Project is even more noteworthy. As artificial intelligence moves from the laboratory into the deep waters of industry, the value of computing power is no longer just a ranking metric but a means to empower thousands of industries with inclusive productivity. From this perspective, the Zhanjiang Project is not only a short-term revenue booster but also a key move for YunTian LiFe to demonstrate its self-developed strength and seize industry high ground.
As the first fully domestically produced AI inference cluster with thousands of cards, the Zhanjiang Project not only sets a benchmark for the commercialization of large-scale inference computing power but also provides the best testing ground for the deep integration of “domestic models and chips,” helping to steadily advance toward the goal of a 10,000-card cluster.
When industry focus shifts from training peaks to inference costs, the entity that can provide stable large-scale inference capabilities at lower costs will gain an advantage in the next AI race. The move YunTian LiFe has made in Zhanjiang is a critical strategic positioning for the inference era.
AI Computing Power Moving Toward “Inference Priority”
Unlike the previous common “training and inference integrated” construction model adopted by domestic intelligent computing centers, the project in Zhanjiang chooses a more focused technical path—dedicated AI inference clusters mainly aimed at various industry application scenarios, providing direct support for AI transformation of traditional industries.
This shift reflects a profound change in AI industry logic.
AI computing power systems can broadly be divided into training and inference. Training computing power determines how a model develops capabilities from zero to one and emphasizes absolute computational capacity; inference computing power is mainly about using trained neural network models for prediction and emphasizes practicality: it demands less raw compute but places more weight on low latency and low power consumption.
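The division drawn above can be made concrete with a toy sketch (purely illustrative, unrelated to YunTian LiFe's actual stack): inference is a single forward pass, while a training step also computes gradients and updates weights, which is one reason training demands far more raw compute.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy single-layer model: y = x @ W
W = rng.normal(size=(4, 3))

def inference(x, W):
    """Inference: forward pass only -- no gradients, no weight updates."""
    return x @ W

def training_step(x, y_true, W, lr=0.01):
    """Training: forward pass + gradient computation + weight update."""
    y_pred = x @ W
    grad = x.T @ (y_pred - y_true) / len(x)  # gradient of MSE loss w.r.t. W
    return W - lr * grad                     # updated weights

x = rng.normal(size=(8, 4))
y = rng.normal(size=(8, 3))

y_hat = inference(x, W)         # cheap; repeated at huge scale in production
W_new = training_step(x, y, W)  # heavier; iterated many times until convergence
```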
In recent years, industry excitement has largely centered on a parameter-scale arms race, as billion-, hundred-billion-, and even trillion-parameter large models emerged one after another and major companies competed on model capability. However, as model capabilities mature, industry players are increasingly asking a more practical question: with such powerful models in hand, in which scenarios can they truly create value?
Inference computing power has thus gained more attention. Whether it’s the popular SeeDance during the Spring Festival, the widely discussed “Lobster” recently, or AI agent applications across various industries, all rely on inference computing power. According to market research firm Gartner, by 2026, about 55% of AI-specific cloud infrastructure expenditure will be on inference workloads.
This aligns perfectly with YunTian LiFe’s strengths. As a domestic chip manufacturer that has specialized in inference chips for years, it has pioneered a “compute block” architecture that delivers flexible computing power scalability on advanced domestic processes, and it has launched chip series such as “DeepSea,” “DeepQing,” and “DeepSky” for the edge, embodied intelligence, and cloud domains.
Thanks to this, YunTian LiFe can better meet the needs of the Zhanjiang project.
Large model inference applications require high concurrency, high throughput, and low latency at the same time. Meanwhile, as the context length of large models grows, a large amount of intermediate state needs to be stored in the KV Cache (key-value cache). The industry consensus is that future inference system performance bottlenecks will increasingly stem from data access efficiency rather than computational capacity alone.
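A back-of-the-envelope estimate shows why the KV Cache becomes a bottleneck as context grows. The formula below is the standard sizing rule for transformer inference; the model configuration is an assumed, 7B-class example, not a figure from the article or any specific chip.

```python
def kv_cache_bytes(n_layers, n_heads, head_dim, context_len,
                   batch=1, bytes_per_elem=2):
    """Estimate KV Cache size: keys + values stored for every layer,
    head, and token position (FP16/BF16 -> 2 bytes per element)."""
    return 2 * n_layers * n_heads * head_dim * context_len * batch * bytes_per_elem

# Illustrative 7B-class configuration (assumed for the example):
size = kv_cache_bytes(n_layers=32, n_heads=32, head_dim=128, context_len=4096)
print(f"{size / 2**30:.1f} GiB per request at 4K context")  # -> 2.0 GiB
```

Doubling the context length doubles this footprint, which is why long-context inference stresses memory capacity and bandwidth before it stresses raw FLOPs.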
Against this backdrop, the coordinated design of computing power, storage, and networking is gradually becoming a key competitive factor in AI infrastructure.
The inference cluster built in Zhanjiang is designed around this concept. It uses YunTian LiFe’s independently developed AI inference chips and adopts a “prefill-first optimization, balanced decode” approach in its system architecture. By configuring computational resources and storage bandwidth in a targeted way at the chip-design level, the system can maintain high throughput even in long-context inference scenarios.
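The prefill/decode split mentioned here is a general property of transformer inference, and a toy sketch makes it concrete (a simplified single-head illustration, not YunTian LiFe's implementation): prefill processes the whole prompt in one batched, compute-bound pass, while decode generates one token at a time, re-reading the growing KV Cache on every step and becoming memory-bandwidth-bound.

```python
import numpy as np

def attend(q, K, V):
    """Single-head scaled dot-product attention over cached keys/values."""
    scores = q @ K.T / np.sqrt(K.shape[-1])
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)
    return w @ V

d = 64
rng = np.random.default_rng(0)

# Prefill: the whole prompt is processed in one batched pass (compute-bound).
prompt = rng.normal(size=(128, d))               # 128 prompt tokens
K_cache, V_cache = prompt.copy(), prompt.copy()  # stand-in for projected K/V

# Decode: tokens are generated one by one, each step re-reading the
# growing KV Cache (memory-bandwidth-bound).
for _ in range(4):
    q = rng.normal(size=(1, d))             # query for the newest token
    out = attend(q, K_cache, V_cache)
    K_cache = np.vstack([K_cache, q])       # append new K/V (simplified)
    V_cache = np.vstack([V_cache, out])
```

Because the two phases stress hardware differently, giving each phase its own resource balance, as the Zhanjiang architecture does, is a natural design choice.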
In terms of networking, YunTian LiFe employs a unified high-speed interconnect architecture, constructing the cluster’s physical network with 400G optical links to achieve high bandwidth and low latency communication between nodes. The deployment capability supports scaling from tens of cards per node to thousands of cards in a cluster, accommodating different AI application scales.
Through multi-layer optimization of chip architecture, network interconnection, and system scheduling, this inference cluster demonstrates significant advantages in overall efficiency and cost control, providing a more economical computing power solution for large-scale AI applications.
Aiming to Reduce the Cost of 100 Billion Tokens to One Cent
For YunTian LiFe, the Zhanjiang project is just the beginning.
As large models gradually enter application phases, industry focus is shifting from “peak computing power” to “cost efficiency per unit.” In other words, future AI industry competition will not only depend on model capabilities but also on who can provide stable large-scale inference at the lowest cost.
As a pioneer in inference chips, YunTian LiFe understands this clearly. In February this year, it announced a three-year plan to cut the per-million-token inference cost by a double-digit percentage each year, outlining its future high-performance chip roadmap.
The first-generation super-node P chip will be launched this year, optimized for ultra-long context prefill inference in scenarios involving millions of tokens, matching the performance of H100. Subsequently, in 2027, YunTian LiFe plans to develop the first-generation super-node D chip to achieve ultra-low latency in decode inference. Finally, in 2028, it will develop the second-generation super-node D chip, aiming to improve overall prefill and decode performance through system-level collaborative optimization, moving toward millisecond-level inference latency.
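The arithmetic of that plan compounds quickly. The article gives only "double digits annually," not an exact rate, so the rates below are assumed for illustration:

```python
def cost_after(initial_cost, annual_reduction, years):
    """Cost per million tokens after compounding an annual percentage cut."""
    return initial_cost * (1 - annual_reduction) ** years

# Assumed rates for illustration only -- the plan states "double digits
# annually" without specifying the percentage or the starting price.
start = 1.0                        # normalized cost today
for rate in (0.10, 0.30, 0.50):    # 10%, 30%, 50% yearly cuts
    print(f"{rate:.0%}/yr -> {cost_after(start, rate, 3):.3f}x after 3 years")
```

Even at the low end of "double digits," three years of compounding cuts cost by roughly a quarter; at 50% per year, cost falls to one eighth.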
While the blueprint is set, even the most ambitious technical plans require real-world scenarios to validate their commercial value. Will the chip designs truly meet industry needs? Can resource allocation for prefill and decode be optimized under actual loads? How to effectively alleviate data access pressure from KV Cache at the scale of thousands of cards? These questions cannot be answered solely in labs—they must be tested in real industry environments.
Therefore, for YunTian LiFe, the Zhanjiang project is not just a simple delivery but a practical battlefield for its core technology.
It is reported that the Zhanjiang project will be built in three phases, all using YunTian LiFe’s self-developed domestic AI inference acceleration cards. The first phase will deploy YunTian LiFe’s X6000 inference acceleration cards; phases two and three will feature the company’s latest chip products. Among them, YunTian LiFe’s first prefill chip, DeepVerse100, is expected to complete tape-out within the year and will be deployed first in the Zhanjiang cluster.
Meanwhile, the thousand-card inference cluster in Zhanjiang also demonstrates strong elastic deployment capabilities. Under a typical architecture, such a cluster involves multi-level expansion: from single nodes with 8, 32, or 64 cards, to ultra-large nodes with hundreds of cards, and then to large-scale clusters spanning nodes. Operating such a system in practice will thoroughly test inter-card connectivity, node-to-node communication, and load balancing, accumulating experience for future, larger-scale AI computing systems.
In the longer term, YunTian LiFe has proposed the “1001 Plan,” aiming for the long-term goal of “billion tokens for one cent,” continuously reducing large model inference costs through chip and system co-optimization.
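Taking the "1001 Plan" target at face value, and reading "one cent" as one fen (0.01 yuan, an assumption of this illustration), the implied price per million tokens works out as follows:

```python
# Assumed reading of the target: one billion tokens for 0.01 yuan.
target_yuan = 0.01
target_tokens = 1_000_000_000

# Implied cost per million tokens at the target price point.
per_million = target_yuan / target_tokens * 1_000_000
print(f"{per_million:.0e} yuan per million tokens")
```

That is several orders of magnitude below today's typical per-million-token API prices, which is why the article frames the goal as turning AI into utility-grade infrastructure.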
If this goal becomes reality, AI will truly become as fundamental as water and electricity: an infrastructure flowing through countless industries. For YunTian LiFe, which leads on the inference track, that would open a golden era as the industry's "water seller."