Domestic models dominate OpenRouter: surging token usage reveals programming and intelligent agents as the keys to victory

During the Spring Festival, usage of domestic large models was extremely active. According to the latest weekly data from OpenRouter, the top ten models on the platform processed approximately 87 trillion tokens in total, with Chinese models accounting for a dominant 53 trillion, or 61%.

The top three models by token calls that week were all domestic large models: MiniMax M2.5, Kimi K2.5, and GLM-5, with week-over-week changes of +197%, -20%, and +158%, respectively. MiniMax M2.5 surged to the top with 2.45 trillion tokens, followed by Kimi K2.5 with 1.21 trillion; Zhipu's GLM-5 and DeepSeek V3.2 ranked third and fifth.

OpenRouter is the world’s largest large model API aggregation platform, providing developers with a unified API interface to access hundreds of large language models worldwide. Its core features include multi-model invocation, intelligent routing optimization, and transparent performance rankings, aiming to solve the complexity of multi-model integration and vendor lock-in issues.
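To illustrate the "unified API interface" described above, here is a minimal sketch of how an aggregator like OpenRouter is typically used: one OpenAI-compatible request shape, where switching vendors means changing only the model string. The endpoint URL and model identifiers below are illustrative assumptions, not verified OpenRouter IDs, and no network call is made.

```python
import json

# Assumed aggregator endpoint (illustrative; check OpenRouter's docs for the real one).
OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(model: str, prompt: str) -> dict:
    """Build a chat-completion payload. Swapping `model` is the only
    change needed to target a different vendor's model through the
    same aggregator endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

# The same payload shape targets two different vendors' models:
req_a = build_request("minimax/minimax-m2.5", "Refactor this function.")
req_b = build_request("z-ai/glm-5", "Refactor this function.")
print(json.dumps(req_a, indent=2))
```

This is what "solving vendor lock-in" means in practice: application code depends on one request format rather than on each vendor's bespoke SDK.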

Data from the platform shows that programming and agent capabilities are becoming the two main competitive focus areas for large models.

Recently, overall invocation volume on OpenRouter has surged significantly. OpenRouter officially confirmed that M2.5 has driven increased demand for long-context requests ranging from 100K to 1M tokens, a typical consumption pattern for agent workflows.

The top three domestic large models by token calls on the platform all focus on enhancing programming capabilities and automating agent tasks, marking a significant application-level breakthrough for domestic large models in early 2026.

Xiyu Technology (MiniMax) released MiniMax M2.5 on February 13, claiming it to be the world’s first production-grade flagship model designed natively for agent scenarios. Within seven days of release, its invocation volume exceeded 3.07 trillion tokens. Thanks to its strong performance in programming and agent workflows and its very low cost, it has become a first choice for developers.

Moonshot AI released Kimi K2.5 on January 27. The model adopts a native multimodal architecture and can schedule up to 100 “agent clones” to work in parallel, improving efficiency on complex tasks by 3 to 10 times. It ranks first in multiple subcategories such as programming and tool invocation, with invocation volume far surpassing the Gemini 3 and Claude models. According to The Paper, within 20 days of release Kimi’s revenue already exceeded its total revenue for all of 2025, growth driven mainly by a surge in global paid users and API calls, with overseas paid-user numbers rising rapidly.

Zhipu released GLM-5 on February 12. The model’s parameter scale was further expanded, it employs sparse attention mechanisms, and it was designed specifically for complex systems engineering and long-horizon agent tasks. With advantages such as free access and a 200K context window, user growth accelerated after release. Zhipu responded by capping sales of, and raising prices on, its Coding Plan, and on New Year’s Eve announced a nationwide search for “computing power partners.”

As AI application scenarios deepen, users are shifting from simple Q&A to complex workflows such as code refactoring, file rewriting, and document generation; combined with the spread of agent modes, this has produced a clear “inflation” trend in token consumption.

While performance continues to improve, domestic models still stand out for cost-effectiveness. Compared to Claude Opus 4.6, MiniMax M2.5 and Zhipu GLM-5 hold significant price advantages. On input, both MiniMax M2.5 and GLM-5 are priced at $0.3 per million tokens, while Claude Opus 4.6 costs $5 per million tokens, about 16.7 times as much. On output, MiniMax M2.5 is $1.1 per million tokens and GLM-5 is $2.55, while Claude Opus 4.6 reaches $25, roughly 22.7 times and 9.8 times the price of MiniMax M2.5 and GLM-5, respectively.
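The price multiples quoted above follow directly from the per-million-token list prices; a quick sketch recomputing them from the article's own figures:

```python
# USD per million tokens, as quoted in the article.
prices = {
    "MiniMax M2.5":    {"input": 0.3, "output": 1.1},
    "GLM-5":           {"input": 0.3, "output": 2.55},
    "Claude Opus 4.6": {"input": 5.0, "output": 25.0},
}

opus = prices["Claude Opus 4.6"]
# Input: Opus vs. either domestic model (both charge $0.3).
input_ratio = opus["input"] / prices["MiniMax M2.5"]["input"]
# Output: Opus vs. each domestic model separately.
out_vs_minimax = opus["output"] / prices["MiniMax M2.5"]["output"]
out_vs_glm = opus["output"] / prices["GLM-5"]["output"]

print(f"input: {input_ratio:.1f}x")               # ≈ 16.7x
print(f"output vs MiniMax M2.5: {out_vs_minimax:.1f}x")  # ≈ 22.7x
print(f"output vs GLM-5: {out_vs_glm:.1f}x")             # ≈ 9.8x
```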

These OpenRouter figures do not fully capture the token invocation volumes of Chinese model vendors as a whole. Haitong International Securities data shows that daily token calls for Volcano Engine’s large models grew from 20 trillion at the end of 2024 to 63 trillion by the end of 2025. Alibaba Cloud’s external clients’ daily token calls approached 5 trillion in 2025, with a target of at least 15-20 trillion in 2026, while internal-business daily calls are planned to rise from 16-17 trillion to 100 trillion. Industry-wide, China’s total daily token consumption was about 100 billion at the start of 2024, surpassed 30 trillion by mid-2025, and by February 2026 the combined daily token consumption of mainstream large models had reached approximately 180 trillion.
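The growth multiples implied by these figures can be made explicit with a quick calculation (units are trillions of tokens per day, taken from the paragraph above):

```python
# (start, end) daily token volumes in trillions, per the article's figures.
volcano = (20, 63)     # Volcano Engine: end-2024 -> end-2025
industry = (0.1, 30)   # China industry-wide: ~100 billion (0.1T) early-2024 -> mid-2025

volcano_growth = volcano[1] / volcano[0]
industry_growth = industry[1] / industry[0]

print(f"Volcano Engine: {volcano_growth:.2f}x in one year")
print(f"Industry-wide: {industry_growth:.0f}x in roughly 18 months")
```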

Dongan Securities’ latest research report states that as domestic models improve their programming and agent capabilities, their invocation volumes have risen significantly, and domestic large models in the programming and agent fields are expected to further accelerate application deployment and drive up token consumption.

Changjiang Securities previously indicated that as programming and multimodal models mature, downstream application scenarios are expected to be truly unlocked, bringing a large demand for high-quality tokens. Referring to overseas AI industry development patterns, there is about a two-year lag from capital expenditure investment to token demand explosion. Domestic major AI companies’ capital expenditure cycles lag about a year behind overseas, starting in the second half of 2024. Therefore, revenue for domestic cloud providers has begun to grow, and the true explosion of token volume is expected to occur in 2026.

(Source: Cailian Press)
