Open-source models are catching up, but what exactly are they catching up to?

Open source is catching up, but we need to be clear about what it has caught up to

Z.ai has released GLM-5.1, and Modal rolled it out as a hosted service almost simultaneously. The two layered together are more interesting than either one alone.

The model is a 754B-parameter MoE with 40B active parameters. It scores 58.4% on SWE-Bench Pro, putting it roughly on par with GPT-5.4 and Opus 4.6 on coding tasks. It can run autonomously for a full 8 hours without crashing across thousands of iterations. It currently sits at #10 on BenchLM, and KernelBench shows it running 3.6x faster than prior open-source approaches.

Reactions on social media are split. Bindu Reddy says this is evidence that open source has caught up to closed source; Victor Taelin doubts that "500+ tokens/s" is realistic at FP8 precision and suggests a real deployment would land closer to 200 tps. Both sides have a point: the model can perform, but the marketing numbers are optimistic.

This open-source release differs from previous ones in a few ways:

  • Modal’s free endpoints change the calculus on usability and cost. Z.ai (formerly Zhipu; now publicly listed in Hong Kong) reaches Western developers via Modal, so developers don’t have to worry about geopolitical friction; the $1-per-million-input-token pricing also lowers the price anchor for proprietary services.
  • Context matters for the messaging around inference efficiency. GLM-5.1 uses sparse mixture attention and asynchronous reinforcement learning to keep scaling costs under control. But the “500+ tps” figure depends on infrastructure most people don’t have. The real bottleneck is serving and scheduling, not the model’s paper specifications.
  • It plugs directly into existing toolchains. Compatibility with Claude Code and OpenClaw means it can be swapped straight into existing proprietary workflows. The pressure this puts on Anthropic and OpenAI is mainly on pricing, not a wholesale flattening of capabilities.
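The "swap it straight in" claim can be made concrete with a minimal sketch. This assumes the hosted endpoint exposes an OpenAI-compatible chat-completions API (a common convention for hosted open models, not something the article confirms); the base URL, API key, and model name below are hypothetical placeholders.

```python
import json

def build_chat_request(base_url, api_key, model, messages):
    """Build an OpenAI-compatible chat-completions request.

    Swapping providers is just a matter of changing `base_url`
    and `model`; the payload shape stays the same, which is why
    drop-in replacement inside existing agent toolchains is cheap.
    """
    url = f"{base_url.rstrip('/')}/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"model": model, "messages": messages})
    return url, headers, body

# Hypothetical hosted endpoint and model name -- the proprietary and
# open-source backends would differ only in these two strings.
messages = [{"role": "user", "content": "Refactor this function."}]
url, headers, body = build_chat_request(
    "https://example-host.example/v1",  # hypothetical hosted GLM endpoint
    "sk-demo",
    "glm-5.1",
    messages,
)
print(url)  # https://example-host.example/v1/chat/completions
```

The point of the sketch is that the integration surface is a URL and a model string, so the switching cost an enterprise pays is mostly validation, not engineering.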

MarkTechPost and Constellation both read this as the “6-month gap” between open and closed source converging. For coding agents specifically, that assessment is likely true. Z.ai uses an MIT license, and second-stage fine-tuning is already on the way.

But don’t take this to mean open source has fully turned the tables. Proprietary models still lead by a lot in safety alignment and multimodal reasoning. What’s being eroded is the moat in the coding-agent scenario: enterprises value deployment cost more for these kinds of tasks, and they’re less sensitive to that marginal difference in capability.

What matters more than the model is the infrastructure

Modal is built on a B200 cluster. It deploys GLM-5.1 with SGLang; in interactive scenarios it can run at 30–75 tokens/s. These seemingly boring engineering details are what truly matter.

Z.ai demonstrates throughput of 21.5k QPS on VectorDBBench (after 600 iterations of optimization). This kind of performance requires Modal’s serverless elasticity for stable delivery; the model alone can’t reach that scale.
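A rough sanity check shows why that QPS figure implies elastic infrastructure rather than a fixed deployment. Little's law relates in-flight concurrency to arrival rate and latency; the 2-second latency below is an illustrative assumption, not a benchmark figure.

```python
def required_concurrency(qps: float, avg_latency_s: float) -> float:
    """Little's law: average in-flight requests = arrival rate x latency."""
    return qps * avg_latency_s

# Illustrative: 21,500 QPS at an assumed 2-second average latency
# means ~43,000 requests in flight at any moment -- a level of
# concurrency a single static cluster cannot hold, which is where
# serverless elasticity earns its keep.
in_flight = required_concurrency(21_500, 2.0)
print(in_flight)  # 43000.0
```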

It also changes how we think about “model releases”: they’re no longer isolated events, but part of an ecosystem strategy. The combination of “open-source models + Western infrastructure” becomes a hedge against being locked into a single lab’s API.

As for GLM-5.1’s boundaries: its coding benchmark scores reach 94.6% of Opus, but a gap in reasoning remains. For specific use cases, a more “balanced” capability profile matters more than peak scores.

Looking ahead: Z.ai’s revenue grew 131% year over year last year. If inference costs fall below $0.50 per million tokens, open source could capture 30–50% of coding-agent deployment share within a year. Changes in U.S. policy may cause some disruption, but the current risk looks low.
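The $0.50-per-million threshold becomes tangible with a back-of-envelope cost model. The per-task token counts and all prices below are assumptions chosen for illustration, not quoted rates from any provider.

```python
def cost_per_task(tokens_in: int, tokens_out: int,
                  price_in_per_m: float, price_out_per_m: float) -> float:
    """Dollar cost of one agent task given per-million-token prices."""
    return (tokens_in * price_in_per_m
            + tokens_out * price_out_per_m) / 1_000_000

# Assumed workload: a long coding-agent task consuming 200k input
# tokens and emitting 20k output tokens. Prices are hypothetical.
open_cost = cost_per_task(200_000, 20_000, 0.50, 1.50)    # cheap open-source hosting
closed_cost = cost_per_task(200_000, 20_000, 3.00, 15.00)  # proprietary-tier pricing
print(open_cost, closed_cost)  # 0.13 0.9
```

Under these assumptions the per-task gap is roughly 7x, which is the kind of spread that makes enterprises tolerant of a marginal capability difference.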

| Viewpoint | Evidence | Industry impact | My take |
| --- | --- | --- | --- |
| Open-source optimists | SWE-Bench Pro 58.4%; 8-hour autonomous runs | Enterprises begin piloting open-source replacements | A bit overstated. The advantage is in integration and usability, not in scores. Modal’s free trials matter more than leaderboard rankings. |
| Proprietary defenders | BenchLM #10; reasoning still below Opus | Closed source continues to lead in safety and multimodality | Pricing mismatch. GLM’s efficiency compresses competitors’ pricing power, and Anthropic has to respond. |
| Infrastructure pragmatists | Modal endpoints; OpenClaw compatibility | Capital concentrates toward serverless platforms | This is the key. Whichever model wins, infrastructure companies benefit. |
| Geopolitical skeptics | Z.ai publicly listed in Hong Kong; MIT license; tensions between China and the U.S. | Model provenance will face more scrutiny | Overestimated for now. More practical to focus on monetization with Western hosting partners. |

Conclusion: This one-two punch confirms one thing: in the coding-agent vertical, open-source capability has essentially caught up. The beneficiaries are the builders who moved first on “infrastructure-agnostic” architectures and the investors behind hosting platforms. Anthropic faces pricing pressure. Enterprises that remain deeply bound to closed-source APIs are paying a premium for a capability edge that shrinks by the day.

Importance: High
Category: Model releases, partnerships, open source

Judgment: For the coding-agent track, this is still a relatively early window. The first beneficiaries are two groups: (1) builders and integrators assembling infrastructure-agnostic workflows; (2) capital backing serverless hosting and inference platforms. Short-term traders have limited edge unless they can catch the cadence of price cuts and traffic migration. Long-term holders need to watch whether the cost curve truly drops below $0.50 per million tokens to validate whether market share can jump.
