Medical accounts for 1%, legal for 0.9%, and education for 1.8%. These are not saturated markets; they are almost nonexistent ones.
Author: Garry’s List
Translation: Deep潮 TechFlow
Deep潮 Guide: Anthropic recently released the most comprehensive real-world AI Agent usage study to date. The key data: software engineering accounts for nearly 50% of Agent tool calls, while medical, legal, education, and 13 other vertical fields split the remaining share, each below 5%.
This is not a sign of market saturation but a map of 300 vertical AI unicorns. Even more valuable is a counterintuitive finding cited in the article: models can operate independently for nearly 5 hours, yet users only let them work for 42 minutes in practice. This "trust deficit" is itself the next product opportunity.
Full Text:
Software engineering accounts for nearly 50% of all AI Agent tool calls. The other 16 vertical fields—medical, legal, financial, etc.—are barely touched, each below 5%. This means there are 300 vertical AI unicorns waiting to be built.
If I were starting a business today, I would stare at the red area of that bar chart until I saw my future in it.
Box founder Aaron Levie said:
This chart reminds us just how big the opportunity is in the AI Agent space.
Of course, there will be many opportunities horizontally, but there are also many workflows requiring deep domain expertise to truly help users automate their unique processes within their verticals.
The template: build agent software that connects to proprietary data, coordinates workflows between users and agents effectively, brings deep domain-specific context engineering, and can drive change management on the customer side.
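That template can be sketched in a few lines of code. This is a minimal, illustrative sketch only, not any real product's architecture or the Anthropic study's methodology: all names (`VerticalAgent`, `lookup`, `run_step`, the medical-billing example data) are invented for illustration. The point is the shape: proprietary data behind a tool call, domain context baked in, and a human checkpoint for change-sensitive steps.

```python
# Hedged sketch of the vertical-agent "template": proprietary data,
# domain-specific context, and a human approval gate. All names and
# data here are hypothetical.
from dataclasses import dataclass, field


@dataclass
class VerticalAgent:
    domain_context: str        # deep domain-specific context, e.g. billing rules
    proprietary_data: dict     # connection to the customer's own data
    audit_log: list = field(default_factory=list)

    def lookup(self, key: str) -> str:
        """A tool call against proprietary data, recorded for auditability."""
        self.audit_log.append(("lookup", key))
        return self.proprietary_data.get(key, "<not found>")

    def run_step(self, task: str, approve) -> str:
        """One agent step: consult data, then gate on a human checkpoint."""
        answer = f"[{self.domain_context}] {task}: {self.lookup(task)}"
        if not approve(task):                  # change-management checkpoint
            self.audit_log.append(("blocked", task))
            return "escalated to human"
        self.audit_log.append(("done", task))
        return answer


# Hypothetical usage: a medical-billing vertical with one claim on file.
agent = VerticalAgent("medical-billing", {"claim-123": "denied: missing code"})
result = agent.run_step("claim-123", approve=lambda t: True)
```

The moat in this sketch is not the loop, which anyone can write, but what fills `domain_context` and `proprietary_data`, and who decides what `approve` should block.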
Many fields still have huge gaps.
Software engineering dominates nearly half of all AI Agent activity. The other half is scattered across 16 verticals, none exceeding 9%. Medical accounts for 1%, legal 0.9%, education 1.8%. These are not saturated markets but nearly nonexistent ones.
Anthropic has just released the most comprehensive real-world AI Agent usage study to date. The core finding: software engineering accounts for 49.7% of API-based Agent tool calls. The underlying conclusion is: everything else is blue ocean.
Deployment Lag
There’s a data point that should excite entrepreneurs: model capabilities have far surpassed the boundaries of what users are willing to trust.
METR’s capability assessment shows Claude can handle tasks that would take humans nearly five hours. But in actual use, the 99.9th percentile session length is only about 42 minutes. This gap—the difference between what AI can do and what we allow it to do—is a huge opportunity.
Image: The longest session duration for Claude Code nearly doubled within three months. This reflects not only improved capabilities but also growing trust.
Source: x.com
From October 2025 to January 2026, the 99.9th percentile single-session duration nearly doubled, from less than 25 minutes to over 45 minutes. The growth was steady across different model versions. It’s not just that models are getting stronger; users are learning through repeated use, gradually extending their trust in the agent.
“From August to December, Claude Code’s success rate on the most challenging internal tasks doubled, while manual interventions per session decreased from 5.4 to 3.3.”
Capabilities are already there; deployment has yet to catch up. This is not a problem but a product opportunity.
How Trust Evolves
New users auto-approve about 20% of Claude Code's operations. After 750 sessions, over 40% of sessions run entirely on auto-approval. But here is a counterintuitive discovery: experienced users tend to intervene more, not less. New users intervene in 5% of rounds, while veteran users do so in 9%.
Image: Trust is a skill that accumulates over time. New users automatically approve 20% of sessions. After 750 sessions, this exceeds 40%.
Source: Anthropic
This isn't a contradiction but a shift in supervision strategy. Beginners tend to approve each operation up front, while experienced users authorize first and intervene only when something goes wrong. They have moved from pre-approval to active monitoring.
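The two supervision strategies can be contrasted in a small sketch. This is an illustrative model of the behavioral shift described above, not Anthropic's actual approval mechanism; the action list and the `looks_wrong` heuristic are invented examples.

```python
# Two hypothetical supervision strategies for an agent's action stream.

def pre_approval(actions, approve):
    """Beginner mode: every action waits for explicit approval up front."""
    return [a for a in actions if approve(a)]


def monitor_and_intervene(actions, looks_wrong):
    """Veteran mode: let everything run, step in only when something looks off."""
    executed, interventions = [], 0
    for a in actions:
        executed.append(a)
        if looks_wrong(a):
            interventions += 1     # intervene after the fact
            executed.pop()         # roll back the suspect action
    return executed, interventions


actions = ["read file", "edit file", "delete repo", "run tests"]
safe = pre_approval(actions, approve=lambda a: "delete" not in a)
done, fixes = monitor_and_intervene(actions, looks_wrong=lambda a: "delete" in a)
```

Both strategies end up blocking the same risky action, but the second one pays the supervision cost only when something looks wrong, which is why veterans can run more work per unit of attention even while intervening more often in absolute terms.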
A noteworthy safety-related finding: in complex tasks, Claude Code requests clarifications more than twice as often as humans actively intervene. Agents pause to confirm rather than push through blindly. This is a feature, not a flaw.
“The core insight from this study is that the autonomy exercised by agents in practice is co-constructed by the model, the user, and the product. When uncertain, Claude pauses to ask questions, limiting its independence. Users build trust through collaboration and adjust their supervision strategies accordingly.”
Levie’s Vertical AI Approach
Aaron Levie pointed out the enormous wealth and value waiting to be unlocked: build agent software that connects to proprietary data, truly solving real problems for real people, filling in context to maximize intelligent output, and — a part most entrepreneurs overlook — driving change management on the customer side.
This last point is precisely why vertical AI is so hard to replicate. Anyone can build an API wrapper, but few can truly master the workflows, regulatory constraints, and organizational resistance specific to healthcare billing, legal discovery, or building permit approval.
SaaS has grown tenfold every decade over the past few decades. Over 40% of venture capital in the past 20 years has flowed into SaaS companies. This industry has birthed over 170 SaaS unicorns. The logic is simple: each of these unicorns has a vertical AI version waiting to emerge. And the AI version could be ten times bigger because it replaces not just software but also operational personnel.
The Essence of Co-Construction
Anthropic’s core findings warrant serious attention from anyone involved in AI policy. Autonomy is not an inherent property of models but is co-constructed by models, users, and products. Pre-deployment assessments cannot capture this; it must be measured in real-world use.
Anthropic states:
Software engineering accounts for about 50% of our API-based Agent tool calls, but we see emerging activity in other industries. As the boundaries of risk and autonomy continue to expand, post-deployment monitoring becomes critical. We encourage other model developers to extend this research.
The safety data is reassuring: 73% of tool calls involve a human in the loop, and only 0.8% of operations are irreversible. The highest-risk scenarios, such as API key leaks or autonomous trading, are mostly safety evaluations rather than real production deployments.
“Regulatory requirements that specify exact interaction modes—for example, requiring human approval for every operation—only create friction and may not enhance safety.”
Mandating “approval for every operation” kills productivity gains without necessarily increasing safety. A better goal is to ensure humans can monitor and intervene, rather than prescribe specific approval workflows.
Where Unicorns Are Hidden
The map is already drawn. Software engineering is being done. Medical, legal, financial, education, customer service, logistics—16 verticals, each with single-digit market share—are waiting for domain expertise to be truly embedded into agents.
Over 300 SaaS unicorns have been born; the next 300 vertical AI unicorns are about to emerge. Founders who choose their vertical, embed domain expertise into their agents, and figure out how to drive change management will own the enterprise software market of the next decade.
Models can already work for five hours, but users only let them work for 42 minutes. That is the signal: we are still at the very beginning, with vast amounts of building still to do, in countless areas where not even a minute of agent work has happened yet.
Anthropic Data: Nearly half of AI Agent calls are concentrated in software engineering, and these 16 verticals remain a blue ocean