Breakthroughs may be just two or three years away! Mianbi Intelligence co-founder Li Dahai: a new generation of human-computer interaction has shown its first glimmers of dawn.
As artificial intelligence moves from screens into the real world, human-computer interaction is undergoing a pivotal upgrade.
Whether on smartphones, in cars, or across the accelerating adoption of robots and wearable devices, the traditional turn-based, question-and-answer style of interaction is increasingly revealing problems such as slow response, perceptual disconnect, and broken context. These inherent flaws are becoming a key bottleneck for AI's entry into the physical world.
On February 2, Li Dahai, co-founder and CEO of Mianbi Intelligence, said in an interview with media including the Daily Economic News that a new direction for human-computer interaction has already emerged, but a true leap will not happen overnight; it will unfold gradually as cloud and edge model capabilities continue to improve. In this process, whether multimodal models can become the embodied "brain" connecting digital intelligence with the physical world is a core question for the industry.
Image source: Mianbi Intelligence
Multimodality is not about feature stacking but a paradigm shift in interaction
As AI begins to enter the physical world and drive robots or wearable devices, traditional human-computer interaction models are showing their limitations.
Liu Zhiyuan, professor of computer science at Tsinghua University and co-founder and chief scientist of Mianbi Intelligence, notes that for humans, listening, speaking, and seeing run in parallel across multiple channels: people can speak while continuing to listen and see, and the processes do not interfere with one another. At the level of human-computer interaction, however, most previous models have lacked this capability. "Once you start speaking, you can't see, and all sorts of problems follow."
The flaws in this mode of interaction limit AI's progress toward embodied intelligence. Liu Zhiyuan sees human-like, highly natural interaction as a key step in making robots and smart terminals more lifelike. "This (the multimodal model) may be the closest path to enabling future robots and smart terminals to interact as naturally as humans."
By this reasoning, embodied intelligence is not an independent branch but an application scenario that places higher demands on a model's interaction capabilities. Liu Zhiyuan stresses that embodied and smart-terminal scenarios likewise need such models to serve people better, and he predicts that rapid iteration of embodied capabilities may not be far off: "If I had to estimate how long, probably just two or three years."
At the industry level, the integration of edge models and AI hardware is becoming a realistic yet complex challenge.
Li Dahai believes that, with major companies entering the field and intelligent agents landing on devices such as smartphones, a new form of human-computer interaction has already shown signs of emerging, but this does not mean the turning point has arrived. He predicts that the leap will not be completed in one stroke. "Everyone will keep exploring in this direction, and it will go hand in hand with continuous improvement of cloud and edge models."
Even in the widely discussed smartphone scenario, the technology still faces significant constraints. Li Dahai notes, for example, that the underlying model behind the Doubao Phone is among the best in the industry, yet its ability to complete complex tasks on a person's behalf has not reached an ideally usable state.
Li Dahai further analyzes that pure cloud solutions face privacy concerns, while on the edge, the consumption of resources such as computing power means that deploying multimodal capabilities on phones will take longer. He says frankly that the more modalities are involved, the greater the resource consumption, and this dictates the different paces of adoption across terminal form factors.
Today, smartphone interaction relies mainly on voice and touch, with limited modalities. Taking the Doubao Phone as an example, Li Dahai explains that its core breakthrough is enabling an intelligent agent to operate the phone the way a person would, completing complex tasks on the user's behalf. That solves the problem of human-like output; the next major evolution will focus on input.
"Currently, keeping the phone's context in sync with the user depends on the user actively operating the screen. If future phones can directly listen to and see the real world, they can better synchronize and share context with the user." Li Dahai sees this as a crucial step toward truly intelligent devices, but one that also faces the dual challenges of power consumption and privacy protection, placing higher demands on product design.
By contrast, Li Dahai sees cars and robots, where resource constraints are looser, as more promising venues for deploying multimodal models. In embodied intelligence, he argues, the current bottleneck is not the physical hardware but the "brain": once model capabilities break through, embodied intelligence could see a leap akin to the "ChatGPT moment."
The industry will soon see explosive growth in model specialization and interaction capabilities
In this context, Mianbi Intelligence’s positioning is not focused on any specific product or hardware form but on whether it can continuously produce high-quality models.
In the AI field, the Scaling Law has long been treated as an iron rule, yet debate over whether it will hit a wall has never ceased. Mianbi Intelligence proposes another lens: the Densing Law, which holds that the capability density of large models doubles roughly every 100 days, so any single model's competitive lifespan is short. The key, then, is not building one excellent model but being able to keep producing excellent models.
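To make the figure concrete, here is a back-of-the-envelope formalization of the doubling claim (our illustration, not a formula from the interview). If capability density grows exponentially, then

\[ \rho(t) = \rho_0 \cdot 2^{t/100} \]

where \(\rho_0\) is the density at a reference date and \(t\) is measured in days. Over a year (\(t \approx 365\)), density would rise by a factor of \(2^{3.65} \approx 12.6\); on this reading, a model one year later would need roughly one-twelfth the parameters to match today's capability.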
Mianbi Intelligence positions itself as a "lithography machine for large models." Li Dahai explains that the metaphor refers to continually training models of ever-higher capability density.
Liu Zhiyuan adds that the logic of the Densing Law resembles the chip industry's: the trend for large models is to become smaller in size and higher in density, which will drastically cut model costs and make it far more feasible to run them on devices closer to users.
Li Dahai emphasizes that commercializing edge models is itself part of validating capability and feeding the data flywheel. Relying solely on selling models commercially to reach hundreds of millions of devices would be difficult; a more practical route is to drive the process through ecosystems and developers.
Regarding competition with large companies, Li Dahai believes that opportunities for startups have not disappeared with the entry of big players. AI remains an industry-level opportunity. The challenge for startups is whether to occupy a small share in a very broad track or to aim for a leading position in a smaller market. “I believe there is still plenty of room for everyone to play.”
Looking at future technological trends, Liu Zhiyuan highlights two main threads: the continued strengthening of intelligent capabilities and keeping AI efficient to use. He predicts that over the next one or two years the industry will rapidly see models become more specialized and their interaction with the world explode. "As an intelligent agent, it will have stronger autonomous learning ability, which is a very important trend for the next one or two years. Once it can explore and learn on its own, the next breakthrough will likely be multi-agent collaboration."
Liu Zhiyuan states that within the next five to ten years, the world will certainly enter a state of broad multi-agent interconnection, deep collaboration, and emergent collective intelligence.
(Article source: Daily Economic News)