Breakthroughs may be just two or three years away! Mianbi Intelligence co-founder Li Dahai: a new generation of human-computer interaction has shown its first glimmers of dawn.
As artificial intelligence moves from screens into the real world, human-computer interaction is undergoing a pivotal upgrade.
Whether on smartphones, in cars, or across the accelerating adoption of robots and wearable devices, the traditional turn-based, question-and-answer style of interaction is increasingly revealing problems such as slow response, perceptual disconnect, and broken context. These inherent flaws are becoming a key bottleneck for AI's entry into the physical world.
On February 2, Li Dahai, co-founder and CEO of Mianbi Intelligence, said in an interview with media including the Daily Economic News that a new direction for human-computer interaction has already emerged, but a true leap will not happen overnight; it will unfold gradually as cloud and edge model capabilities continue to improve. In this process, whether multimodal models can become the embodied "brain" connecting digital intelligence with the physical world is a core question for the industry.
Image source: Mianbi Intelligence
Multimodality is not about feature stacking but a paradigm shift in interaction
As AI begins to enter the physical world and drive robots or wearable devices, traditional human-computer interaction models are showing their limitations.
Liu Zhiyuan, professor of computer science at Tsinghua University and co-founder and chief scientist of Mianbi Intelligence, notes that for humans, listening, speaking, and seeing run in parallel across multiple channels: people can speak while continuing to listen and see, and the processes do not interfere with one another. At the level of human-computer interaction, however, most previous models have lacked this capability. "Once you start speaking, you can't see, and all sorts of problems follow."
The flaws in this mode of interaction limit AI's progress toward embodied intelligence. Liu Zhiyuan sees human-like, highly natural interaction as a key step in making robots and smart terminals more lifelike. "This (the multimodal model) may be the closest path to enabling future robots and smart terminals to interact as naturally as humans."
By this reasoning, embodied intelligence is not an independent branch but an application scenario that places higher demands on a model's interaction capabilities. Liu Zhiyuan stresses that embodied and smart-terminal scenarios likewise need such models to serve people better, and he predicts that rapid iteration of embodied capabilities may not be far off: "If I had to estimate how long, probably just two or three years."
At the industry level, the integration of edge models and AI hardware is becoming a realistic yet complex challenge.
Li Dahai believes that, with major companies entering the field and intelligent agents landing on devices such as smartphones, a new form of human-computer interaction has already shown signs of emerging, but this does not mean the turning point has arrived. He predicts that the leap will not be completed in one stroke. "Everyone will keep exploring in this direction, and it will go hand in hand with continuous improvement of cloud and edge models."
Even in the widely discussed smartphone scenario, the technology still faces significant constraints. Li Dahai notes, for example, that the underlying model behind the Doubao Phone is among the best in the industry, yet its ability to complete complex tasks on a person's behalf has not reached an ideally usable state.
Li Dahai further analyzes that pure cloud solutions face privacy concerns, while on the edge, the consumption of resources such as computing power means that deploying multimodal capabilities on phones will take longer. He says frankly that the more modalities are involved, the greater the resource consumption, and this dictates the different paces of adoption across terminal form factors.
Today, smartphone interaction relies mainly on voice and touch, with limited modalities. Taking the Doubao Phone as an example, Li Dahai explains that its core breakthrough is enabling an intelligent agent to operate the phone the way a person would, completing complex tasks on the user's behalf. That solves the problem of human-like output; the next major evolution will focus on input.
"Currently, keeping the phone's context in sync with the user depends on the user actively operating the screen. If future phones can directly listen to and see the real world, they can better synchronize and share context with the user." Li Dahai sees this as a crucial step toward truly intelligent devices, but one that also faces the dual challenges of power consumption and privacy protection, placing higher demands on product design.
By contrast, Li Dahai sees cars and robots, where resource constraints are looser, as more promising venues for deploying multimodal models. In embodied intelligence, he argues, the current bottleneck is not the physical hardware but the "brain": once model capabilities break through, embodied intelligence could see a leap akin to the "ChatGPT moment."
The industry will soon see explosive growth in model specialization and interaction capabilities
In this context, Mianbi Intelligence’s positioning is not focused on any specific product or hardware form but on whether it can continuously produce high-quality models.
In the AI field, the Scaling Law has long been treated as an iron rule, yet debate over whether it will hit a wall has never ceased. Mianbi Intelligence proposes another lens: the Densing Law, which holds that the capability density of large models doubles roughly every 100 days, so any single model's competitive lifespan is short. The key, then, is not building one excellent model but being able to keep producing excellent models.
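To make the figure concrete, here is a back-of-the-envelope formalization of the doubling claim (our illustration, not a formula from the interview). If capability density grows exponentially, then

\[ \rho(t) = \rho_0 \cdot 2^{t/100} \]

where \(\rho_0\) is the density at a reference date and \(t\) is measured in days. Over a year (\(t \approx 365\)), density would rise by a factor of \(2^{3.65} \approx 12.6\); on this reading, a model one year later would need roughly one-twelfth the parameters to match today's capability.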
Mianbi Intelligence positions itself as a "lithography machine for large models." Li Dahai explains that the metaphor refers to continually training models of ever-higher capability density.
Liu Zhiyuan adds that the logic of the Densing Law resembles the chip industry's: the trend for large models is to become smaller in size and higher in density, which will drastically cut model costs and make it far more feasible to run them on devices closer to users.
Li Dahai emphasizes that commercializing edge models is itself part of validating capability and feeding the data flywheel. Relying solely on selling models commercially to reach hundreds of millions of devices would be difficult; a more practical route is to drive the process through ecosystems and developers.
Regarding competition with large companies, Li Dahai believes that opportunities for startups have not disappeared with the entry of big players. AI remains an industry-level opportunity. The challenge for startups is whether to occupy a small share in a very broad track or to aim for a leading position in a smaller market. “I believe there is still plenty of room for everyone to play.”
Looking at future technological trends, Liu Zhiyuan highlights two main threads: the continued strengthening of intelligent capabilities and keeping AI efficient to use. He predicts that over the next one or two years the industry will rapidly see models become more specialized and their interaction with the world explode. "As an intelligent agent, it will have stronger autonomous learning ability, which is a very important trend for the next one or two years. Once it can explore and learn on its own, the next breakthrough will likely be multi-agent collaboration."
Liu Zhiyuan states that within the next five to ten years, the world will certainly enter a state of broad multi-agent interconnection, deep collaboration, and emergent collective intelligence.
(Article source: Daily Economic News)