
Humanoid robots are once again making headlines this year as they return to the Spring Festival Gala. Last year, the Yushu robot, dressed in a bright red outfit, performed traditional dances on stage, significantly boosting interest in the humanoid robot industry. Projections indicate that domestic shipments of humanoid robots will surge by more than 650% in 2025. But can humanoid robots that only dance still attract investors and industry recognition?
In recent discussions with humanoid robot investors and key product leaders, the focus has shifted toward practical applications. One investor candidly remarked, “Robots that only dance can’t sell anymore; they need real-world applications to thrive.” Serious investors typically avoid small, hastily assembled companies that immediately seek high valuations and funding. The transition from “performers” to “practical laborers” raises the question of whether the robots’ technical capabilities can support this shift.
Three primary technological paths are currently competing fiercely. The “general intelligence” VLA models pursued by Figure AI (a US humanoid robot startup) and Zhiyuan are being assessed for their potential to handle tasks on factory assembly lines. Tesla’s “world model” methodology aims to reduce costs through simulation data, while the Boston Dynamics approach relies on layered decision-making to keep long-running operation accurate. Endurance, stability, and cost must all be addressed before mass production, because robots must first learn to perform real tasks.
At the upcoming 2025 CCTV Spring Festival Gala, the humanoid robot from Hangzhou Yushu Technology will don its red outfit once more, captivating audiences worldwide. Yushu Technology is partnering with the gala, showcasing the Galaxy General Robot, which is designated as the gala’s embodied large model robot. Viewers might witness stunning acrobatic feats performed by humanoid robots again. This early “sneak peek” has heightened industry interest, with optimistic forecasts for production in 2025. According to data from the Gaogong Robot Industry Research Institute, domestic shipments of humanoid robots are projected to reach 18,000 units in 2025, a staggering increase of over 650% from 2024, with an anticipated rise to 62,500 units in 2026.
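As a quick sanity check on the shipment figures quoted above, the arithmetic can be sketched in a few lines of Python. This is only an illustration: it assumes “over 650%” means a percentage increase (so 2025 is at least 7.5 times 2024), and the function names are invented for this example.

```python
# Figures quoted in the paragraph above (Gaogong Robot Industry Research Institute).
SHIPMENTS_2025 = 18_000
SHIPMENTS_2026 = 62_500
MIN_GROWTH_2024_TO_2025 = 6.50  # assumption: "over 650%" read as a fractional increase of 6.5


def implied_baseline(current: float, growth: float) -> float:
    """Previous-year value implied by a fractional increase `growth`."""
    return current / (1.0 + growth)


def growth_rate(previous: float, current: float) -> float:
    """Fractional year-over-year increase from `previous` to `current`."""
    return current / previous - 1.0


max_2024 = implied_baseline(SHIPMENTS_2025, MIN_GROWTH_2024_TO_2025)
growth_2026 = growth_rate(SHIPMENTS_2025, SHIPMENTS_2026)

print(f"Implied 2024 shipments: at most about {max_2024:,.0f} units")
print(f"Projected 2025 to 2026 growth: about {growth_2026:.0%}")
```

Under that reading, the forecast implies a 2024 base of no more than about 2,400 units, and a further increase of roughly 247% from 2025 to 2026.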
On January 22 of this year, a tender notice caught attention. The tendering company was the North China branch of China National Petroleum Corporation, and the winning bid went to Sichuan Tianlian Robot Co., Ltd. The project focuses on adapting humanoid robots for refueling tasks at gas stations, which suggests that humanoid robots may soon take on real-world responsibilities there.
As the spotlight shifts, industry experts are beginning to contemplate the future of robots. They are no longer just performers on stage; they must prove their worth as “laborers” in factories, construction sites, and logistics warehouses, demonstrating that they can create real value rather than being regarded as expensive toys. On February 6, a seasoned investor in humanoid robots, who was once a major shareholder, said, “Currently, the path to success for any robot company lies in deepening specific application scenarios. Whether in hardware, complete systems, or software development, everything must be integrated with practical applications.” Many investors in humanoid robots share this sentiment: companies that detach themselves from real applications and focus only on development will eventually be phased out.
“We prioritize whether a company has a viable application scenario; without a developed product, we typically do not consider investing. Companies that are hastily assembled by a few people and immediately seek valuations and funding are usually avoided, as they tend to fail quickly,” he explained. The industry has moved beyond the phase where “robots that can dance sell well.”
Qiu Dicong, founder of Yakobi Robotics, stated in an interview, “No matter how advanced the technology or design is, it ultimately needs to result in a marketable product that generates economic value.” The industry has often been preoccupied with comparing various technological routes, as if a single technological advantage could dominate the market. “However, it becomes clear that technology is just one aspect, and sometimes, it is not even a significant factor in the later stages of development.” While Qiu himself is involved in AI robotic research, he believes that the sophistication of technology does not automatically lead to commercial success. “The focus for the near future will be on practical applications and how to effectively sell products that are recognized by customers.”
Three distinct technological paths are emerging, each offering a different vision for the future of intelligent robots. The first path is the VLA (Vision-Language-Action) model route, which aims at “general intelligence” by enabling robots to perceive through vision, understand language, and execute actions directly. Companies like Figure AI and Zhiyuan Robotics are investing in this area. According to Tian Feng, the core characteristic of this path is “relying on massive data training to handle unknown environments and tasks, pursuing an end-to-end single model.” Its strength lies in semantic understanding, which allows it to comprehend vague instructions like “clean the table.” However, he also pointed out its limitations: “End-to-end models have high computational demands and require robust hardware for endurance and heat dissipation.”
Since last year, companies like Zhiyuan Robotics and UBTech have showcased humanoid robots performing factory tasks such as tightening screws. At this year’s CES, several specialized companies, such as SuTong JuChuang, also entered this space, demonstrating high-stability robotic operating systems. On February 3, a representative from SuTong JuChuang discussed VLA technology, explaining, “VLA is a paradigm that utilizes the emergent capabilities of large language models for operational intelligence.” However, he noted the hidden challenges: “Simply providing a robot with an image doesn’t allow it to gauge the distance to an object; VLA outputs a series of real-number coordinates and orientations in 3D space, meaning end-to-end VLA still implicitly relies on a large number of parameters to resolve spatial perception issues.” Furthermore, when a robot’s hand is about to touch an object, most contact points are obscured by the dexterous hand itself, which is why tactile and force feedback matter.
SuTong JuChuang’s solution involves two main strategies. The first is to integrate 3D point cloud and tactile information into traditional vision-only VLA: “By effectively utilizing point clouds, our data requirements significantly decrease, skipping the phase of relying on vast data for spatial perception.” The second is to feed tactile feedback into the VLA as an additional input modality.
The second path is the world model route, with Tesla being a prominent representative. This approach involves constructing a simulator of the physical world within the AI system, allowing robots to predict the outcomes of their actions. Tian Feng summarizes it as “instilling an intuitive understanding of physical laws in robots, enabling them to reason and plan for the consequences of their actions.” This path heavily relies on high-quality simulation data but can reduce dependence on costly real-world data once the simulator is established.
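The idea of predicting outcomes before acting can be illustrated with a deliberately tiny sketch. It is not Tesla’s method or any vendor’s implementation: the “world model” here is a hand-written point-mass simulator, and `State`, `predict`, `choose_action`, the candidate forces, and the horizon are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class State:
    position: float  # metres
    velocity: float  # metres per second


def predict(state: State, force: float, dt: float = 0.1, mass: float = 1.0) -> State:
    """Stand-in for a learned simulator: one semi-implicit Euler step of a point mass."""
    velocity = state.velocity + (force / mass) * dt
    position = state.position + velocity * dt
    return State(position, velocity)


def choose_action(
    state: State,
    goal: float,
    candidates: Sequence[float] = (-1.0, 0.0, 1.0),
    horizon: int = 5,
) -> float:
    """Imagine holding each candidate force for `horizon` steps and pick the
    one whose predicted final position is closest to the goal."""

    def imagined_error(force: float) -> float:
        s = state
        for _ in range(horizon):
            s = predict(s, force)  # rollout happens in the model, not on hardware
        return abs(s.position - goal)

    return min(candidates, key=imagined_error)


print(choose_action(State(0.0, 0.0), goal=1.0))
```

The point of the sketch is that every candidate is evaluated inside the simulator, so no real-world trial is spent on the actions that are rejected.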
The third path is the layered decision-making and hardware-software collaboration route, represented by Boston Dynamics and Zhiyuan Robotics. This pragmatic approach breaks down complex tasks, assigning semantic understanding and sub-task decomposition to large models while traditional algorithms handle positioning, navigation, and precision control. Tian Feng points out that the advantages of this modular architecture lie in easier fault isolation and in decoupling complex reasoning from high-frequency real-time control, which keeps control loops responsive. The approach has been validated on real-world production lines.
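A minimal sketch of that decoupling, under stated assumptions: the slow layer is a lookup table standing in for a large model, and the fast layer is a plain proportional controller on a single joint. The names `plan_subgoals`, `control_step`, and `run`, along with the gain and tolerance values, are hypothetical and not taken from any of the companies mentioned.

```python
from typing import List, Tuple


def plan_subgoals(instruction: str) -> List[float]:
    """Slow layer (hypothetical stand-in for a large model): turns an
    instruction into an ordered list of target joint positions."""
    table = {"pick and place": [0.5, 1.0, 0.0]}
    return table[instruction]


def control_step(position: float, target: float, gain: float = 0.5) -> float:
    """Fast layer: one tick of a proportional controller."""
    return position + gain * (target - position)


def run(instruction: str, tolerance: float = 0.01, max_ticks: int = 1000) -> Tuple[List[float], int]:
    """Call the planner once, then let the control loop tick until each
    sub-goal is reached within `tolerance`."""
    position, ticks, reached = 0.0, 0, []
    for target in plan_subgoals(instruction):  # planner is queried once per task
        while abs(target - position) > tolerance:
            position = control_step(position, target)  # many cheap ticks per sub-goal
            ticks += 1
            if ticks >= max_ticks:
                raise RuntimeError("control loop did not converge")
        reached.append(position)
    return reached, ticks


reached, ticks = run("pick and place")
print(f"sub-goals reached: {len(reached)}, control ticks used: {ticks}")
```

Because the planner is called once while the controller ticks many times, a slow or failing planner can be swapped or debugged without touching the real-time loop, which is the fault-isolation benefit described above.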
However, Dr. Lu Tong believes that these technological routes are not mutually exclusive; various architectures, such as layered structures, 3D scene graphs, and world models, are advancing simultaneously. He argues that VLA end-to-end and world models can coexist and need to develop collaboratively. The selection of technology must consider deployment environments, network conditions, computational support, and other practical factors, without losing sight of real-world constraints.
At the core of these developments is the quest to enhance the “generalization ability” of robots so they can adapt to diverse scenarios. Qiu Dicong elaborated that the primary goal of robot control is to address the generalization issue. The earliest methods involved model predictive control, which allows robots to deviate from fixed trajectories. This method dynamically links environmental perception to actions, enabling adaptability within a predefined range of variation. However, it falters when faced with unforeseen circumstances. To overcome this limitation, the VLA model was developed, aiming for robots to understand natural language commands and autonomously complete tasks via visual perception.
The VLA model is typically trained on top of a large vision-language model, supplemented by human operational data to enhance understanding and generalization. Still, it faces challenges related to data cost, computational demands, and execution speed. Current technological paths fall into two main categories: model-driven methods (like model predictive control, which are stable but generalize poorly) and data-driven methods (including reinforcement learning and imitation learning). The VLA model represents a fusion of these two data-driven approaches, marking a significant direction toward general-purpose robots.
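To make the model predictive control idea concrete, here is a toy receding-horizon loop. It is a sketch under simplifying assumptions, not a production controller: the robot is a single coordinate, the action set `ACTIONS` is a hypothetical handful of discrete steps, and the plan is found by brute-force enumeration rather than by a real optimizer.

```python
from itertools import product
from typing import List, Sequence

ACTIONS = (-0.2, 0.0, 0.2)  # hypothetical discrete position changes per step


def step(x: float, u: float) -> float:
    """Known model of the system: the position changes by the commanded amount."""
    return x + u


def best_first_action(x: float, reference: Sequence[float], horizon: int = 3) -> float:
    """Enumerate every action sequence over the horizon and return the first
    action of the sequence with the lowest squared tracking error."""

    def cost(seq: Sequence[float]) -> float:
        total, pos = 0.0, x
        for k, u in enumerate(seq):
            pos = step(pos, u)
            total += (pos - reference[k]) ** 2
        return total

    return min(product(ACTIONS, repeat=horizon), key=cost)[0]


def track(reference: List[float], disturbance_at: int = 3, push: float = 0.4,
          horizon: int = 3) -> List[float]:
    """Re-plan at every step; an unmodelled push is injected once."""
    x, path = 0.0, []
    for t in range(len(reference) - horizon):
        u = best_first_action(x, reference[t + 1 : t + 1 + horizon], horizon)
        x = step(x, u)
        if t == disturbance_at:
            x += push  # disturbance the planner did not anticipate
        path.append(x)
    return path


path = track([1.0] * 15)
print([round(p, 2) for p in path])
```

Because the plan is recomputed from the measured position at every step, the push is absorbed on the next step. The same loop has no answer for situations its model and action set cannot express, which is the limitation the paragraph describes.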
An expert from SuTong JuChuang stated, “The essence of generalization is interpolation.” By exposing the model to a rich array of scenarios—varying lighting conditions, different table heights, and diverse placements—its ability to make reasonable judgments in unknown environments improves. Nevertheless, this is not sufficient; “data must be clean; the cleaner the dataset, the easier it is for the model to generalize.” He emphasized that both the autonomous driving and robotics fields struggle with “dirty data,” which significantly undermines the generalization capacity of models. The diversity and cleanliness of data are two different matters, and this is a pitfall many practitioners encounter.
He also stressed that enhancing the “lower limits” of AI operating systems poses greater technical challenges and industry significance than merely showcasing their “upper limits.” “Even if a model attempts something 100 times, only a few instances will shine; however, raising the lower limits means enabling a robot to work in a factory for 10 continuous hours without errors, which genuinely adds value.” Lu Tong added that industry demand is shifting from merely seeking quantity in data to focusing on “data diversification” and more convenient collection methods, such as video-based data gathering.
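The “lower limit” argument can be put in rough numbers. The sketch below assumes every task succeeds or fails independently with the same probability, which is a simplification, and it uses a hypothetical 30-second cycle time. Only the 10-hour shift comes from the quote above.

```python
def error_free_probability(per_task_success: float, tasks: int) -> float:
    """Chance of completing `tasks` independent tasks with no failure."""
    return per_task_success ** tasks


def required_per_task_success(target: float, tasks: int) -> float:
    """Per-task success rate needed for the whole run to succeed with probability `target`."""
    return target ** (1.0 / tasks)


SHIFT_HOURS = 10      # from the quote above
CYCLE_SECONDS = 30    # hypothetical cycle time, chosen only for illustration
tasks = SHIFT_HOURS * 3600 // CYCLE_SECONDS  # tasks attempted in one shift

print(f"tasks per shift: {tasks}")
print(f"99.9% per task -> error-free shift: {error_free_probability(0.999, tasks):.1%}")
print(f"needed per task for a 99% error-free shift: {required_per_task_success(0.99, tasks):.5%}")
```

Under these assumptions, a robot that succeeds 99.9% of the time per task still finishes an error-free shift only about 30% of the time, which is why raising the floor is harder than showcasing a few impressive runs.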
In addition to data, computational deployment is a critical issue. The industry generally agrees that high-frequency local inference is essential for robot stability. If a system achieves an inference frequency of 10 Hz, minor disturbances can be addressed within 0.1 seconds. If the frequency is only 2 to 3 Hz, the system must wait roughly 0.33 to 0.5 seconds between inferences, and delays from actuator control and inference synchronization add to that wait, significantly reducing task success rates.
According to Xie Tiandi, market director at SuTong JuChuang, the next 3 to 5 years will be critical for the deployment of robots in specific scenarios. The true value of robots lies in supplementing the workforce, and the practical experience and insights of humans are invaluable. Robots can learn and emulate the expertise of seasoned professionals, and customers are willing to pay for robotic solutions that can replicate human experience. While current embodied robots may complete only half as much work as a human in the same time, or less, they can work at night and during holidays. At last year’s robotics conference, many manufacturing company leaders from Jiangsu and Zhejiang provinces visited specifically to ask, “Can we purchase robots to establish production lines?” Despite the urgent market demand, there remains a gap between technology and commercialization.
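The relationship between inference frequency and reaction time is simply the period, one divided by the frequency. The helper below is a small illustration. The function name and the optional `actuation_delay_s` parameter are assumptions added here to show how extra delays stack on top of the inference period.

```python
def reaction_window_s(inference_hz: float, actuation_delay_s: float = 0.0) -> float:
    """Longest wait before a new inference result can respond to a disturbance:
    one inference period plus any additional actuation delay."""
    if inference_hz <= 0:
        raise ValueError("inference_hz must be positive")
    return 1.0 / inference_hz + actuation_delay_s


for hz in (10, 3, 2):
    print(f"{hz:>2} Hz -> {reaction_window_s(hz):.3f} s between inferences")
```

At 10 Hz the period is 0.1 seconds, while 2 to 3 Hz corresponds to roughly 0.33 to 0.5 seconds before any actuator or synchronization delay is added.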
He acknowledged that currently, only entertainment robots capable of singing and dancing can generate stable revenue, while the entire robotics industry is still transitioning from research and development to engineering applications. Nonetheless, the excitement generated by entertainment scenarios has greatly accelerated the development of robots’ practical capabilities. The market demand for robots is increasingly shifting toward practical applications. Users are seeking specific scenarios that achieve a highly closed-loop operation, focusing on three main objectives: reducing production costs, liberating humans from repetitive or hazardous tasks, and providing emotional value in sectors like tourism and commerce. “The emergence of robots fundamentally aims to solve practical problems,” he concluded.
Cutting-edge embodied intelligence technologies are still under development, and their stability has not yet reached industrial-grade levels. Truly reliable technologies, such as those seen in assembly lines or home appliances, often go unremarked precisely because they are stable. Qiu Dicong noted that factory scenarios are relatively straightforward, with fixed items (like specific screws) and fixed environments, so the work demands precision and is highly repetitive. In supermarket scenarios, by contrast, complexity increases significantly: the robot must recognize hundreds of thousands of products, which demands a high level of understanding. The home environment represents the ultimate challenge for robots. Spaces and items vary enormously, and tasks span complex operations such as cleaning and cooking, requiring a high degree of generality.
From an ROI perspective, household scenarios are currently not economical, as robots can cost tens or even hundreds of thousands of dollars, which does not match the limited services they provide. Commercial scenarios, however, are becoming breakthrough points. For example, in retail warehouse picking, if robots can address generalization issues, they could enhance operational efficiency by 30% to 90%, establishing clear commercial value. However, Qiu Dicong reiterated that current advanced embodied intelligence technologies are still in the research phase, with stability generally not reaching industrial-grade levels.
The success of specific scenarios will ultimately determine the effectiveness of the various technological routes, leading to convergence and competition among domestic technological paths. Tian Feng analyzed, “Long-term stable operation is critical for commercial deployment, and different technological routes dictate the cost-effectiveness and survival rates of robots in various scenarios.” “In relatively structured factory and logistics scenarios, extreme VLA semantic understanding is not necessary; however, there is a need for high mean time between failures (MTBF) and low power consumption, making the ‘layered decision + hardware-software collaboration’ route more suitable.” He further pointed out that “modular actuator solutions offer absolute advantages in production costs and subsequent maintenance.” In complex construction environments, a world model combined with hybrid actuators is a better fit. For instance, by using the world model to predict terrain, robots can automatically switch movement modes to complete tasks, achieving energy efficiency 3 to 5 times higher than that of purely legged robots and significantly easing endurance pressure during prolonged operations.
In culture, tourism, and home services, the requirements for human-robot interaction are extremely high, and the VLA architecture can give robots the ability to comprehend nuanced human instructions. Xie Tiandi believes that the business model of the robotics industry is becoming clearer: target B-end (business) clients while collaborating with original equipment manufacturers and application partners to co-create solutions and validate deployment. “We need to seek partners with real production scenarios, like packaging logistics or automotive component assembly, to collaboratively promote solution validation.” He stated that the core value of robots lies in their ability to operate alongside humans without requiring modifications to existing infrastructure, for example, having humans work during the day and robots take over at night.
Looking beyond the current competitive landscape, the humanoid robot sector is exhibiting several distinct development trends. In terms of pace, Lu Tong believes that robot technologies are iterating on a “monthly” basis, and the industry continues to maintain high-speed momentum in both capital and technology. However, the integration of cutting-edge technology with practical applications remains in a maturation and trial-and-error phase. He has also observed that the boundary between academia and industry is increasingly blurring, with many new technologies emerging from frontline feedback and demands.
Tian Feng predicts that technological paths will gradually converge: “Drawing on the hardware development history of PCs and smartphones, the hardware architecture of intelligent robots will become increasingly unified.” In software architecture, “it may no longer pursue purely end-to-end solutions but evolve into a three-layer decoupled architecture comprising a semantic parsing layer, an environment mapping layer, and a motion execution layer.” In terms of corporate strategy, deep hardware-software collaboration will become a priority. “Core components must be deeply compatible with algorithms; companies that simply assemble parts may face obsolescence,” Tian Feng noted. A crucial criterion is that “by 2026, hardware disparities among companies will rapidly diminish, and the true competitive barrier will be the non-standard operational data accumulated during long-term tasks.” Companies that have successfully deployed robots will create data feedback loops that become their main competitive advantage.
Another significant trend is localization. “Starting in 2026, domestically produced planetary roller screws and high power density servo motors are projected to gradually replace imports, with intelligent robots integrating domestic components for self-research and optimization becoming the trend,” Tian Feng summarized.
Xie Tiandi believes that the ultimate value of robots is not to replace humans but to inherit their experience. Robots can work during human rest periods and in harsh environments, transforming the skills of seasoned professionals into data models and thus supplementing human labor. “This is the future of industrial intelligence.”
Qiu Dicong concluded that while robot technology is crucial for driving productivity, efficiency, and experience, it must be positioned correctly. Technology is the means to achieve excellent products, not the end goal itself. The real test of a robot’s practical application is its full alignment with commercial scenarios. “If you can solve 90% of the problems but fail to address the remaining 10%, the entire scenario becomes unusable, rendering the previous 90% worthless.” Companies must therefore consider comprehensively whether technological advancement aligns with scenario needs, whether the robot is stable and reliable, how good the design and user experience are, and whether the overall solution can form a closed loop within acceptable ROI parameters. Any detail that affects the final experience is a decisive factor in product strength. Whether in entrepreneurship or new technology, it ultimately boils down to a simple question: “Is this thing useful? And are you willing to pay for it?” If yes, then it is a success.
Original article by NenPower. If reposted, please credit the source: https://nenpower.com/blog/the-future-of-humanoid-robots-moving-beyond-performance-to-practical-applications-in-industry/
