
The Rise of Robots in the Spring Festival Gala: A Reflection on Industry Discrepancies
In 1950, Alan Turing planted the seed of “embodied intelligence” in his paper Computing Machinery and Intelligence. More than seventy years later, that seed has grown amid the popularity of ChatGPT and the emergence of vision-language-action (VLA) models, shifting the industry consensus from the traditional narrative of “automation” to embodied intelligence. As a novel entity serving as the “real-world carrier of AI,” robots have become the darlings of the era.
The recent Year of the Horse Spring Festival Gala showcased a group of embodied intelligence companies, including Yushutech, Songyan Power, and Galaxy General, which collectively dazzled audiences with their high-tech performances. Reports suggest that during the two-hour broadcast, searches for robots on JD.com surged by more than 300%, with order volumes up 150%.
However, this was not merely a celebration; it was also a brutal transition marked by fierce competition. In the capital market, the excitement was unprecedented: annual financing in the embodied intelligence sector surged to 744 deals totaling ¥73.543 billion. Yet behind this glamour, the industry is experiencing growing pains. While giants like Tesla and Ubtech accelerate iterations and expand globally, the harsh reality includes the exit of star startups like K-Scale and the quiet downfall of once-prominent unicorns like Dalu Robotics. Soaring valuations juxtaposed with restrained shipment volumes reflect the true tension within embodied intelligence.
The Spring Festival Gala saw embodied intelligence take center stage in an unprecedented manner. Yushutech’s G1 robot set the arena ablaze with its stunning performance in WuBOT, showcasing remarkable athleticism with consecutive backflips on one leg and leaps over two to three meters. Songyan Power’s “Bionic Cai Ming” achieved pixel-perfect replication of makeup and lip-syncing; Magic Atom’s MagicBot Z1 transformed into a dance troupe, executing complex moves alongside stars. From the synchronized dance of a hundred panda robotic dogs in the Yibin venue to scenario demonstrations by Galaxy General and Zhuimi, the concentration of robots was so high that netizens humorously dubbed it the “First AI Spring Festival Gala.” Fourteen years ago, robots made their debut at the gala as simple background dancers; now, they not only occupy the main stage but have also evolved in perception and interaction, firmly establishing themselves as the stars of the show.
More profound changes are occurring behind the scenes in factories. In early 2026, Zhiyuan Robotics announced it had surpassed 5,000 units produced and was sprinting towards an annual goal of tens of thousands, with its “Expedition” series having accumulated over one million hours of operation in automotive manufacturing and precision electronics. Ubtech aims to produce 10,000 industrial-grade robots and has signed a strategic agreement with Airbus, with its Walker S2 officially entering manufacturing plants to tackle aerospace-grade precision assembly. Star Motion Era has partnered with SF Technology to push for large-scale implementation in high-frequency warehousing, leveraging the advantages of “foot + wheel” for improved logistics efficiency.
The industry’s heat quickly spilled over into the capital market. Magic Atom’s co-founder Gu Shitao revealed that the company might announce news in the secondary market as early as 2026, with plans for an expedited IPO timeline. Leju Intelligent and Yundong Technology, which have completed share reform, have also officially initiated their listing processes. Following a frenzy of investments in large models in 2024, internet giants like Meituan, Alibaba, JD.com, and Tencent are set to enter the embodied intelligence sector in 2025, with advanced manufacturing and industrial giants like CATL placing their bets as well.
From laboratory demos to factory orders, and from capital narratives to commercial realization, embodied intelligence seems to have crossed the critical threshold of technological validation, racing towards the eve of large-scale production.
Policy support has also shifted from broad guidance to targeted measures. By the end of 2025, the Ministry of Industry and Information Technology and three other departments released the Implementation Plan for Digital Transformation in the Automotive Industry, explicitly calling for the large-scale application of intelligent robots in welding, painting, and assembly processes, and for the creation of “embodied intelligence demonstration production lines.” Yet a significant gap exists between ideals and reality. Jiang Lei, chief scientist at the National Collaborative Innovation Center for Humanoid Robots, candidly stated that the industry is still at the stage of a “consumer-grade product reserve,” with annual production not exceeding 10,000 units due to concerns that excess production would create heavy after-sales pressure. Wang He, founder of Galaxy General, bluntly noted that fewer than 1,000 robots are likely operating in human work environments worldwide today.
While Tesla’s Optimus V3 is confirmed for release in Q1 of this year, with ambitious production targets of 100,000 units by year-end and one million in the long term, its timeline has been delayed by approximately eight months. Challenges such as the mass production stability of its 22-degree-of-freedom dexterous hand under extreme conditions and the engineering hurdles of liquid cooling during high-power operations remain core bottlenecks.
The intertwining of capital exuberance and industrial anxiety reflects not only the public interest sparked by the Spring Festival stage show, but also the deeper lack of consensus on hardware, algorithms, and commercialization pathways within embodied intelligence.
Breaking Paradigms Amidst Discrepancies
Embodied intelligence entails giving machines a “body” and a “brain”: enabling them to perceive the physical world through sensors and utilize algorithms like large models to understand their environment, plan actions, and drive joints and motors to accomplish tasks. In simpler terms, it means making robots “see, hear, and act” like humans. If we abstract this as an “AI operating system with a body,” the foundation is the hardware itself, responsible for making machines “move”; the next layer is the algorithmic brain, determining “how to think”; followed by environmental perception, enabling them to “see the world and understand themselves”; and finally, commercialization and operation, ensuring robots can “survive in the real world and make money.”
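The layered view above, from hardware through brain and perception to operation, can be sketched as a minimal sense-plan-act loop. Everything here is an illustrative stand-in (the class and method names are hypothetical), not any company's actual architecture:

```python
from dataclasses import dataclass

# Hypothetical sketch of the "AI operating system with a body":
# perception feeds the algorithmic brain, which drives the hardware.

@dataclass
class Observation:
    camera: list        # raw pixels (placeholder)
    audio: list         # microphone samples (placeholder)
    joint_angles: list  # proprioception: "understand themselves"

class Robot:
    def perceive(self) -> Observation:
        # environmental perception layer: "see the world"
        return Observation(camera=[0.1], audio=[0.0], joint_angles=[0.0, 0.5])

    def plan(self, obs: Observation) -> list:
        # algorithmic brain: "how to think" — here a trivial stand-in policy
        return [a + 0.1 for a in obs.joint_angles]

    def act(self, command: list) -> None:
        # hardware layer: drive joints and motors to accomplish the task
        print(f"moving joints to {command}")

robot = Robot()
robot.act(robot.plan(robot.perceive()))
```

The point of the separation is that each layer can be swapped independently, which is exactly where the industry's approaches diverge in the paragraphs that follow.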
Currently, the industry has three distinct approaches to building “bodies.” Ubtech and Zhiyuan aim to define robot frameworks through “industrial precisionism.” By developing core servo systems and precise reducers in-house, they strive for long-term stable operations on automotive or precision electronics production lines, exchanging physical reliability for deep trust in “silicon-based labor” in industrial scenarios. Yushutech, Songyan Power, and Zhongqing take advantage of the economies of scale from local supply chains to seek breakthroughs in “performance and cost-effectiveness,” successfully reducing the overall cost of machines from millions to tens of thousands or even thousands, lowering entry barriers to attract a vast array of developers and enthusiasts, initially establishing an ecosystem in non-standard scenarios.
In contrast, Galaxy General and Yundong Technology aim to prove that “humanoid” is not the only solution for physical tasks. The former opts for a wheeled chassis with dual arms, prioritizing entry into warehousing, retail, and certain heavy-duty industries, while the latter insists on a quadruped humanoid hybrid, striving for terrain adaptability in scenarios like electrical inspections, pipeline tunnels, and emergency rescues.
This divergence in approaches also reflects a philosophical divide: some insist on a vertical stack, developing everything from servos, motors, and reducers to the complete machine, upper-level control, and large models, to secure long-term barriers and bargaining power, exemplified by Ubtech’s Walker S2; others choose a modular open approach, designing the body as a standard platform with open interfaces, allowing third parties to “install brains and applications,” and relying on shipment volumes and ecosystems to generate revenue, as seen with Zhiyuan Robotics’ open platform.
Looking further, the evolution of algorithmic intelligence is almost a history of technological paradigms in itself. Early sim-to-real transfer techniques reduced initial model training costs, but when faced with the complexities of the real physical world—friction, deformation, and noise—they often accumulated errors over long sequences, leading to “more mistakes in reality.” Later, the VLA (vision-language-action) model, trained on general internet corpora, became mainstream, endowing robots with excellent semantic understanding and task decomposition abilities. Models like Google’s RT-2 and Physical Intelligence’s π series significantly lowered the barriers to human-robot interaction. VLA excels at weaving together complex visual and linguistic information and predicting actions from learned “patterns.” However, its structural limitations have also emerged: it often struggles to predict outcomes accurately in fine-grained physical operations involving tactile feedback, such as “placing a cup on the edge of a table without letting it slide off or spill.” Professor Zhao Mingguo of Tsinghua University’s Department of Automation views the VLA model as a transitional technology rather than a definitive solution, noting that the success of large language models stems from the “standardization” and “massiveness” of human language data, while visual and tactile data in the physical world remain “very unregulated” and cannot simply be replicated.
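The VLA pattern described above can be reduced to an interface sketch: an image plus a language instruction goes in, a sequence of discretized action tokens comes out. The policy below is a toy placeholder (real systems such as RT-2 learn this mapping with large multimodal transformers); only the input/output shape is the point:

```python
from typing import List

# Toy sketch of the vision-language-action (VLA) interface. All names and
# numbers here are illustrative assumptions, not any published model's API.

ACTION_BINS = 256  # actions discretized into token bins, a common VLA choice

def detokenize(token: int, low: float = -1.0, high: float = 1.0) -> float:
    """Map a discrete action token back to a continuous motor command."""
    return low + (high - low) * token / (ACTION_BINS - 1)

def vla_policy(image: List[float], instruction: str) -> List[int]:
    """Stand-in policy: a real VLA runs a multimodal transformer here."""
    # trivially combine image mean and instruction length (placeholder logic)
    score = (sum(image) / len(image) + len(instruction) % 7) % ACTION_BINS
    return [int(score)] * 3  # e.g. tokens for (dx, dy, gripper)

tokens = vla_policy(image=[0.2, 0.4, 0.6],
                    instruction="place the cup on the table")
commands = [detokenize(t) for t in tokens]
print(commands)
```

Note what the sketch makes visible: the model only predicts token "patterns," with no internal physics check on whether the cup will actually stay on the table—precisely the structural limitation described above.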
Recently, the industry’s search for a breakthrough has shifted towards world action models (WAM). This new paradigm requires robots to simulate physical evolution in an internal imaginative space before acting. Recent research from Stanford and NVIDIA, such as the Cosmos Policy, has demonstrated the potential for zero-shot generalization across different tasks, enabling robots to develop “physical intuition” through video generation models. This ability to “rehearse before executing” is becoming key to improving the success rate of robotic operations. The Ctrl-World model proposed by Tsinghua University and Stanford, trained with zero real-machine data, increased the success rate of following downstream task instructions from 38.7% to 83.4%—an absolute gain of 44.7 percentage points.
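The “rehearse before executing” idea can be sketched as sampling-based planning inside a learned dynamics model: roll candidate action sequences forward in imagination, score the predicted outcomes, and execute only the best plan. The dynamics and reward below are toy placeholders under stated assumptions, not any published model:

```python
import random

# Hedged sketch of world-model rehearsal: mentally simulate candidate plans,
# then act. The "learned" dynamics here is a made-up one-line placeholder.

def imagined_step(state: float, action: float) -> float:
    """Toy world model: its prediction of the next state."""
    return state + 0.9 * action  # placeholder physics

def score(state: float, goal: float) -> float:
    return -abs(goal - state)  # closer to the goal is better

def rehearse_and_pick(state: float, goal: float, horizon: int = 3,
                      n_candidates: int = 64) -> list:
    best_plan, best_score = None, float("-inf")
    for _ in range(n_candidates):
        plan = [random.uniform(-1, 1) for _ in range(horizon)]
        s = state
        for a in plan:              # mental rehearsal — no real-world actions
            s = imagined_step(s, a)
        if score(s, goal) > best_score:
            best_plan, best_score = plan, score(s, goal)
    return best_plan

random.seed(0)
plan = rehearse_and_pick(state=0.0, goal=1.5)
```

Every candidate except the winner is discarded purely in imagination, which is why the paradigm's cost shifts from real-world trial and error to the compute needed for parallel rollouts, as the next paragraph notes.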
While the potential of world models lies in fundamentally alleviating operational errors, the required data volume, computing power (NVIDIA’s DreamZero relies on computing clusters of top-tier chips like H100 or GB200 for parallel reasoning, making costs prohibitive for independent robots at the edge), and engineering complexities far exceed previous requirements, placing it in a phase of “scientific brilliance” alongside “engineering exploration.” This differentiation in technological pathways also extends to the choice of “intelligence sources”: whether to leverage general large models like GPT-4o and Gemini or to train native embodied models from scratch, as explored by domestic companies like Yuanli Lingji, has become a contested frontier for different technology teams.
The emergence of intelligence relies on high-quality data, which brings us to the level of environmental perception. Chen Yilun, CEO of Shizhi Hang, has said that the complexity of embodied intelligence tasks demands a data volume for product-level iteration more than ten times that of autonomous driving. Wang Qian, founder of Self Variable, has also pointed out that the industry’s understanding of data is evolving: it is not simply about having more data, but about “more effective data.” Here, two “parallel lines” emerge. Some teams insist on long-term, multi-modal data collection in real factories, striving for absolute consistency with the physical environment. Tesla’s FSD is strong not merely because its neural networks are superior, but because it has millions of cars on the road acting as distributed “data collectors,” continuously gathering rare long-tail scenarios through “shadow modes.” Likewise, the domestic robot “Xiao Mo” from Qianxun Intelligence performs crucial yet monotonous tasks on CATL’s production line, autonomously detecting wire harness connections and dynamically adjusting insertion forces, achieving a threefold increase in daily workload with a connection success rate above 99% and significantly reducing labor costs and production losses.
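The “shadow mode” mechanism mentioned above can be sketched in a few lines: a candidate model predicts silently alongside the deployed controller, and only the cases where prediction and reality disagree are logged as long-tail training data. Function names and the threshold are hypothetical illustrations:

```python
# Illustrative sketch of shadow-mode data collection: log only disagreements
# between the silent shadow model and what the deployed system actually did.

collected = []

def shadow_log(model_prediction: float, actual_action: float,
               observation: dict, threshold: float = 0.2) -> None:
    """Record the observation only when the shadow model disagrees."""
    if abs(model_prediction - actual_action) > threshold:
        collected.append({"obs": observation, "label": actual_action})

# the deployed controller keeps driving; the shadow model just watches
for pred, actual in [(0.10, 0.12), (0.90, 0.15), (0.50, 0.48), (-0.3, 0.4)]:
    shadow_log(pred, actual, observation={"frame_id": len(collected)})

print(f"{len(collected)} rare cases captured")  # only the disagreements
```

The design choice is that agreement is cheap and uninformative, so the fleet's bandwidth and storage are spent exclusively on the rare scenarios the model gets wrong—“more effective data” rather than simply more data.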
Others focus on enhancing the simulation capabilities of high-fidelity physical engines, attempting to shorten algorithm evolution cycles through synthetic data, as is the case with Galaxy General. Its founder Wang He noted in an interview that “in the short term, simulation and synthetic data will still undertake more exploratory tasks; in the long run, we must increase the real deployment of robots by hundreds or thousands of times.” The advancements in all technologies ultimately seek answers within a commercial closed loop. The RaaS (Robots as a Service) model is transforming expensive hardware investments into standardized productivity rentals, spreading initial R&D costs through scalable operations. Qingtian Rental predicts that the robot rental market will exceed ¥1 billion by 2025 and not be less than ¥10 billion by 2026.
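The RaaS logic of spreading upfront cost through rental can be made concrete with back-of-the-envelope arithmetic. All figures below are made-up illustrations, not company data:

```python
# Toy RaaS economics: a high upfront cost is amortized by monthly rental
# margin per robot. Every number here is an assumption for illustration.

unit_cost = 300_000    # ¥ per robot (hardware + allocated R&D), assumed
monthly_fee = 15_000   # ¥ rental fee per robot per month, assumed
monthly_opex = 3_000   # ¥ maintenance/after-sales per robot per month, assumed

margin_per_month = monthly_fee - monthly_opex   # ¥12,000
breakeven_months = unit_cost / margin_per_month
print(f"breakeven after {breakeven_months:.1f} months per unit")  # 25.0 months
```

Under these assumptions a unit pays for itself in just over two years, which is why the model only works at fleet scale and with robots reliable enough to stay rented—linking the commercial question back to the hardware-reliability question above.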
In the To-C sector, the brand premium brought by the Spring Festival stage or cultural performances is accumulating the first wave of public recognition and traffic assets for embodied intelligence. While this initial market education primarily takes the form of exhibitions, it also lays the groundwork for subsequent entry into deeper service scenarios, such as households.
In 2025, leading companies did see rapid revenue growth: Ubtech’s total order amount approached ¥1.4 billion, Yushutech nearly ¥1.2 billion, Zhiyuan Robotics roughly ¥700 million to ¥1 billion, and Galaxy General over ¥700 million. However, order amounts do not equate to actual deliveries, let alone profits. Although Ubtech’s annual revenue exceeds ¥1.3 billion, its R&D and operational costs remain high: reports indicate that its R&D expenses reached ¥218 million in the first half of 2025, accounting for 35.1% of revenue, while net losses hit ¥439 million. Unicorns like Zhiyuan and Yushutech, despite soaring valuations, face significant cost pressures in scaling production and building after-sales systems, with commercialization still in its early stages. Qingtian Rental CEO Li Yiyan has openly stated that the entire industry’s installed base is still small—only somewhat over 10,000 units worldwide—meaning the existing volume is still in its infancy.
In this context, the current “billion-yuan orders” are largely pilot attempts in benchmark scenarios rather than replicable, scalable demands. It can be said that in every link of “how to build the body, how to train the brain, how to gather data, and how to run the business,” embodied intelligence is in a state of “vital non-consensus,” with deep couplings across various dimensions: opting for cheaper bodies may require more complex compensations in algorithms; pursuing an ultimate world model necessitates higher data and operational costs. No one can assert today, as they did with NLP, that “Transformer + large parameters + massive text” is the only answer. However, this systemic non-consensus also breathes life into embodied intelligence, allowing capital to tell highly imaginative stories in any dimension: world models, spatial intelligence, DFOL, RaaS, and more. Wang He, founder of Galaxy General, has candidly stated, “The lack of consensus is a good thing; if everyone forms a consensus, then it ultimately comes down to cost, resources, and connections—all of which are not the strengths of entrepreneurs and are detrimental to startups.” This leaves greater possibilities for China to explore its own technical routes and commercial pathways.
Today, domestically, there has been a leap from nearly complete reliance on imports for core components like harmonic reducers, torque/six-dimensional force sensors, and IMUs to achieving 100% domestic configuration. The cost of complete machines has been compressed from hundreds of thousands to tens of thousands or even thousands. Nearly a thousand related companies in Jiangsu have woven an industrial landscape worth over ¥170 billion, with many hidden champions in the supply chain, such as Suzhou Lide Harmonic, Nanjing Craft, Hengli Precision, and Kunwei Technology, densely clustered within a “30-minute supply circle.” This “cluster fusion” not only alleviates passive supply chain dependencies but also grants developers a sense of “paradigm freedom”: they can pursue extreme reliability through full-stack self-development or iterate ecosystems quickly through open modules; they can focus on humanoids for factory entry or quadrupeds for inspection tasks.
Intertwined with the hardware foundation is the “self-adjusting” nature of the algorithmic brain. Domestic world models such as Zhiyuan’s EnerVerse, Self Variable’s WALL-A, Zhongke Fifth Epoch’s BridgeV2W, and Ant’s LingBot-World are racing to excel in application implementation, technological breakthroughs, and ecosystem refinement. They are not merely “replicating OpenAI,” but are constructing a domestic tech stack more aligned with the physical world. In terms of perception and operational sustainability, large-scale real scenario data collection, operational platforms, and RaaS models must be deeply adapted to local industrial, urban, and policy environments, necessitating local firms to take the lead. The “domestic substitution” of embodied intelligence is no longer simply about replacing a screw but is about the autonomous reconstruction of an entire technological paradigm based on its supply chain advantages. Those who master the autonomous and controllable capabilities from components to complete machines, from big models to operational systems will have the qualification to repeatedly experiment in this non-consensus space, leading the way to the singularity of the industry.
Moving Towards Consensus in 2026
Today, after a wave of capital fervor followed by valuation adjustments, the once-contentious divergences in pathways have reached a reconciliation under the gravity of reality and delivery targets. The industry is gradually crystallizing some “consensus” through a process of refining the true and discarding the false.

Consensus One: the form is not important; the scenario is. Debates over “what robots should look like” have been rife with biases from the start. Some insist that humanoids are the ultimate answer, given that human-designed infrastructure—stairs, doorknobs, workstations, tools—is tailored for humans. Others argue that multi-legged, wheeled, or even spherical forms are more efficient, questioning the need to conform to human likeness. Yet this debate may be misframed. As futurist Thomas Frey pointed out, there is no “perfect” robot form, just as there is no “perfect” vehicle—motorcycles, cars, trucks, and tanks each serve their purpose, and no one disputes which is universally superior. Robot forms should serve the scenarios, not vice versa. Humanoid robots indeed have advantages in environments designed for humans, as they require no modifications to buildings or tools. However, when tasks become specific, specialized forms tend to be more efficient—wheeled robots are faster in warehousing, quadrupeds are more stable in electrical inspections, and multi-arms are more flexible in precision assembly. This consensus on “form diversity” essentially acknowledges the complexity of the physical world: no single key can open all locks, and no single form can excel in all scenarios.
Consensus Two: It’s human-centric, understanding the fundamental physics of interactions. AI researcher Hans Moravec proposed a famous paradox: for computers, chess is easy, but perception and locomotion are extremely difficult. This accurately predicts the current underlying dilemma of embodied intelligence—we can train AI to master Go or language in virtual spaces, but enabling it to pick up a cup or walk through a door requires entirely different capabilities. This ability cannot be solved merely by increasing computing power; it requires a profound understanding of “how bodies interact with the physical world.” Robots do not need to think like humans, but they must grasp the logic, intent, and safety boundaries of human actions in the physical space. “Human-centric” is not an ethical ornament but a technical necessity: only by understanding human existence can robots genuinely become collaborative partners rather than cold replacement tools.
Consensus Three: It’s not about replacement but liberation. In 1920, Czech writer Karel Čapek first used the term “Robot” in his three-act play R.U.R. (Rossum’s Universal Robots), derived from the Czech word “robota,” meaning “forced labor” or “slave.” The robots in the story were created to perform all the tedious tasks humans wished to avoid, thus liberating humanity to pursue more fulfilling endeavors. Over a century later, the expectations carried by the term “Robot” remain unchanged. The future indicated by embodied intelligence is not about replacing humans but maximizing human creativity. By 2026, domestic robots are projected to scale up from thousands to tens of thousands of deliveries, heralding the anticipated “year of mass production.” At this pivotal moment in the industrial infrastructure, we are about to enter a new era of human-robot collaboration—not through replacement, but enhancement; not through alienation, but integration; not through an end, but a rebirth. This is the shared value consensus of all embodied intelligence enterprises and the ultimate destination for the industry.
Original article by NenPower. If reposted, please credit the source: https://nenpower.com/blog/the-rise-of-robots-at-the-spring-festival-gala-exploring-the-industrys-divergent-perspectives-on-embodied-intelligence/
