From Spring Festival Star to Factory Worker: The Rise and Challenges of Embodied Intelligence


In 1950, Alan Turing planted the seeds of “embodied intelligence” in his paper “Computing Machinery and Intelligence.” More than seventy years later, the concept has sprouted alongside the rise of ChatGPT and the emergence of VLA (Vision-Language-Action) models, displacing the traditional “automation” narrative and establishing itself as a new industry consensus. The recent Spring Festival Gala became a vivid illustration of this technological wave. From Yushu Robotics’ hard-hitting rendition of Chinese kung fu to the “Cyber Grandson” performance by Songyan Power’s robots, machines have stepped out of the shadows to take center stage.

This event was more than an applause-filled showcase; it marked a fierce transition driven by intense competition. In the capital market, enthusiasm was unprecedented: embodied intelligence attracted 744 financing deals totaling an impressive 73.543 billion yuan. Behind the glamour, however, the industry faces growing pains. On one side, giants like Tesla and UBTech are accelerating iteration and expansion globally; on the other, rising startups such as K-Scale have regrettably exited the market, and the once-promising unicorn Dalu Robotics has quietly collapsed.

The gap between soaring valuations and constrained output is the most authentic tension in embodied intelligence. By 2026, domestic robot deliveries are expected to climb from the thousands into the tens of thousands, marking the anticipated “year of mass production.” Standing at this turning point in industry infrastructure, we are entering a new era of human-robot collaboration—not to replace, but to enhance; not to alienate, but to integrate; not to end, but to give birth to something new.

As a new species representing “AI entering the real world,” robots have become darlings of the times, and the recently concluded Spring Festival Gala was the stage for their most prominent appearances. Yushu Robotics performed alongside humans, executing high-difficulty martial arts moves with precision, while Galaxy General showcased performances that adapted to diverse environments. The song-and-dance routines by Magic Atom and the “robot grandson” act by Songyan Power also contributed significantly to the gala’s success.

Fourteen years ago, when robots first appeared at the Spring Festival Gala, they were mere background dancers capable only of simple motions. Continuous iteration of large-model capabilities has since endowed embodied intelligence with smarter “brains” and more agile “cerebellums” for motion control. Robots have not only taken center stage but have become stars of the gala.

More profound changes are occurring in factory workshops. At the beginning of 2026, Zhiyuan Robotics announced that it had surpassed 5,000 units in cumulative production, aiming for a yearly goal of tens of thousands. Its “Expedition” series has already worked over one million hours on automotive manufacturing and precision electronics production lines. UBTech revealed a production capacity plan for 10,000 industrial-grade robots and signed a strategic agreement with Airbus to deploy the Walker S2 in manufacturing plants, challenging aerospace-grade precision assembly. Xingdong Era partnered with SF Technology to advance large-scale applications in high-frequency warehousing, leveraging the advantages of “footed and wheeled” robots to improve logistics efficiency.

The industry’s fervor has rapidly spilled over into the capital market. Magic Atom’s co-founder, Gu Shitao, revealed that the company could have news in the secondary market as early as 2026, and is working quickly to finalize its IPO timeline. Companies like Leju Intelligent and Yundongchu Technology, which have completed their restructuring, have also officially initiated their public listing processes. Following a frenzied investment in large models in 2024, internet giants like Meituan, Alibaba, JD.com, and Tencent collectively entered the embodied intelligence race in 2025, alongside advanced manufacturing and industry giants like CATL.

From laboratory demos to factory orders, and from capital narratives to commercial realization, embodied intelligence seems to have crossed the critical line of technical validation, speeding towards the eve of large-scale production. Policies have shifted from macro guidance to precise entry. At the end of 2025, the Ministry of Industry and Information Technology and three other departments released the “Digital Transformation Implementation Plan for the Automotive Industry,” which explicitly calls for the large-scale application of intelligent robots in welding, painting, and assembly, aiming to create “demonstration production lines for embodied intelligence.”

However, a significant gap exists between ideals and reality. Jiang Lei, the chief scientist of the national joint innovation center for humanoid robots, candidly stated that the industry is currently more focused on “consumer-grade product reserves,” with annual production not exceeding 10,000 units due to fears of excess capacity and after-sales pressures. Wang He, founder of Galaxy General, bluntly pointed out that there are likely fewer than 1,000 robots operating in real human work environments today. Although Tesla’s Optimus V3 is set to launch in Q1 of this year, with ambitious production targets of 100,000 units by year-end and one million in the long term, its timeline has been delayed by approximately eight months. Core challenges include the mass production stability of its 22-degree-of-freedom dexterous hands under extreme conditions and engineering issues related to liquid cooling during high-power operations.

The intertwining of capital frenzy and industrial anxiety tears the narrative in two, fueled not only by the sensational Spring Festival Gala performances but also by the lack of consensus over embodied intelligence’s hardware, algorithms, and commercialization paths.

Paradigm Shift: Accelerating Amid Non-Consensus

Embodied intelligence essentially gives machines a “body” and a “brain”: they perceive the physical world through sensors, understand their environment through algorithms such as large models, plan actions, and drive joints and motors to complete tasks. In simpler terms, it enables robots to “see, hear, and act” like humans. If we abstract this as an “AI operating system with a body,” the lowest layer is the hardware, which enables movement; above it sits the algorithmic brain, which determines “how to think”; next comes environmental perception, which teaches robots to “see the world and feel themselves”; and at the top is commercial operation, which decides whether robots can “survive and earn money” in the real world.
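
The layered stack described above can be sketched as a minimal perceive-plan-act loop. Every class and method name below is an illustrative assumption, not any vendor’s actual API; a real system would put a large model behind `plan` and real drivers behind `act`:

```python
from dataclasses import dataclass

# Hypothetical sketch of the four layers: perception, "brain", hardware, operation.

@dataclass
class Observation:
    image: list         # camera pixels (seeing the world)
    joint_angles: list  # proprioception (feeling itself)

class EmbodiedAgent:
    """Minimal perceive -> think -> act loop."""

    def perceive(self, sensors: dict) -> Observation:
        # Environmental-perception layer: fuse camera and joint encoders.
        return Observation(image=sensors["camera"], joint_angles=sensors["joints"])

    def plan(self, obs: Observation, instruction: str) -> list:
        # Algorithmic "brain": map observation + language to action steps.
        # A large model would sit here; we stub it out with a single step.
        return [f"step toward goal: {instruction}"]

    def act(self, actions: list) -> None:
        # Hardware layer: drive joints and motors (stubbed as printing).
        for a in actions:
            print(f"executing: {a}")

agent = EmbodiedAgent()
obs = agent.perceive({"camera": [0.1, 0.2], "joints": [0.0, 1.5]})
agent.act(agent.plan(obs, "pick up the cup"))
```

The commercial-operation layer sits outside this loop: it decides which tasks the loop is deployed on and whether running it pays for itself.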

Currently, three routes exist regarding “what kind of body to build.” UBTech and Zhiyuan focus on defining the robot’s skeleton through “industrial precisionism.” They pursue long-term stable operations on automotive manufacturing or precision electronics production lines via fully self-developed core servo systems and precision reducers, exchanging physical reliability for deep trust in “silicon-based labor” within industrial settings.

Yushu Technology, Songyan Power, and Zhongqing leverage the scale effects of local supply chains, seeking breakthroughs in “performance and cost-effectiveness.” They successfully reduced overall costs from over a million yuan to tens of thousands, attracting vast numbers of developers and enthusiasts, initially establishing an ecosystem in non-standard scenarios.

In contrast, Galaxy General and Yundongchu aim to prove that “humanoid” is not the only solution for physical tasks. The former prioritizes a wheeled chassis with dual arms for warehousing, retail, and heavy-duty industrial applications, while the latter insists on a quadruped-plus-humanoid hybrid approach to excel in scenarios like power inspection, pipeline tunnels, and emergency rescue.

This route divergence also reflects varying commercial philosophies—some insist on vertical integration, handling everything from servos, motors, reducers, to the entire machine and upper-level control and large models themselves for long-term barriers and bargaining power, exemplified by UBTech’s Walker S2. Others opt for modular openness, creating a standard platform and opening interfaces for third parties to “install brains and applications,” profiting from sales volume and ecosystem growth, typical of Zhiyuan Robotics’ open platform.

Looking further, the evolution of brain algorithms reads as a historical iteration of technical paradigms. Early simulation-to-real transfer techniques lowered initial training costs but accumulated errors when confronted with the real physical world’s friction, deformation, and noise. VLA (Vision-Language-Action) large models, trained on general internet corpora, have since become mainstream, endowing robots with strong semantic understanding and task-decomposition capabilities. From Google’s RT-2 to Physical Intelligence’s π series, and on to GEN-0 and GR00T, VLA models have significantly lowered the barriers to human-robot interaction. They excel at weaving together complex image and language information to infer actions from learned “patterns.” However, structural weaknesses surface in fine-grained physical operations that demand tactile feedback: a VLA model struggles to accurately predict outcomes for tasks like “placing a cup on the edge of a table” while “avoiding spills and not letting it slip off.”
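
The VLA pattern the paragraph describes—fusing image and language inputs into a single representation and decoding an action from it—can be illustrated with a toy sketch. The encoders, the linear “policy,” and the 7-dimensional action are hypothetical stand-ins, not the architecture of RT-2, the π series, GEN-0, or GR00T:

```python
import numpy as np

# Toy sketch of one VLA (Vision-Language-Action) inference step.
# Real VLA systems use large transformers; everything here is a stand-in.

rng = np.random.default_rng(0)

def encode_image(pixels: np.ndarray) -> np.ndarray:
    """Stand-in vision encoder: flatten and truncate to a 16-dim embedding."""
    return pixels.reshape(-1)[:16]

def encode_text(instruction: str) -> np.ndarray:
    """Stand-in language encoder: hash characters into a fixed 16-dim vector."""
    vec = np.zeros(16)
    for i, ch in enumerate(instruction):
        vec[i % 16] += ord(ch) / 1000.0
    return vec

def vla_policy(image_emb: np.ndarray, text_emb: np.ndarray) -> np.ndarray:
    """Fuse both modalities, then decode a 7-DoF action (6 pose deltas + gripper).
    A real VLA replaces this random linear map with a trained transformer."""
    fused = np.concatenate([image_emb, text_emb])        # (32,) joint representation
    W = rng.standard_normal((7, fused.shape[0])) * 0.1   # untrained stand-in weights
    return np.tanh(W @ fused)                            # bounded action deltas

image = rng.random((8, 8))
action = vla_policy(encode_image(image), encode_text("place the cup on the table"))
```

The structural weakness the paragraph names is visible even in this sketch: the mapping is a single feed-forward pass from pixels and words to an action, with no model of what the action will physically do—the gap that world models aim to close.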

Professor Zhao Mingguo of Tsinghua University’s Department of Automation argues that the VLA models the industry is so enthusiastic about are more a transitional technology than a final answer. He notes that the success of large language models stems from the “standardization” and “massive scale” of human language data, whereas visual and tactile data from the physical world are “extremely unstandardized,” making direct adaptation difficult.

The recent breakthrough points toward the WAM (World Action Model), a new paradigm that requires robots to simulate physical evolution in an internal imaginative space before acting. Recent studies from Stanford and NVIDIA, such as Cosmos Policy, suggest the possibility of embodied models that generalize to new tasks zero-shot (without task-specific samples), teaching robots “how the world evolves under given circumstances” and planning “how to act” on that basis. This ability to “rehearse before executing” is crucial to raising the success rate of robotic operations.
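
The “rehearse before executing” idea can be sketched as sampling candidate action sequences, rolling each one out inside a world model’s imagination, and executing only the first action of the best sequence, then replanning. Everything below is an illustrative toy under stated assumptions—a hand-coded dynamics stub stands in for a learned world model—and is not Cosmos Policy or any published system:

```python
import numpy as np

# Toy "rehearse before execute": random-shooting planning over a world model.
# The world_model stub below would be a learned model (a WAM) in a real system.

rng = np.random.default_rng(42)

def world_model(state: np.ndarray, action: np.ndarray) -> np.ndarray:
    """Imagined one-step dynamics: the 0.8 factor models imperfect actuation."""
    return state + 0.8 * action

def rehearse(state: np.ndarray, goal: np.ndarray,
             n_candidates: int = 64, horizon: int = 5) -> np.ndarray:
    """Sample action sequences, imagine each outcome, return the best first action."""
    best_action, best_cost = None, np.inf
    for _ in range(n_candidates):
        seq = rng.normal(scale=0.3, size=(horizon, state.shape[0]))
        s = state
        for a in seq:                    # roll out in the model's "imagination"
            s = world_model(s, a)
        cost = np.linalg.norm(s - goal)  # distance of imagined end state from goal
        if cost < best_cost:
            best_cost, best_action = cost, seq[0]
    return best_action

state, goal = np.zeros(2), np.array([1.0, -0.5])
for _ in range(10):                      # execute one step, then replan (MPC-style)
    state = world_model(state, rehearse(state, goal))
print(np.linalg.norm(state - goal))      # remaining distance to the goal
```

The design point this toy makes is the one the paragraph argues: planning quality is bounded by the world model’s fidelity, which is why learned world models demand far more data and compute than a direct pixels-to-action policy.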

The Ctrl-World model proposed by Tsinghua University and Stanford demonstrates that, without using any real-robot data, instruction-following success rates can be improved from 38.7% to 83.4%, an average gain of 44.7 percentage points. World models hold the promise of fundamentally reducing operational errors, but their data requirements, computational scale (NVIDIA’s DreamZero relies on top-tier computing clusters for parallel inference), and engineering complexity far exceed previous paradigms, leaving them at a stage of “research brilliance” paired with “engineering exploration.”

This technical path divergence extends to the choice of “intelligence sources”: whether to harness general large models like GPT-4 or Gemini, or to train indigenous embodied models from scratch, as seen with domestic companies like Yuanli Lingji. The emergence of intelligence relies on high-quality data, which falls into the realm of environmental perception. CEO Chen Yilun of Shizhi Hang once noted that the task complexity faced by embodied intelligence demands data volumes for product-level iterations that exceed those needed for autonomous driving by over tenfold. Founder Wang Qian of Zivariable has also highlighted a shift in the industry’s understanding of data: more data is not necessarily better; “more effective” data is what counts.

This layer also has two parallel paths. Some teams insist on long-term, multimodal data collection in real factories and server rooms, pursuing absolute consistency between data and physical environments. Tesla’s FSD excels not only due to sophisticated neural networks but also because it has millions of vehicles on the road, serving as distributed “data collectors” gathering rare long-tail scenarios daily through “shadow mode.” For instance, the domestic Xiaomo robot performs the critical yet tedious task of autonomously detecting wire harness connection statuses in CATL’s production workshop, dynamically adjusting plug-in forces, saving the factory tens of thousands in labor costs and losses annually.

Other teams focus on enhancing high-fidelity physical engine simulation capabilities, attempting to shorten the algorithm evolution cycle through synthetic data. Galaxy General exemplifies this approach, with founder Wang He noting that “in the short term, simulation and synthetic data will continue to undertake more exploratory tasks, but in the long term, we must ensure the actual deployment of robots grows by hundreds or thousands of times.”

All technological advances ultimately seek answers within commercial cycles. On the To B (business) side, the RaaS (Robots as a Service) model is turning expensive hardware purchases into standardized productivity rentals, spreading upfront development costs across large-scale operations. Qingtian Rental estimates that the robot leasing market surpassed 1 billion yuan in 2025 and projects it will reach no less than 10 billion yuan in 2026. On the To C (consumer) side, the brand premium earned on the Spring Festival stage and in cultural performances has begun to build embodied intelligence’s first wave of public recognition and traffic assets. While this early market education revolves mainly around performances, it also lays the groundwork for future deep-service scenarios in households.

In 2025, top companies did indeed see rapid revenue growth: UBTech’s total order value neared 1.4 billion yuan, Yushu Technology’s approached 1.2 billion yuan, Zhiyuan Robotics’ ranged between 700 million and 1 billion yuan, and Galaxy General’s exceeded 700 million yuan. However, order value does not equal actual deliveries, much less profit. Although UBTech’s annual revenue exceeds 1.3 billion yuan, its research and operating costs remain high: financial reports show that in the first half of 2025, its R&D expenses reached 218 million yuan, 35.1% of revenue, while net losses totaled 439 million yuan. Despite soaring valuations, unicorns like Zhiyuan and Yushu face immense cost pressure in scaling production and heavy investment in after-sales systems, leaving their commercialization in its early stages.

Li Yiyan, co-founder of Qingtian Rental, publicly stated that the industry’s capacity remains limited, with Zhiyuan having produced over 5,000 units, Yushu over 4,000, and the global total being just over 10,000—still at the starting line. Thus, the current “billion-yuan orders” mostly represent pioneering trials in benchmark scenarios rather than replicable, scalable demands.

In essence, embodied intelligence exists in a state of “vital non-consensus” at every stage—”how to build the body, how to train the brain, how to acquire data, and how to operate commercially.” Each dimension is deeply intertwined: opting for a cheaper body may necessitate more complex algorithmic compensations; pursuing an ultimate world model entails enduring higher data and operational costs. No one can claim, as in the past with NLP, that “Transformer + large parameters + massive text” is the only answer. However, this systemic non-consensus injects vibrant vitality into embodied intelligence, allowing capital to weave imaginative narratives across any dimension: world models, spatial intelligence, DFOL, RaaS… Wang He of Galaxy General has stated, “Lack of consensus is a good thing. If everyone reaches a consensus, the competition will boil down to cost, resources, and connections—none of which are strengths of entrepreneurs, making it detrimental to startups.” This opens up broader possibilities for China to explore its own technical and commercial pathways.

In hardware, the country has transitioned from nearly complete reliance on imports in core components like harmonic reducers, torque/six-dimensional force sensors, and IMUs to achieving 100% domestically produced configurations. This has reduced overall machine costs from over a million yuan to tens of thousands, liberating the industry from overseas dependencies, and allowing for flexible technical route choices based on scenario needs: full-stack self-research for extreme reliability or modular openness for rapid ecological iterations; humanoid designs for factories or quadrupeds for inspections, unbound by specific supply chains.

In algorithms, domestic world models like Zhiyuan’s EnerVerse, Zivariable’s WALL-A, the Fifth Epoch’s BridgeV2W, and Ant’s LingBot-World are racing forward in application deployment, technological breakthroughs, and ecosystem refinement. They are not “replicating OpenAI” but constructing a domestic technology stack more aligned with the physical world. In perception and environmental operation, large-scale real-world data collection, operational platforms, and RaaS models must deeply adapt to local industries, cities, and policy environments, necessitating local firms to take the lead. The “domestic replacement” of embodied intelligence is no longer a simple substitution of screws or chips but a self-reconstruction of the entire technological paradigm based on local supply chain advantages. Whoever masters autonomous controllable capabilities from components to complete machines, and from large models to operational systems, will have the privilege to explore and iterate in this non-consensus realm, ultimately unlocking the door to the industrial singularity.

2026: Embodied Intelligence Approaches Consensus

Now, after enduring a cycle of capital enthusiasm followed by valuation corrections, the once-contentious path divergences have reached a resolution under the gravitational pull of reality and delivery targets. The industry is gradually solidifying some “consensuses” through a process of refining and filtering.

Consensus One: Form Is Not Important; Scenarios Are. The debate over “what robots should look like” has always been rife with bias. Some argue that humanoids are the ultimate answer, since human-designed infrastructure—stairs, doorknobs, workstations, tools—is intended for human use. Others contend that multi-legged, wheeled, or even spherical forms could be more efficient, questioning the need to mimic humans. However, this debate may be fundamentally misdirected. As futurist Thomas Frey points out, there is no such thing as a “perfect” robot shape, just as there is no “perfect” vehicle—motorcycles, sedans, trucks, and tanks all serve their purposes without contention over which is “universally superior.” A robot’s form should serve the scenario, not the other way around. Humanoid robots do have advantages in human-designed environments: they require no house modifications and can use existing tools. Yet as tasks become specific, specialized forms often prove more efficient—wheeled robots are faster in warehousing, quadrupeds are more stable in power inspections, and multi-armed robots are more flexible in precision assembly. This consensus on “form diversity” is fundamentally an acknowledgment of the physical world’s complexity: no single key opens every lock, and no one shape excels in every scenario.

Consensus Two: Human-Centric Understanding of the Physical World. AI researcher Hans Moravec proposed an influential paradox: while playing chess is easy for computers, perception and walking are extremely difficult. This accurately predicts today’s underlying dilemma in embodied intelligence: we can train AI to master Go and language in virtual spaces, but getting it to stably pick up a cup or walk through a door requires fundamentally different capabilities. This ability is not merely solvable by increasing computational power; it necessitates a profound understanding of “how bodies interact with the physical world.” Robots don’t need to think like humans, but they must grasp human behaviors, intentions, and safety boundaries within the physical realm. “Human-centric” is not an ethical decoration but a technical necessity: only by understanding human existence can robots truly become collaborative partners rather than cold substitutes.

Consensus Three: Not Replacement, But Liberation. In 1920, Czech writer Karel Čapek first used the term “Robot” in his play “R.U.R.”—derived from the Czech word “robota,” meaning “forced labor” or “slave.” The robots in the story were created to complete all the unpleasant tasks humans wished to avoid, thereby liberating humanity to pursue better endeavors. Over a century later, the expectations encapsulated in the term “Robot” seem unchanged. The future embodied intelligence envisions is not to replace humans but to maximize human creativity. By 2026, domestic robots are expected to transition from thousands to tens of thousands in deliveries, marking a pivotal “year of mass production.” Standing at the crossroads of industrial infrastructure, we are about to enter a new era of human-robot collaboration—not to replace, but to enhance; not to alienate, but to integrate; not to end, but to give birth to something new. This represents the value consensus of all embodied intelligence enterprises and the ultimate destination for the industry.

Original article by NenPower, If reposted, please credit the source: https://nenpower.com/blog/from-spring-festival-star-to-factory-worker-the-rise-and-challenges-of-embodied-intelligence/
