The Rise of Embodied Intelligence: Insights from the Spectacular Spring Festival Robot Showcase

In 1950, Alan Turing planted the seeds of “embodied intelligence” in his paper “Computing Machinery and Intelligence.” More than seventy years later, the concept grew new branches with the surge of ChatGPT and the advent of VLA models, as “embodied intelligence” began to displace the traditional narrative of “automation” and establish itself as a new industry consensus. As a form of “AI realized in the physical world,” robots have become the favored vehicle of this era.

At the recently concluded Spring Festival Gala, several companies specializing in embodied intelligence, including Unitree Technology, Songyan Power, and Galaxy General, made a collective appearance, delivering an unprecedented dose of public education about the field. Reports indicated that within two hours of the gala’s broadcast, searches for robots on JD.com skyrocketed by over 300%, and order volume rose by 150%.

However, this was more than a well-received performance; it also marked a brutal transition driven by fierce competition. In the capital markets, it was an unprecedented celebration: annual financing in the embodied intelligence sector surged to 744 rounds, totaling 73.543 billion yuan. Yet behind this prosperity, the industry is experiencing growing pains. On one side, giants like Tesla and UBTECH are accelerating iteration and global expansion; on the other, star startups like K-Scale have exited the market, and once-promising unicorns like Datan Robotics have quietly folded. Soaring valuations juxtaposed with modest shipment volumes create the truest tension within embodied intelligence.

From the Spring Festival Gala stage to factory floors, embodied intelligence has commanded the spotlight in an unprecedented manner. Unitree Technology’s G1 robot electrified the audience with its performance in “Wubot,” pushing its athletic limits with single-leg backflips and jumps of two to three meters. Songyan Power’s “Bionic Cai Ming” achieved lifelike makeup and lip-syncing through pixel-level replication. Magic Atom’s MagicBot Z1 formed a dance troupe, performing complex maneuvers alongside the stars. From the synchronized dance of hundreds of panda robot dogs at the Sichuan Yibin branch venue to the scenario demonstrations by Galaxy General and Zhui Mi, the sheer concentration of robots led netizens to humorously dub it the “First AI Spring Festival Gala.”

Fourteen years ago, robots made their debut at the Spring Festival Gala, performing simple movements as background dancers. Today, they not only take center stage but also, through deeply evolved perception and interaction, have emerged as undeniable stars of the event.

More profound transformations are occurring behind the scenes in factory workshops. At the beginning of 2026, Zhiyuan Robotics announced that it had crossed the threshold of 5,000 units produced, aiming for an annual target in the tens of thousands. Its “Expedition” series has accumulated over 1 million hours of work on automotive and precision-electronics production lines. UBTECH announced plans to produce 10,000 industrial-grade robots and signed a strategic agreement with Airbus, with the Walker S2 officially entering manufacturing facilities to take on aerospace-grade precision assembly. Star Motion Era partnered with SF Technology to scale operations in high-frequency warehousing, converting the advantages of its “legged + wheeled” systems into improved logistics efficiency.

The industry’s heat quickly spilled over into the capital markets. Magic Atom co-founder Gu Shitao revealed that the company may have news of a public listing by 2026, and the newly restructured Leju Intelligent and Yunshen Technology have also officially initiated their listing processes. After aggressively investing in large models in 2024, internet giants like Meituan, Alibaba, JD.com, and Tencent collectively entered the embodied intelligence sector in 2025, with advanced-manufacturing and industrial giants like CATL and major automakers also placing their bets.

From laboratory demos to factory orders, from capital narratives to commercial realization, embodied intelligence seems to have crossed the critical threshold of technical validation, charging full speed toward the eve of large-scale production.

Policy momentum has shifted from macro guidance to targeted support. At the end of 2025, the Ministry of Industry and Information Technology and three other departments released the “Implementation Plan for the Digital Transformation of the Automotive Industry,” explicitly aiming to promote the large-scale application of intelligent robots in welding, painting, and assembly processes and to create “embodied intelligence demonstration production lines.” However, a significant gap remains between ideals and reality. Jiang Lei, chief scientist at the National Local Collaborative Human-Robot Innovation Center, admitted that the industry is currently focused more on “consumer-grade product reserves,” with annual production not exceeding 10,000 units, since “producing too many has no utility and creates significant after-sales pressure.” Wang He, founder of Galaxy General, bluntly pointed out that fewer than 1,000 robots may currently be operating in real human work scenarios worldwide. Although Tesla’s Optimus V3 is confirmed for release in Q1 of this year, with ambitions of 100,000 units by year’s end and a long-term goal of 1 million units at a target price of $20,000, its timeline has already slipped by roughly eight months from the original plan. The core bottlenecks include the mass-production stability of the 22-degree-of-freedom dexterous hand under extreme conditions and the engineering challenges of liquid cooling during high-power operation.

The excitement in capital markets intersects with the industry’s anxiety, producing a “rift.” This division is fueled not only by the discussions ignited by the Spring Festival stage show but also by the lack of consensus in the choices regarding hardware, algorithms, and commercialization paths for embodied intelligence. The paradigm shift is occurring at full speed within this “non-consensus.”

Embodied intelligence refers to giving machines both a “body” and a “brain”: enabling them to perceive the physical world through sensors, understand their environment using algorithms such as large models, and control joints and motors to complete tasks. In simpler terms, it means allowing robots to “see, hear, and act” like humans. If we abstract it to an “AI operating system with a body,” the foundational layer consists of the physical hardware, responsible for making the machines “move”; the next layer is the algorithmic brain, determining “how they think”; then comes environmental perception, allowing them to “see the world and understand themselves”; and finally, commercial operations, concerned with whether robots can “survive in the real world and generate revenue.”

When it comes to determining “what kind of body to build,” the industry currently follows three routes. UBTECH and Zhiyuan focus on defining the robot’s framework through “industrial precisionism.” They achieve long-term stable operations on automotive manufacturing or precision electronics production lines through fully self-developed core servo systems and precision reducers, trading physical reliability for deep trust in “silicon-based labor” within industrial scenarios. Unitree Technology, Songyan Power, and Zhongqing leverage the scale effects of local supply chains to seek breakthroughs in “performance and cost-effectiveness,” successfully guiding the overall machine cost down from the million-yuan range to tens of thousands or even thousands, thereby lowering barriers and attracting a massive number of developers and enthusiasts to first establish an ecosystem in non-standard scenarios.

Meanwhile, Galaxy General and Yunshen Technology are attempting to prove that “humanoid” is not the only solution for physical tasks. Galaxy General opts for a wheeled chassis with dual arms, focusing primarily on warehousing, retail, and certain heavy-load industrial tasks. In contrast, Yunshen Technology insists on a hybrid four-legged and humanoid design, striving to excel in scenarios such as electric power inspections, tunnel maintenance, and emergency rescues by adapting to various terrains. These route differences also reflect a divergence in commercial philosophy—some insist on a vertical full stack, developing everything from servos, motors, reducers to the complete machine and upper-layer controls to establish long-term barriers and bargaining power, exemplified by UBTECH’s Walker S2; others choose modular openness, building the body as a standard platform with open interfaces to allow third parties to “install the brain and applications,” profiting through volume and ecosystem growth, as seen in Zhiyuan Robotics’ open platform.

Looking further up the stack, the evolution of brain algorithms reads like a history of technical paradigms. Early sim-to-real transfer techniques lowered initial training costs, but when confronted with real-world friction, deformation, and complex noise they accumulated errors over long action sequences, “making more mistakes in reality.” Later, VLA (Vision-Language-Action) large models, trained on vast amounts of internet data, became mainstream, granting robots excellent semantic understanding and task-decomposition capabilities. From Google’s RT-2 to Physical Intelligence’s π series, to GEN-0 and GR00T, VLA models significantly lower the barriers to human-robot interaction. VLA excels at fusing complex image and language information and inferring actions according to learned “patterns.” However, its structural shortcomings become evident in fine-grained physical operations involving tactile feedback: VLA often struggles to predict physical outcomes accurately, such as whether “placing a cup on the edge of a table” will leave it balanced or send it slipping off and spilling.
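To make the VLA control loop above concrete, here is a minimal toy sketch of its interface: a camera frame and a language instruction go in, discretized action tokens come out and are decoded into joint commands. The model here is a hand-written stand-in with made-up dimensions, not the actual RT-2 or π-series API.

```python
import numpy as np

ACTION_BINS = 256  # VLA-style models discretize each action dimension into tokens

def fake_vla_model(image: np.ndarray, instruction: str) -> list[int]:
    """Stand-in for a VLA forward pass: returns one action token per joint.
    A real model would run a vision-language transformer here."""
    # Hash the inputs so the stub is deterministic within a run.
    seed = (hash(instruction) ^ int(image.sum())) % (2**32)
    rng = np.random.default_rng(seed)
    return rng.integers(0, ACTION_BINS, size=7).tolist()  # hypothetical 7-DoF arm

def decode_tokens(tokens: list[int], low: float = -1.0, high: float = 1.0) -> np.ndarray:
    """Map discrete action tokens back to continuous joint commands in [low, high]."""
    t = np.asarray(tokens, dtype=np.float64)
    return low + (t / (ACTION_BINS - 1)) * (high - low)

# One control step: perceive -> predict action tokens -> decode -> (send to robot)
image = np.zeros((224, 224, 3), dtype=np.uint8)  # camera frame
tokens = fake_vla_model(image, "place the cup on the table")
action = decode_tokens(tokens)  # continuous 7-dimensional joint command
```

The design point is the tokenization itself: by casting actions as just another token vocabulary, VLA models reuse the same sequence-prediction machinery that powers language models, which is exactly why they inherit strong semantic understanding but no guarantee of physical accuracy.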

Professor Zhao Mingguo from Tsinghua University’s Department of Automation believes that the industry’s enthusiasm for VLA models is more of a transitional technical means rather than a definitive solution. He noted that the success of large language models stems from the “standardization” and “massive” nature of human language data, whereas the visual and tactile data of the physical world is “incredibly unstandardized,” making it impossible to simply replicate.

Recently, the industry has pointed to WAM (World Action Model) as a breakthrough. This new paradigm requires robots to simulate physical evolution in an internal imaginative space before acting. Recent research from Stanford and NVIDIA, such as Cosmos Policy, indicates the potential for zero-shot (no task-specific samples) generalized execution of different tasks by embodied models. The approach trains robots to acquire “physical intuition” through video generation models: first learning “how the world will evolve if a certain situation occurs,” then planning “how I should act” on that basis. This “pre-execution simulation” capability has become key to raising the success rate of robotic operations. The Ctrl-World model, proposed jointly by Tsinghua University and Stanford, demonstrated that, without any real-robot data, the success rate of following downstream task instructions can be raised from 38.7% to 83.4%, an average improvement of 44.7 percentage points. While the promise of world models is to fundamentally reduce operational errors, their data volume, computation scale (NVIDIA’s DreamZero relies on top-tier chip clusters like H100 or GB200 for parallel inference, currently prohibitively expensive for independently deployed edge robots), and engineering complexity far exceed previous levels, leaving them in a phase of “scientific brilliance” paired with “engineering exploration.”
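The “imagine before acting” loop can be sketched in a few lines: sample candidate action sequences, roll each forward inside an internal dynamics model, score the imagined outcomes, and execute only the first action of the best candidate. The trivial linear dynamics below is a placeholder for a learned world model; it is not how Ctrl-World or Cosmos Policy are actually implemented.

```python
import numpy as np

def imagined_step(state: np.ndarray, action: np.ndarray) -> np.ndarray:
    """Internal world model: predicts the next state. A learned model would
    predict video frames or latents; here we use toy linear dynamics."""
    return state + 0.1 * action

def plan(state: np.ndarray, goal: np.ndarray, horizon: int = 5,
         n_candidates: int = 64, seed: int = 0) -> np.ndarray:
    """Sample action sequences, simulate each one internally ("pre-execution
    simulation"), and return the first action of the best-scoring rollout."""
    rng = np.random.default_rng(seed)
    best_cost, best_first_action = np.inf, None
    for _ in range(n_candidates):
        actions = rng.uniform(-1.0, 1.0, size=(horizon, state.shape[0]))
        s = state.copy()
        for a in actions:                 # imagine the rollout, step by step
            s = imagined_step(s, a)
        cost = float(np.linalg.norm(s - goal))  # score the imagined end state
        if cost < best_cost:
            best_cost, best_first_action = cost, actions[0]
    return best_first_action

state, goal = np.zeros(3), np.array([0.5, 0.0, -0.5])
action = plan(state, goal)  # only the winning candidate's first action is executed
```

In practice the robot replans after every executed step, so errors in the imagined rollouts are continually corrected against fresh observations.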

These technical path differences extend to the choice of “intelligence source”: whether to rely on general-purpose large models like GPT-4o or Gemini, or to train native embodied models from scratch, as domestic company Yuanshi Lingji does. The emergence of intelligence depends heavily on high-quality data, a burden that falls on the environmental perception layer. Chen Yilun, CEO of Shizhi Hang, has noted that the complexity of tasks faced by embodied intelligence requires data volumes for product iteration more than ten times those of autonomous driving. Wang Qian, founder of Self-Variable, has observed that the industry’s understanding of data is evolving: it is not about having more data, but about “having more effective data.” This layer also splits into two parallel pathways. Some teams insist on long-term multi-modal data collection in real factories and server rooms, pursuing absolute consistency between data and physical environments. The strength of Tesla’s FSD lies not solely in superior neural network design but in the millions of cars on the road acting as distributed “data collectors,” gathering extremely rare long-tail scenarios through “shadow mode” every day.
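The shadow-mode idea can be sketched as a simple filter: the model runs silently alongside the human operator, and only the frames where its prediction diverges from the human’s action are logged as valuable long-tail data. The field names and threshold below are illustrative, not Tesla’s actual telemetry format.

```python
# Hypothetical disagreement threshold; a real system would tune this per task.
DISAGREEMENT_THRESHOLD = 0.5

def shadow_filter(frames):
    """Keep only frames where the model's silent prediction diverges from the
    human's actual action beyond the threshold -- the rare, informative samples."""
    logged = []
    for frame in frames:
        error = abs(frame["model_action"] - frame["human_action"])
        if error > DISAGREEMENT_THRESHOLD:
            logged.append(frame)  # long-tail scenario worth uploading
    return logged

stream = [
    {"t": 0, "human_action": 0.1, "model_action": 0.1},   # agreement: discard
    {"t": 1, "human_action": 0.9, "model_action": 0.1},   # disagreement: log it
    {"t": 2, "human_action": -0.3, "model_action": 0.2},  # within threshold: discard
]
print([f["t"] for f in shadow_filter(stream)])  # [1]
```

The point is economic as much as technical: because agreement frames are discarded at the edge, a fleet of millions can stream back only the scenarios the model does not yet understand.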

Domestic company Qianxun Intelligent’s “Xiao Mo” robot, for example, performs critical yet tedious work in CATL’s production workshop: autonomously checking the connection status of wiring harnesses and dynamically adjusting insertion force, tripling daily productivity while maintaining a connection success rate above 99% and reducing labor costs and production losses. Another camp focuses on high-fidelity physics-engine simulation, attempting to shorten the algorithmic evolution cycle with synthetic data. Galaxy General is one such company; its founder Wang He has said that “in the short term, simulation and synthetic data will still shoulder more of the exploratory work, but in the long run we must achieve real deployment scale growth of hundreds to thousands of times.”

All technological advances ultimately seek answers in a closed commercial loop. On the To B side, RaaS (Robots as a Service) is converting expensive hardware investments into standardized productivity rentals, spreading initial R&D costs through scaled operations. Qingtian Rental estimates that the robot-leasing market exceeded 1 billion yuan in 2025 and will reach no less than 10 billion yuan in 2026.
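The RaaS logic reduces to simple break-even arithmetic: how many months of rental margin does it take to recover the hardware cost? The figures below are illustrative assumptions for the sketch, not any vendor’s actual pricing.

```python
def breakeven_months(unit_cost: float, monthly_rent: float,
                     monthly_upkeep: float) -> float:
    """Months of rental income needed to recover the hardware cost,
    given the per-month margin (rent minus upkeep)."""
    margin = monthly_rent - monthly_upkeep
    if margin <= 0:
        raise ValueError("rent must exceed upkeep to ever break even")
    return unit_cost / margin

# Hypothetical numbers: a 150,000-yuan robot rented at 12,000 yuan/month,
# with 4,000 yuan/month in maintenance and operations cost.
months = breakeven_months(unit_cost=150_000, monthly_rent=12_000,
                          monthly_upkeep=4_000)
print(round(months, 1))  # 18.8
```

Under these assumed numbers the unit pays for itself in under two years, which is why the model hinges on fleet utilization: every idle month pushes the break-even point further out.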

In the To C sector, the brand premium generated by the Spring Festival stage or cultural tourism performances has provided embodied intelligence with its first wave of public recognition and traffic assets. While this initial market education mainly focuses on exhibitions, it also lays the groundwork for future penetration into deeper service scenarios like households.

In 2025, leading companies did see rapid revenue growth: UBTECH’s total orders for the year approached 1.4 billion yuan, Unitree Technology’s nearly 1.2 billion yuan, Zhiyuan Robotics’ roughly 700 million to 1 billion yuan, and Galaxy General’s over 700 million yuan. But order amounts do not equal actual deliveries, let alone profitability. Although UBTECH’s annual revenue exceeds 1.3 billion yuan, its R&D and operating costs remain high: reports indicate its R&D expenses reached 218 million yuan in the first half of 2025, or 35.1% of revenue, while net losses over the same period amounted to 439 million yuan. Unicorns like Zhiyuan and Unitree, despite soaring valuations, face heavy cost pressure in scaling production and the immense investment an after-sales system requires, and their commercialization remains in its early stages. Qingtian Rental CEO Li Yiyan has publicly stated that industry-wide capacity is still very limited, with a global total of just over 10,000 units, suggesting that today’s “billion-yuan orders” are pilot projects for benchmark scenarios rather than scalable demand.

In short, on “how to build the body, how to train the brain, how to obtain data, and how to run the business,” embodied intelligence remains in a state of “vital non-consensus,” with every dimension deeply intertwined: choosing a cheaper body may demand more complex algorithmic compensation; pursuing an extreme world model requires higher data and operating costs. No one can yet assert, as was once possible in NLP, that “Transformer + large parameters + massive text” is the only answer. Yet it is precisely this systemic non-consensus that gives embodied intelligence its vigorous vitality, allowing capital to tell highly imaginative stories along any dimension: world models, spatial intelligence, DFOL, RaaS… Wang He, founder of Galaxy General, put it candidly: “The absence of consensus is a good thing; if everyone reaches consensus, then the competition comes down to cost, resources, and connections. Those are not areas where entrepreneurs excel, so consensus is bad for entrepreneurship.” This also gives China greater room to explore its own technical and commercial paths.

Today, domestic strides have been made in core components such as harmonic reducers, torque and six-axis force sensors, and IMUs, moving from near-total reliance on imports to fully domestic configurations. Overall machine cost has been compressed from hundreds of thousands of yuan to tens of thousands or even thousands. Nearly a thousand robotics-related companies in Jiangsu are weaving an industrial tapestry worth over 170 billion yuan, with hidden champions like Suzhou Green’s harmonic drives, Nanjing Technology, Hengli Precision, and Kunwei Technology densely clustered within a fifty-kilometer “half-hour supply circle.”

This “clustered fusion” not only alleviates supply-chain vulnerabilities but also grants developers a kind of “paradigm freedom”: they can pursue extreme reliability via full-stack self-research or iterate rapidly on ecosystems via modular openness; they can build humanoid robots for the factory floor or quadruped designs for inspection…

Complementing the hardware foundation is the “autonomous shift” of algorithmic brains. Domestic world models like Zhiyuan’s EnerVerse, Self-Variable’s WALL-A, ZK Fifth Epoch’s BridgeV2W, and Ant’s LingBot-World are racing ahead in application deployment, technical breakthroughs, and ecosystem refinement. They are not merely “replicating OpenAI” but building a domestic tech stack more attuned to the physical world. In perception and real-world operations, large-scale real-scene data collection, operations platforms, and the RaaS model must adapt deeply to local industrial, urban, and policy environments, which demands local manufacturers’ leadership. “Domestic substitution” in embodied intelligence is no longer about replacing a screw; it is about autonomously reconstructing the entire technological paradigm on top of supply-chain advantages. Those who master autonomous, controllable capabilities from components to complete machines, and from large models to operating systems, will earn the right to experiment repeatedly and ultimately open the door to the industry’s singularity.

As we approach 2026, embodied intelligence is stepping towards “consensus.” Following a cycle of capital exuberance and valuation adjustments, the once hotly debated path divergences are reconciling under the gravity of reality and delivery targets, allowing the industry to gradually distill some “consensus” through the process of refining truth from falsehood.

Consensus One: Form Is Not Important; Scenarios Are. The debate over “what robots should look like” has always been fraught with bias. Some insist that the humanoid is the ultimate answer because human infrastructure (stairs, doorknobs, workstations, tools) is designed for humans. Others argue that multi-legged, wheeled, or even spherical forms are more efficient: why be restricted to “looking human”? This debate, however, may miss the point. As futurist Thomas Frey pointed out, there is no “perfect” robot form, just as there is no “perfect” mode of transportation: motorcycles, cars, trucks, and tanks each serve their purpose, and no one argues over which is universally superior. The robot’s form should serve the scenario, not the other way around. Humanoid robots do hold advantages in human-designed environments: houses need no modification, because the robot adapts to the house; tools need no redesign, because the robot uses existing tools. But when tasks become specific, specialized forms are often more efficient: wheeled robots are faster in warehousing, quadrupeds are more stable in electrical inspection, and multi-arm designs are more flexible in precision assembly. This consensus on “form diversity” is, at bottom, an acknowledgment of the physical world’s complexity: no single key opens every lock, and no single form fits every scenario.

Consensus Two: Human-Centric, Understanding the Physical World. AI researcher Hans Moravec proposed a famous paradox: for computers, playing chess is easy, but perception and locomotion are extremely hard. It accurately anticipated today’s underlying dilemma of embodied intelligence: we can train AI to master Go and language in virtual space, but getting it to steadily pick up a cup or walk through a door demands entirely different capabilities. This cannot be solved merely by stacking computational power; it requires a profound understanding of how a body interacts with the physical world. Robots need not think like humans, but they must understand human behavioral logic, intent, and safety boundaries in the physical world. “Human-centric” is not an ethical embellishment but a technical necessity: only by understanding humans can robots become true collaborative partners rather than cold substitutes.

Consensus Three: Not Replacing, but Liberating. In 1920, Czech writer Karel Čapek coined the term “robot” in his three-act play “R.U.R.,” deriving it from the Czech word “robota,” meaning forced labor. The robots of the story are created to perform all the tedious tasks humans are unwilling to do, freeing humanity for more fulfilling pursuits. Over a century later, the expectations attached to the word have not changed. The future of embodied intelligence points not to replacing humans but to unleashing human creativity. In 2026, domestic robot deliveries are expected to climb from the hundreds to the thousands, heralding a much-anticipated “year of mass production.” At this pivotal moment for industrial infrastructure, we are entering a new era of human-robot collaboration: not replacing, but enhancing; not isolating, but integrating; not ending, but renewing. This is the shared value consensus among companies across the embodied intelligence field and the industry’s ultimate destination.

Original article by NenPower. If reposting, please credit the source: https://nenpower.com/blog/the-rise-of-embodied-intelligence-insights-from-the-spectacular-spring-festival-robot-showcase/
