Gaode Unveils Groundbreaking Autonomous Robot at Beijing Yizhuang Half Marathon


Recently, at the Beijing Yizhuang Robot Half Marathon, Gaode's ABot-Claw made a significant impact. With the introduction of its "Harness" control layer, the era of embodied intelligence has arrived.

A four-legged robotic dog can navigate its own path in an open urban environment without pre-set routes or human control. This is the reality demonstrated by Gaode's "Tutu." The key to this advancement is an embodied intelligence system called ABot-Claw, which seeks to resolve the "one robot, one map" dilemma.

Gaode, a subsidiary of Alibaba, officially unveiled the world’s first fully autonomous embodied robot, “Gaode Tutu,” at the 2026 Beijing Yizhuang Robot Half Marathon. This four-legged robot successfully assisted visually impaired individuals in navigating complex obstacles and crowds, bridging the technological gap between laboratory settings and real-world environments.

We are accustomed to the limitless capabilities of large language models (LLMs) in the digital realm. However, when it comes to physical entities, traditional embodied intelligence often struggles. For instance, if you were to ask a robotic dog to take you to the nearest park to relax, a conventional robot might be confused—it wouldn’t know where the park is, how to get there, or what “relax” means. Even if you provided a route, it would merely follow the instructions, faltering when faced with obstacles like construction.

This “knowledge island” engineering bottleneck is the biggest hurdle for embodied intelligence on the path to achieving general artificial intelligence (AGI). Now, Gaode has officially launched the ABot full-stack embodied technology system, aimed at AGI, along with the intelligent guide dog, Gaode Tutu. This is not just a robot that can navigate and think; it represents a fundamental reconstruction of the underlying logic of embodied intelligence.

Unlike traditional robotic dogs, Gaode’s Tutu first understands your intention—whether you’re tired or want fresh air. It then checks its “memory bank” for the nearest park’s coordinates, breaks the task down into a series of sub-goals, and finally navigates to the destination. If it encounters an obstacle along the way, it adjusts its route in real time, even avoiding groups of people without you noticing. For example, it could even go out to a café and bring back a cup of coffee for you.

The introduction of ABot-Claw, the system’s core, signifies that embodied intelligence has finally obtained its own “Harness” (intelligent control hub). Gaode is leveraging its spatial intelligence foundation to open the doors to AGI in embodied intelligence.

For a long time, the embodied intelligence field has faced an engineering dilemma known as “one robot, one map.” What does this mean? Each robot deployed in a new environment must start from scratch: independently mapping, cold-starting, and training. If a robot learns how to find the meeting room in Office A today, a new robot introduced to Office B tomorrow must begin anew, unable to share experiences or reuse knowledge. This is akin to having to teach every new employee in a company from kindergarten—learning to read, performing basic arithmetic, and understanding the business from scratch, only to repeat the process when someone leaves.

Moreover, individual robots are extremely fragile. They can only model the world immediately around them; any unseen areas remain unknown voids. If an elevator door closes, they have no idea what’s on the upper floor; if someone approaches around a corner, they cannot predict it. Lacking memory, collaboration, and fault tolerance is why most robots can perform basic tasks in closed environments but become ineffective “blind machines” once they leave the lab.

The solution has arrived: transitioning from “individual capability showcase” to “system-level solutions.” Gaode Tutu shatters this deadlock! As a four-legged robot with a “heavenly eye,” it can navigate open environments without pre-set routes or human control. Even when encountering unforeseen changes beyond its line of sight, it can predict road conditions in advance. The secret behind this universal capability lies in its new underlying operating system—ABot-Claw.

If robots are likened to horses, the past approach involved constantly feeding them better feed and providing more training to make them faster and more efficient. The Harness strategy is akin to equipping the horse with reins and a saddle to allow the rider to truly control it. ABot-Claw functions at the Agent level of the ABot technology system, bridging capabilities from the ABot-M0 (operation model) and ABot-N0 (navigation model) to coordinate various forms of robotic bodies, including quadrupeds, wheeled, and humanoid robots.

It is not just another foundational model; it is the “central nervous system” that enables foundational model capabilities to be effectively implemented. Consequently, robots have evolved from being “passive executors” to “active orchestrators.” They are no longer mere tools that follow commands but intelligent agents capable of understanding intentions, planning routes, executing tasks, and self-correcting.

This transition marks the official shift of embodied intelligence from the "individual trial-and-error" era to the "system-level intelligence" stage. It is worth noting that the ABot system's Model layer has also achieved impressive results. The ABot-M0 operation model has set world records in four major authoritative benchmark tests, with a task success rate of 80.5% in the Libero-Plus benchmark, nearly a 30% improvement over the industry baseline model Pi0. Meanwhile, the ABot-N0 navigation model has unified five core navigation tasks in a single model for the first time globally, achieving state-of-the-art results in seven international evaluations and significantly leading the industry.

All of these capabilities require the central “nervous system” Claw to manage them effectively. How does Claw reconstruct the embodied foundation? If models are the brain, then ABot-Claw is the central nervous system connecting the brain to the limbs, providing long-term memory. It employs three core technological pillars to eliminate the robot’s “amnesia.” Through Map as Memory, centralized dynamic scheduling, and hierarchical fault tolerance mechanisms, it fundamentally ends the “one robot, one map” era.

Map as Memory transforms maps into a persistent memory carrier for the embodied intelligence system. It constructs a cross-embodiment spatial memory system based on a unified global spatial coordinate system, supporting multimodal perception data (visual keyframes, 6D poses of objects, semantic coordinates, behavioral trajectories) for persistent storage in a joint-index distributed memory architecture.

This system supports multiple granularity spatial semantic modeling, hybrid retrieval mechanisms, and context zero-transfer inheritance, enabling new robots to subscribe to the global spatial semantic map, reusing historical observations and task contexts. This is Gaode’s most ambitious innovation and the soul of the entire Claw system.

Traditional robots’ “memories” often consist of fragmented data collected from sensors and task logs. This information is highly disjointed, unable to be shared across devices, and fails to form a persistent cognitive foundation. Gaode’s approach is to elevate the map into an intelligent agent’s persistent memory carrier. Consider how humans “remember” the world: we have spatial memories of places we’ve lived long-term, like knowing where the kitchen is at home or how to navigate to the office elevator, and we have relational memories of the people we interact with, understanding colleagues’ habits and preferences. These memories help us navigate complex environments smoothly, rather than feeling like we’re experiencing everything for the first time each time we step out.

The Map as Memory functionality of ABot-Claw achieves this by establishing a persistent spatial semantic map anchored by a global coordinate system, allowing robots to possess genuine “world memories.” How is this implemented? Claw builds a four-layer visual spatial memory architecture:

  • Block Layer: Defines indoor rooms and outdoor blocks, supporting coarse-grained positioning across areas and long-range task decomposition.
  • Road Layer: Illustrates physical connectivity—intersections, doorways, passages—providing hard constraints for path planning.
  • Function Layer: Labels key semantic nodes—rest areas, kitchens, elevator halls—translating abstract language intentions into navigable functional goals.
  • Object/POI Layer: Locates specific entities—particular shops, specific item locations—serving as precise visual-semantic anchors for “last-mile” navigation.
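
The four-layer hierarchy above can be sketched as a small data structure. This is an illustrative Python sketch, not Gaode's actual implementation; all class, field, and place names (`Node`, `SpatialMemory`, `cafe_table_3`, and so on) are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    name: str
    layer: str                    # "block" | "road" | "function" | "object"
    position: tuple               # coordinates in the shared global frame
    parent: Optional[str] = None  # link to the enclosing coarser-grained node

@dataclass
class SpatialMemory:
    nodes: dict = field(default_factory=dict)

    def add(self, node: Node) -> None:
        self.nodes[node.name] = node

    def resolve(self, goal: str) -> list:
        """Walk from a fine-grained goal up to its block, yielding the
        coarse-to-fine chain a planner could decompose against."""
        chain, cur = [], self.nodes.get(goal)
        while cur is not None:
            chain.append(cur.name)
            cur = self.nodes.get(cur.parent) if cur.parent else None
        return list(reversed(chain))

mem = SpatialMemory()
mem.add(Node("mall_block", "block", (0, 0)))
mem.add(Node("main_corridor", "road", (5, 0), parent="mall_block"))
mem.add(Node("food_court", "function", (9, 2), parent="main_corridor"))
mem.add(Node("cafe_table_3", "object", (9.4, 2.1), parent="food_court"))

# Coarse-to-fine chain: block -> road -> function -> object
print(mem.resolve("cafe_table_3"))
# → ['mall_block', 'main_corridor', 'food_court', 'cafe_table_3']
```

The parent links are what let one memory framework span "leaving home" through "finding a specific table" without switching systems.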

With this hierarchical topological memory framework, robots can travel from “leaving home” to “navigating through city blocks” to “entering shopping centers” and “finding a specific restaurant’s vacant table,” all supported by a unified spatial memory framework without needing to switch between different systems.

Most importantly, this memory is shareable. Through a globally anchored shared spatial semantic map, new devices can connect to the network and access the global context to achieve “zero-cost inheritance” of knowledge. If one robot discovers that “there’s a water cooler near the third-floor conference room,” a second robot assigned to the same building the next day will instantly know where to go for a drink. This is the core of ending the “one robot, one map” issue—knowledge is no longer confined to a single device but is embedded in a shared world memory that can be inherited, accumulated, and evolved.
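
The water-cooler scenario can be made concrete with a minimal sketch of a shared semantic map. This is an assumed structure for illustration only, not Gaode's API; the class and method names are hypothetical.

```python
class SharedSemanticMap:
    """Stands in for the cloud-hosted global spatial memory."""
    def __init__(self):
        self._pois = {}  # label -> (global coordinates, reporting robot)

    def report(self, robot_id: str, label: str, coords: tuple) -> None:
        self._pois[label] = (coords, robot_id)

    def lookup(self, label: str):
        return self._pois.get(label)

world_map = SharedSemanticMap()

# Day 1: robot A discovers a POI and writes it to the shared memory.
world_map.report("robot_A", "water_cooler_3F", (12.4, 7.9))

# Day 2: robot B joins the same building and inherits the knowledge
# without any mapping or exploration of its own ("zero-cost inheritance").
coords, source = world_map.lookup("water_cooler_3F")
print(f"robot_B heads to {coords}, learned from {source}")
```

The point of the sketch is the asymmetry: only robot A paid the exploration cost, yet every later subscriber reads the same entry.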

This memory system also has a clever aspect: it is dynamically maintainable. Every time a task is executed—whether successfully or not—it will be recorded as new observational evidence in the topological map. Temporary road closures, newly opened shops, and adjusted indoor layouts can be dynamically updated through a continuous “maintenance-feedback” mechanism. The vast amount of navigation data processed by Gaode daily—from satellites, street view vehicles, and crowdsourced probes—will also be injected in real time into this memory system. This means robots can remember not just a “static world,” but also perceive a “changing world.”
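
The "maintenance-feedback" idea above can be sketched as map elements that fold every traversal outcome back in as timestamped evidence, so stale facts (a closed passage, a moved shop) get overwritten. The class below is a hypothetical illustration, not part of any real system.

```python
import time

class DynamicEdge:
    """A map edge whose passability is updated by task feedback."""
    def __init__(self, name: str):
        self.name = name
        self.passable = True
        self.last_observed = time.time()

    def feedback(self, passable: bool) -> None:
        """Record the outcome of the latest traversal attempt."""
        self.passable = passable
        self.last_observed = time.time()

corridor = DynamicEdge("east_corridor")
corridor.feedback(False)   # a robot hit a temporary road closure
print(corridor.passable)   # → False

corridor.feedback(True)    # next day: closure cleared, the map self-heals
print(corridor.passable)   # → True
```

In a real deployment the same feedback channel would also carry crowdsourced and street-view observations, which is how the map tracks a "changing world" rather than a static one.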

Ensuring continuity for long-term tasks is not enough on its own: robots executing real tasks encounter all kinds of unexpected situations, such as low battery, sensor failures, and blocked paths. In a traditional single-robot architecture, if one component fails, the entire task collapses. ABot-Claw's solution is centralized scheduling and cloud-edge collaboration. First, Claw introduces the "one runtime, multi-agent symbiosis" paradigm for embodied intelligence. Unified skill abstraction breaks down barriers between heterogeneous robots, allowing robotic arms, humanoids, and quadrupeds to collaborate seamlessly within the same framework. Task contexts transfer without interruption: if one quadruped runs low on power, another can seamlessly take over without needing to re-understand the task or re-plan the path.

For example, two robots can work together to sprinkle pepper on a salad.

Second, Claw adopts a two-tier "cloud brain, edge response" structure. The cloud handles high-level task decomposition and planning (L3/L4 Planning), while the edge ensures local high-frequency real-time control (L1/L2 Control), maintaining physical safety and response speed. This division of labor is akin to how the human brain and spinal reflexes work: you don't engage the cerebral cortex every time you blink; instinctive reactions are managed by the spinal cord. When robots encounter unexpected obstacles, they don't need to wait for a cloud response; the edge side can navigate around them with millisecond-level latency. The advantage of this architecture is that even if the network goes down, the edge can still ensure basic functionality; once the cloud reconnects, it instantly synchronizes state and continues seamlessly.
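
The cloud/edge split can be sketched as two cooperating layers: a slow planner that produces waypoints, and a fast local controller that caches them and handles reflexes on its own. This is an assumed structure for illustration, not the actual ABot-Claw interface; all function and field names are hypothetical.

```python
def cloud_plan(goal: str) -> list:
    """L3/L4: decompose a goal into waypoints (placeholder logic)."""
    return [f"waypoint_{i}_to_{goal}" for i in range(3)]

class EdgeController:
    """L1/L2: millisecond-scale local control, network-independent."""
    def __init__(self):
        self.cached_plan = []

    def sync(self, plan: list) -> None:
        self.cached_plan = plan          # runs only while the cloud is up

    def step(self, obstacle_ahead: bool) -> str:
        if obstacle_ahead:
            return "swerve_locally"      # reflex: no round-trip to the cloud
        if self.cached_plan:
            return self.cached_plan.pop(0)
        return "hold_position"           # safe default if the plan runs dry

edge = EdgeController()
edge.sync(cloud_plan("cafe"))            # cloud reachable: plan cached
print(edge.step(obstacle_ahead=True))    # → swerve_locally (plan untouched)
print(edge.step(obstacle_ahead=False))   # → waypoint_0_to_cafe (works offline)
```

Note that the obstacle reflex never consults the cached plan, mirroring the blink-without-cortex analogy, while the cached waypoints keep the robot productive through a network outage.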

Perhaps the most astonishing feature is Claw’s reflection and error correction. Traditional robots execute tasks linearly: receive instructions → perform actions → report results. If an issue arises during execution, they either freeze or report an error, requiring human intervention. This is akin to an actor who can only recite a script; if the audience reacts unexpectedly, the actor doesn’t know how to proceed. Faced with the uncertainties of the real world, Gaode has pioneered the Closed-loop Reflection & Self-Correction mechanism, which endows the system with human-like capabilities for “attempt-judge-adjust.” After each sub-task is completed, the system’s Self-Reflector module evaluates the results. If successful, it continues to the next step; if not, the reflector generates structured failure diagnostic feedback, prompting the planner to re-evaluate.

For example, if a user says, “I’m thirsty,” the robot understands the intent and plans to find a drink at the nearest snack shelf, but upon arrival, it discovers the shelf is empty. A traditional robot might simply fail the task. However, how would Tutu, supported by Claw, respond? The Self-Reflector identifies “no cola at the snack shelf” and provides feedback: “Target location has no desired object, suggest trying the vending machine.” The planner receives this feedback, re-plans the path, and the robot successfully navigates to the vending machine, ultimately retrieving a bottle of cola and completing the task. This human-like “attempt-judge-adjust” loop is crucial for handling edge cases in the real world and is what makes it “smarter” than traditional robots.
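
The cola scenario maps directly onto a minimal attempt-judge-adjust loop. The sketch below is illustrative only: the reflector logic, the world dictionary, and the location names are all assumptions, not Gaode's Self-Reflector implementation.

```python
def reflect(goal: str, location: str, world: dict):
    """Judge a sub-task outcome; on failure, emit structured diagnostic feedback."""
    if goal in world.get(location, []):
        return True, "ok"
    return False, f"no {goal} at {location}; try an alternative source"

def run_task(goal: str, candidates: list, world: dict) -> str:
    for location in candidates:            # planner proposes, reflector judges
        ok, diagnosis = reflect(goal, location, world)
        if ok:
            return f"retrieved {goal} from {location}"
        print(f"reflector: {diagnosis}")   # feedback drives the re-plan
    return "task failed: all candidates exhausted"

world = {"snack_shelf": [], "vending_machine": ["cola"]}
print(run_task("cola", ["snack_shelf", "vending_machine"], world))
# → retrieved cola from vending_machine
```

The key design point is that failure produces structured feedback rather than an error code, so the planner has something actionable to re-plan against.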

Now, robots can easily handle long-term tasks such as visitor reception. Another critical aspect that is often overlooked is social norms. For robots to operate in human society, they cannot simply focus on completing tasks. In a crowded elevator, they need to know to step aside; on a sidewalk with elderly individuals, they should know to navigate around them; when entering a café, they must avoid startling customers. ABot-Claw incorporates reinforcement learning technologies to enable robots to autonomously learn social norms like yielding in elevators and to pedestrians. Gaode has also released the SocialNav social navigation model, specifically training robots to navigate social environments—this achievement received an almost perfect score of 6/6/5 and was selected for an oral presentation at CVPR 2026, significantly leading the industry.

Robots must not only “be capable” but also “know how to interact socially”—this is an essential step for embodied intelligence to transition from the laboratory to the real world. After discussing technical details, let’s step back and look at the bigger picture. ABot-Claw is not just a system that makes robots more functional; it actually defines the underlying architectural form of embodied intelligence’s pathway toward general artificial intelligence (AGI).

AGI evolution: ending reliance on customized deployment. Traditionally, every type of robotic application required customized deployment. Home service robots used one system, logistics robots another, and urban inspection robots yet another. This high development cost and long iteration cycle meant experiences could not be reused. ABot-Claw changes all of that. Through unified skill abstraction and shared spatial memory, Claw enables the capabilities of ABot-M0 (operation model) and ABot-N0 (navigation model) to be reused across different scenes and forms. The same “brain” can drive quadruped robots for outdoor inspections, robotic arms for sorting in warehouses, or wheeled robots for shopping mall navigation. This means that robotics manufacturers no longer need to develop from scratch for every scenario; they can quickly adapt based on the “plug-and-play” intelligent base provided by Claw. It also means that experiences accumulated in one scenario can be transferred to others, creating a positive feedback loop.
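
The unified skill abstraction described above can be sketched as a single runtime dispatching through one skill interface, regardless of which body implements it. This is a hypothetical illustration; the interface, class names, and skills are assumptions, not the actual Claw abstraction.

```python
from abc import ABC, abstractmethod

class Skill(ABC):
    """Body-agnostic skill interface: the 'brain' only sees this."""
    name: str

    @abstractmethod
    def execute(self, **params) -> str: ...

class QuadrupedGoTo(Skill):
    name = "go_to"
    def execute(self, target: str) -> str:
        return f"quadruped trots to {target}"

class ArmPick(Skill):
    name = "pick"
    def execute(self, item: str) -> str:
        return f"arm grasps {item}"

class Runtime:
    """One runtime, many embodiments: dispatch by skill name only."""
    def __init__(self, skills: list):
        self.skills = {s.name: s for s in skills}

    def do(self, skill: str, **params) -> str:
        return self.skills[skill].execute(**params)

runtime = Runtime([QuadrupedGoTo(), ArmPick()])
print(runtime.do("go_to", target="warehouse_aisle_4"))  # → quadruped trots to warehouse_aisle_4
print(runtime.do("pick", item="parcel"))                # → arm grasps parcel
```

Because the planner addresses skills rather than hardware, swapping a quadruped for a wheeled base means registering a different `Skill` implementation, not rewriting the plan.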

The official term Gaode uses for this is the "universal brain + specialized body" industrial standard. The brain is universal and can be continuously upgraded; the body is specialized to the needs of each scenario. Decoupling the two lets them evolve independently, maximizing efficiency. By moving robots from "individual tools" to "universal agents," the Harness strategy lays out a standard pathway toward AGI.

The strength of the ABot system lies in its closed feedback loop. Its three tiers (the DATA layer with the ABot-World world model, the MODEL layer with the ABot-M0/N0 foundational models, and the AGENT layer with the ABot-Claw intelligent system) form a direct closed loop. ABot-World generates vast amounts of high-quality training data covering diverse real 3D environments, including home settings, commercial spaces, urban streets, and industrial scenarios. ABot-M0/N0 train on this data, developing the powerful operational and navigation capabilities benchmarked above. ABot-Claw translates model capabilities into actual task execution, with Tutu operating continuously in open environments. The data accumulated from real operations (successes, failures, and edge cases) flows back into the world model and memory bank, guiding the next round of data generation and model training. This cycle produces a spiral increase in capability.

This is why Gaode confidently states that Tutu is not just a product launch; it marks the beginning of a system. Today, it can fetch coffee, deliver packages, and guide the visually impaired. Tomorrow, it may tackle more complex tasks—because it learns in the real world every day and grows stronger.

AMAP-AI Inside: An Open Ecosystem Reducing Redundant Development Costs

Moreover, Gaode’s goal is not merely to create a robotic dog named “Tutu,” but to become a provider of the intelligent foundation for the smartification of the physical world. On March 31, 2026, Gaode announced the full open-sourcing of ABot-M0, encompassing three dimensions: data, algorithms, and models. Currently, the largest universal robot dataset, UniACT—which integrates over 6 million real operational trajectories and more than 9,500 hours of training data across over 20 robot forms—is publicly accessible. Core technologies, including Action Manifold Learning (AML) algorithms and dual-stream perception architectures, are also open-sourced. Recently, the Gaode team has opened ABot-PhysWorld as a baseline for the World Arena competition.

Clearly, just as AWS is to cloud computing and Android is to mobile internet, Gaode aims for the ABot system to become the electricity and water of the embodied intelligence era. The benefits of open-sourcing are obvious: reducing redundant development costs across the industry and attracting developers to collaboratively build on the same “Harness language.” As more robots operate within the same system, the shared world memory will become richer, and each new robot will benefit from collective intelligence.

The Paradigm Shift in Embodied Intelligence

Over the past two years, the field has played out a frenzied "model arms race." Yet beneath the excitement lies an awkward truth: models that perform brilliantly in laboratories often falter in real-world environments. Where does the problem lie? It is not that the models are inadequate; rather, a crucial system-architecture layer is missing between "model capability" and "task completion." It is like installing a Ferrari engine in a bicycle frame: no matter how powerful the engine, the vehicle won't move.

The emergence of ABot-Claw signifies a fundamental shift in industry thinking: moving from competing on models to competing on applications. It addresses a more essential question: how can robots truly become "social beings," capable of remembering visited places, understanding vague instructions, self-correcting after failures, collaborating with peers, and adhering to human social norms? These capabilities cannot be achieved merely by piling on parameters.

Gaode occupies a unique position in this paradigm transformation. Its long-term accumulation of spatial intelligence data has given it the richest spatial semantic assets globally. With billions of navigation requests daily, it deeply understands the complexities of “getting from A to B.” While other players are starting from scratch to build spatial understanding capabilities, Gaode has internalized these abilities into the “innate memories” of its robots.

Even more astute is the ABot system's open-source strategy, recently embraced in full. Gaode clearly recognizes that embodied intelligence requires collective effort across the industry, and that its own role is to establish the crucial system framework. As more robots operate within the same system, Gaode's world memory will continue to expand, building ever-higher competitive barriers.

Intelligent companionship for the elderly living alone, guiding partners for visually impaired individuals, reliable last-mile delivery regardless of weather conditions, and secure factory collaboration—these scenarios are no longer just visions on PowerPoint but are becoming concrete realities. When robots cease to be isolated tools and instead become intelligent network nodes that share memories and evolve collaboratively, the future of “please get me a half-sweet latte from next door” is truly within reach. And this step begins with the first step taken by Tutu.

Original article by NenPower, If reposted, please credit the source: https://nenpower.com/blog/gaode-unveils-groundbreaking-autonomous-robot-at-beijing-yizhuang-half-marathon/
