Domestic Company Dominates Benjie’s Olympics with First Place in Three Critical Tasks of Embodied Intelligence Competition

For a long time, we have been accustomed to the dazzling video demonstrations in the robotics industry, where intelligent agents execute precise and beautiful movements against carefully selected backgrounds. However, the reality can be quite harsh. When robots are placed in real, chaotic, and ever-changing environments, they often quickly reveal their limitations and fail repeatedly.

Tasks such as grasping objects or cooking seem effortless to humans, yet they pose unpredictable challenges for robots. Against this gap between polished demos and clumsy reality, a Chinese company focused on embodied intelligence and general-purpose humanoid robots has delivered a compelling answer on the world’s most demanding real-world competition stage. Using its self-developed VLA embodied intelligence model, Star Dynamics Era recently ranked first worldwide in three tasks at Benjie’s Humanoid Olympic Games, known informally as Benjie’s Olympics.

The latest results from Benjie’s Olympics show that Star Dynamics Era ranked first globally in sock flipping (a silver-level task) and in the two gold-level tasks of lock picking and orange peeling, setting new records in all three.

To appreciate these three first-place finishes, note that Star Dynamics Era did not beat an ordinary competitor. It beat the industry’s acknowledged technology leader, Physical Intelligence (PI), which entered with π*0.6, a closed-source model it had not previously released. Star Dynamics Era, the only Chinese company ever to have competed in this event, swept all three tasks. This matters because most published results for PI’s technology come from third-party groups using its open-source models, which may not reflect its true capabilities. Benjie’s Olympics is the exception: it is the only event in which PI has actively and publicly competed, using its flagship closed-source model. Winning all three tasks on this stage marks a historic achievement for the embodied intelligence sector.

To understand the weight of these victories, one must first understand how uncompromising Benjie’s Olympics is. Anyone familiar with the robotics industry knows an unspoken truth: most demonstrations at launch events are carefully staged showcases rather than true representations of capability. Benjie’s Olympics exists to dismantle this culture of performance. Initiated by former Google robotics expert Benjie Holson, the competition aims to strip away the industry’s veneer and shift evaluation from polished displays to practical capability. It has consequently earned an unofficial nickname in the industry: the stress test for robotics. Here, staged demos have nowhere to hide.

The event’s value shows not only in its format but also in its field, which includes leading embodied intelligence companies such as PI and Sunday Robotics. As noted above, PI competed here with its closed-source flagship model rather than holding anything back, so the results reflect the current state of the art in embodied intelligence worldwide. Winning first place in such an arena carries real weight.

The competition consists of 15 practical challenges divided into gold, silver, and bronze difficulty levels. Lock picking and orange peeling are gold-level tasks; sock flipping is a silver-level task. In these challenges an error of just 1–3 mm can cause failure, which tests the robot’s precision and stability. Gold-level tasks are widely regarded as nearly impossible, while silver-level tasks, though they look mundane, demand considerable skill in manipulating flexible objects. Star Dynamics Era took first place in two gold-level tasks that were widely deemed impossible, as well as in a silver-level task.

The rules are also stricter than those of conventional events, rigorously testing the robots’ autonomy and adaptability. All tasks must be completed autonomously on real hardware, not in simulation. Once a task begins, teleoperation, human intervention, and remote correction are strictly prohibited; the robot must operate entirely on its own in a real environment. Scenes and objects are arranged randomly, with no markers or pre-scanned maps, so every attempt takes place under new and unpredictable conditions. Competitors must therefore rely on genuine capability rather than rehearsed solutions. As the Physical Intelligence team put it, “Every task targets the most challenging unsolved problems in embodied intelligence: flexible objects, high-contact operations, and long-sequence autonomy. No other competition can compare.”

Industry experts agree that this is the only competition that enforces generalization rather than mere replication. Most teams struggle to even secure a bronze medal; achieving gold is nearly miraculous. In light of such rigorous standards, competitors have expressed their frustrations: “We spent six months preparing demos; on Benjie’s tasks, we faced a 90% failure rate in just three days. The real world is merciless.” This commentary evokes a mix of humor and sadness.

This has given rise to a new industry standard: passing the Benjie challenge is the hallmark of industrial-grade full-stack capability, while failing it marks a system as a mere demo. It is against this backdrop that Star Dynamics Era took first place in all three of its tasks: lock picking, orange peeling, and sock flipping. The significance of these results cannot be judged by the logic of an ordinary competition.

Take orange peeling as an example. It requires the robot to combine real-time fusion of 3D vision and tactile feedback; coordinated scheduling of LLM task planning and motion control; physical reasoning, including prediction of gravity, friction, and deformation; and self-correction, such as strategies for recovering after dropping an object. A weakness in any one of these areas can cause total failure. Success is therefore not a lead in a single technology but a comprehensive verification of full-stack capability under extreme pressure. In a competition where a gold-level success is nearly a miracle, Star Dynamics Era’s first-place finishes speak volumes.

In its results announcement, Benjie’s Olympics revealed that Star Dynamics Era surpassed the previous record holder, PI, by a significant margin in the three core tasks of orange peeling, lock picking, and sock flipping. PI had been the first leading team in the event to complete multiple gold-level tasks. Star Dynamics Era’s results, however, show not only substantially shorter execution times but also distinctive advantages in manipulation approach and model generalization.

Orange peeling (gold-level task): first to peel by hand, in over 35% less time than PI. Peeling an orange is easy for a human but a classic high-difficulty manipulation for a robot: one misstep can crush the orange or tear the flesh. The task usually requires two arms working together, with one hand holding and stabilizing the orange while controlling overall force and the other peeling carefully along the edge of the skin. The magnitude, direction, and contact point of the force must be adjusted continuously, and any loss of coordination between the hands can squeeze or tear the fruit. The orange also keeps deforming throughout, so the robot must sense subtle changes between skin and flesh in real time and adapt its strategy accordingly. Orange peeling is therefore a comprehensive test of visual perception, force control, dual-arm coordination, and real-time decision-making. In this highly complex flexible-object task, the previous record holder, PI, relied on an external tool, a peeler, and finished in 2 minutes 46 seconds. Star Dynamics Era dispensed with tools entirely, becoming the first team in the event to peel an orange completely by hand, and finished in 1 minute 47 seconds, over 35% less time than PI.

Lock picking (gold-level task): overcoming heavy visual interference, in over 25% less time than PI. Humans opening a lock rely largely on tactile feedback from their hands, whereas robots depend heavily on vision for high-precision operations. Lock picking is a fine-manipulation task with virtually zero tolerance for error, and glare from the metal surface adds noise to the robot’s visual sensors. The model must not only locate the keyhole amid the reflections but also accurately estimate the key’s 3D orientation. In this needle-threading task, PI took 66 seconds. Star Dynamics Era demonstrated stronger high-contact manipulation, opening the lock in just 49 seconds, over 25% less time than PI.

Sock flipping (silver-level task): about 32% fewer training samples than PI and over 30% less time. In robot control, grasping rigid parts typically relies on a fixed three-dimensional coordinate frame, but that logic breaks down when turning a sock inside out. As a flexible-object task, its core challenge is irregular deformation: the sock’s shape changes in every frame of the flip. The robot must continuously track these deformations, correctly distinguish the sock’s inside from its outside, and locate the opening. In other words, the model must genuinely understand the piece of fabric in front of it and grasp the relevant physics, rather than merely replaying memorized actions. PI used 176 samples to complete this task, taking 1 minute 33 seconds. Star Dynamics Era showed strong few-shot learning, completing the task with only 120 samples (31.8% fewer than PI) in 1 minute 4 seconds, over 30% less time than PI.
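The percentages above follow directly from the reported figures. The short sketch below recomputes them from the times and sample counts quoted in this article (converted to seconds). Note that the article's percentages describe how much less time (or data) was needed, which is not the same as a percentage increase in speed, so the sketch prints both.

```python
def pct_reduction(ours: float, baseline: float) -> float:
    """Percentage by which `ours` is lower than `baseline`."""
    return (baseline - ours) / baseline * 100


# (metric, PI value, Star Dynamics Era value) as reported in the article
results = [
    ("orange peeling time (s)", 2 * 60 + 46, 1 * 60 + 47),
    ("lock picking time (s)", 66, 49),
    ("sock flipping time (s)", 1 * 60 + 33, 1 * 60 + 4),
    ("sock flipping samples", 176, 120),
]

for name, pi_value, era_value in results:
    cut = pct_reduction(era_value, pi_value)
    ratio = pi_value / era_value  # e.g. 1.55 means PI needed 1.55x as much
    print(f"{name}: {pi_value} -> {era_value}, {cut:.1f}% lower, ratio {ratio:.2f}x")
```

For orange peeling, for example, 107 s instead of 166 s is about 35.5% less time, but corresponds to roughly 1.55 times PI's throughput.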

Star Dynamics Era’s breakthroughs are about more than speed; they reveal specific strengths of its model on complex tasks. Tool-free orange peeling shows an understanding of flexible-object deformation; smooth lock picking shows stable coordination between high-precision perception and action; and completing sock flipping with fewer training samples shows more efficient data use and stronger generalization. Taken together, the three results validate the technical superiority of its self-developed embodied brain in real-world scenarios.

What makes Star Dynamics Era’s VLA model succeed in a top-tier embodied intelligence competition? In the evolution of embodied intelligence, the VLA (Vision-Language-Action) model has become the mainstream paradigm. Its core idea is to break down the barriers between visual perception, language understanding, and action control and to fuse their representations deeply. Yet although VLA offers a unified architecture, agents still struggle in practice with fine-grained operations such as sock flipping and lock picking. Such tasks demand three capabilities at once: efficient knowledge transfer, stable perception under changing conditions, and a real-time feedback loop between decision and action.

To meet these challenges, Star Dynamics Era optimized its model’s foundational architecture, achieving significant gains in data utilization, perception accuracy, and control responsiveness. The first gain is exceptionally high sample efficiency. In embodied intelligence research, data is often the scarcest resource. This is especially true for flexible-object manipulation, where collecting and annotating high-quality data is expensive, so a model’s dependence on data scale has long been a bottleneck. In sock flipping, Star Dynamics Era matched or exceeded PI’s results with only 120 training samples, about 32% fewer. It is as if one student needed to memorize 1,000 words to pass a test while another needed only about 700. The key is the foundation model’s ability to transfer knowledge: it extracts general rules from large-scale pre-training data instead of learning each task from scratch. This lets the model adapt quickly to new scenarios with limited samples and gives it a degree of cross-task generalization.

The second gain comes from an adaptive visual attention mechanism. Perception is often underestimated in embodied intelligence tasks, yet it frequently decides success or failure. This is especially true in precise operations such as lock picking, where the keyhole may be only millimeters across and varying lighting, metal reflections, and viewpoint differences all undermine the stability of visual recognition. If perception drifts, the subsequent action is almost certain to fail. Star Dynamics Era’s adaptive visual attention mechanism lets the model focus dynamically on key regions of a cluttered scene and enhances the features of small targets such as the key and the keyhole. Instead of weighting all visual information equally, the model concentrates on the most critical details at decisive moments. This keeps target recognition and alignment stable under heavy interference and gives millimeter-level operations a reliable perceptual foundation, which is key to the robot executing tasks faster and more stably.
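The article does not describe how Star Dynamics Era's attention mechanism is built. As a generic illustration only, the sketch below shows scaled dot-product attention, the standard way a model can weight image regions unequally: a query vector scores each image patch, a softmax turns the scores into weights, and the pooled feature is dominated by the best-matching patch. The patch features, the "keyhole" query, and the `temperature` knob are made-up assumptions, not details from the source.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Sequence[float], patches: Sequence[Sequence[float]],
           temperature: float = 1.0) -> Tuple[List[float], List[float]]:
    """Scaled dot-product attention of one query over a set of patch features.

    Returns (weights, pooled): one weight per patch, plus the weighted average
    of the patch features. A lower temperature makes the focus sharper.
    """
    scale = math.sqrt(len(query)) * temperature
    scores = [sum(q * p for q, p in zip(query, patch)) / scale for patch in patches]
    weights = softmax(scores)
    pooled = [sum(w * patch[i] for w, patch in zip(weights, patches))
              for i in range(len(query))]
    return weights, pooled


# Hypothetical 2-D features for four image patches:
# plain background, metallic glare, keyhole, key shaft.
patches = [[0.1, 0.0], [0.9, 0.9], [0.0, 2.0], [0.2, 0.1]]
query = [0.0, 1.0]  # hypothetical "what a keyhole looks like" direction

for temp in (1.0, 0.5):
    weights, _ = attend(query, patches, temperature=temp)
    print(temp, [round(w, 3) for w in weights])
```

The glare patch still receives some weight, but the keyhole patch dominates, and lowering the temperature sharpens that focus. In a real VLA model the query would be learned and conditioned on the task state; treating "adaptive" as a query that changes from moment to moment is one plausible reading, not a confirmed description.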

The third gain comes from asynchronous high-frequency inference with short-horizon planning. In embodied intelligence tasks, a robot’s reaction speed often determines whether it succeeds. Real-world environments keep changing: objects shift and change shape. If execution falls out of step with the environment, errors accumulate quickly and the task fails. In traditional VLA models, the control policy typically generates a long motion trajectory (often more than 1 second) at a fixed frequency. While that trajectory is being executed, the model cannot react to changes in the environment, so any deviation can only be corrected in the next planning cycle, introducing lag. To address this, Star Dynamics Era uses asynchronous inference with short-horizon planning: while the current trajectory segment is still executing, the next segment is predicted in parallel, and the system switches to it as soon as it is ready. This substantially raises the model’s decision frequency, so the robot corrects its movements more often and responds more promptly to unexpected disturbances (such as changes in the sock’s shape), reducing error accumulation and markedly improving success rates and overall stability.
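The effect of switching from long trajectories to asynchronous short-horizon re-planning can be seen in a toy, tick-based model. This is not Star Dynamics Era's implementation, which the article does not describe; it is a sketch under simple assumptions: each inference call takes a fixed `latency` in control ticks, each predicted chunk holds `horizon` actions, and the synchronous baseline only re-plans (and idles while waiting) after its chunk has finished. The metric of interest is how stale the observation behind each executed action is.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    plans: int         # number of inference calls started
    executed: int      # ticks on which an action was actually sent to the robot
    max_obs_age: int   # worst staleness, in ticks, of the observation behind an action


def simulate(total_ticks: int, horizon: int, latency: int, asynchronous: bool) -> Stats:
    """Toy tick-level model of executing predicted action chunks.

    Synchronous mode plans again only after the running chunk is used up;
    asynchronous mode starts the next plan as soon as the previous one returns
    and switches to each new chunk the moment it is ready.
    """
    chunk_obs = chunk_start = None   # observation tick / start tick of running chunk
    pending_obs = ready_at = None    # the in-flight inference call, if any
    plans = executed = max_age = 0

    for t in range(total_ticks):
        # 1) Switch to a freshly predicted chunk immediately.
        if pending_obs is not None and t >= ready_at:
            chunk_obs, chunk_start = pending_obs, t
            pending_obs = None
        # 2) Launch the next inference call on the current observation if needed.
        chunk_over = chunk_start is None or t - chunk_start >= horizon
        if pending_obs is None and (asynchronous or chunk_over):
            pending_obs, ready_at = t, t + latency
            plans += 1
        # 3) Execute the next action of the running chunk, if any remain.
        if chunk_start is not None and t - chunk_start < horizon:
            executed += 1
            max_age = max(max_age, t - chunk_obs)

    return Stats(plans, executed, max_age)


print(simulate(100, horizon=25, latency=3, asynchronous=False))
# Stats(plans=4, executed=88, max_obs_age=27)
print(simulate(100, horizon=8, latency=3, asynchronous=True))
# Stats(plans=34, executed=97, max_obs_age=5)
```

With identical inference latency, the asynchronous loop re-plans every 3 ticks instead of every 28, and no executed action relies on an observation older than 5 ticks (versus 27 in the long-trajectory baseline). That is the "higher decision frequency, less error accumulation" effect in miniature. The tick counts here are arbitrary illustration values.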

Backed by these capabilities, Star Dynamics Era’s VLA embodied model improves simultaneously in key competencies such as flexible-object manipulation, dual-arm collaboration, tool use, and long-horizon complex tasks. None of these capabilities is rare on its own; the challenge is running them all together, coherently and stably, within a single model. That is exactly what makes the Benjie’s Olympics tasks so demanding, and why taking first place in all three is significant. Winning at this level shows that Star Dynamics Era has found a more effective approach than its top international competitors to the core challenges of embodied intelligence.

Through technological innovation, Star Dynamics Era has in fact already established a leading position in global embodied intelligence research. In February of this year, the company’s founding team, led by Chen Jianyu, worked with Chelsea Finn’s team at Stanford University (Finn is also a co-founder of the previous record holder, PI) to jointly release Ctrl-World, a controllable generative world model. On the authoritative World Arena leaderboard, the model beat top models from Google and NVIDIA across four core dimensions (subject consistency, trajectory accuracy, depth accuracy, and policy-evaluation consistency), ranking first globally in embodied task capability. Star Dynamics Era has repeatedly set industry benchmarks in embodied model technology. It was the first team in the world to propose a frequency-divided VLA architecture, launching the HiRT fast-slow hierarchical architecture in September 2024, ahead of giants and high-profile companies such as PI, Figure, Google, and NVIDIA. It has also developed the world’s first embodied brain that integrates a world model. In December 2024, Star Dynamics Era released VPP (Video Prediction Policy), an open-source VLA algorithm framework that extends the usable data for embodied intelligence to vast amounts of internet video, enabling robots to think while acting. Its ERA-42 model is one of only four models worldwide to have achieved precise control of full-size humanoid robots with five-finger dexterous hands (the other three being Figure Helix, Tesla Grok, and NVIDIA GR00T).
Star Dynamics Era’s embodied brain, ERA-42, is already deployed in real-world settings across logistics (sorting and scanning), manufacturing (part grasping, high-precision assembly, and quality inspection), and commercial services, with efficiency in some scenarios reaching 70% to 80%.

For a long time, the conversation around embodied intelligence and its most impressive demos has been dominated by Silicon Valley giants. Star Dynamics Era, however, has repeatedly shown through practical results that competition in robotics is not about demos but about whose foundational architecture is more robust and whose models generalize better in real environments. On this battleground that will shape the industry’s future, domestic companies are at the forefront of innovation.

Original article by NenPower. If reposting, please credit the source: https://nenpower.com/blog/domestic-company-dominates-benjies-olympics-with-first-place-in-three-critical-tasks-of-embodied-intelligence-competition/
