
New Model Enhances Robots’ 3D Spatial Understanding and Manipulation Abilities
On May 5, 2026, the Zhejiang Humanoid Robot Innovation Center announced a breakthrough in robotic spatial intelligence, achieved in collaboration with teams from The Chinese University of Hong Kong and Zhejiang University. The teams introduced a 3D spatial understanding and manipulation model named RAM, which offers a new technical pathway for making robots more reliable in complex long-horizon tasks. The findings were recently published in the international academic journal Science Robotics.
Artificial intelligence (AI) technologies, particularly large vision-language models, have improved robots’ ability to understand natural language commands and break down complex tasks. However, a critical gap remains between “understanding commands” and “executing actions.” To close it, a robot must understand the 3D properties of a scene, such as the position, orientation, and scale of objects, their operable areas, and the relationships between them, and then convert this information into actionable motion constraints.
To address this challenge, the team proposed the RAM model, which draws on the concept of retrieval-augmented generation and equips large models with a queryable external 3D knowledge base. When performing a task, the model can retrieve information about object categories, geometric properties, functional planes, and grasp points as needed, thereby compensating for the inherent limitations of vision-language models in 3D spatial understanding, according to Xuecheng Xu, Chief Technology Officer of the Zhejiang Humanoid Robot Innovation Center.
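The retrieval idea described above can be illustrated with a minimal Python sketch. This is not RAM's actual implementation, which the article does not describe in detail. Every name here (`ObjectEntry`, `SpatialKnowledgeBase`, `build_prompt`, the field names, and the sample mug values) is a hypothetical chosen for illustration. The sketch only shows the general pattern: look up stored 3D facts for the objects in a scene and attach them to the instruction before it reaches a language model.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple

Point = Tuple[float, float, float]


@dataclass(frozen=True)
class ObjectEntry:
    """Hypothetical record of 3D facts about one object category."""
    category: str
    size_m: Point                      # bounding-box size in meters (assumed unit)
    functional_planes: Tuple[str, ...]
    grasp_points: Tuple[Point, ...]    # points in the object's own frame


class SpatialKnowledgeBase:
    """Toy external knowledge base keyed by object category."""

    def __init__(self) -> None:
        self._entries: Dict[str, ObjectEntry] = {}

    def add(self, entry: ObjectEntry) -> None:
        self._entries[entry.category] = entry

    def retrieve(self, category: str, fields: Iterable[str]) -> Optional[Dict[str, object]]:
        """Return only the requested fields, or None if the category is unknown."""
        entry = self._entries.get(category)
        if entry is None:
            return None
        return {name: getattr(entry, name) for name in fields}


def build_prompt(instruction: str, kb: SpatialKnowledgeBase,
                 detected: Iterable[str], fields: Iterable[str]) -> str:
    """Augment an instruction with retrieved 3D facts for each detected object."""
    wanted = list(fields)
    lines = [f"Instruction: {instruction}", "Retrieved 3D facts:"]
    for category in detected:
        facts = kb.retrieve(category, wanted)
        if facts is None:
            lines.append(f"- {category}: no entry in knowledge base")
        else:
            described = "; ".join(f"{key}={value}" for key, value in facts.items())
            lines.append(f"- {category}: {described}")
    return "\n".join(lines)


# Illustrative data only; these values are made up for the example.
kb = SpatialKnowledgeBase()
kb.add(ObjectEntry("mug", (0.08, 0.08, 0.10), ("rim", "base"), ((0.05, 0.0, 0.05),)))
prompt = build_prompt("Place the mug on the shelf", kb,
                      ["mug", "shelf"], ["size_m", "grasp_points"])
print(prompt)
```

In this sketch the lookup is an exact match on category name. A real system would presumably need a perception step to identify objects and a more flexible query mechanism, neither of which is modeled here.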
The research team also developed a spatial understanding question-and-answer evaluation set tailored to robotic manipulation scenarios. Results indicated that the RAM model outperformed a range of representative large vision-language models across the spatial reasoning tasks covered by the evaluation set. Beyond its core tasks, RAM also demonstrated a degree of versatility and scalability.
Original article by NenPower. If reposted, please credit the source: https://nenpower.com/blog/new-model-enhances-robots-3d-spatial-understanding-and-manipulation-skills/
