Swiss Team Develops Diffused Orientation Fields to Enhance Robot Manipulation of Curved Objects

Peeling, slicing, and surface detection are tasks that have long challenged robots, particularly on curved objects such as bananas and irregularly shaped pears. The difficulty stems from the absence of a unified reference coordinate system for curved objects: a robot can traverse a simple flat surface with basic directional commands, but on a curved surface the local orientation varies at every point. A team from the École Polytechnique Fédérale de Lausanne (EPFL) and the Idiap Research Institute recently published a paper in Science Robotics proposing a solution based on Diffused Orientation Fields. Essentially, they introduce a smoothly varying local coordinate system across the robot’s entire workspace, so that at any position the robot knows “what is along the surface” and “what is near the object.”

1. Point Clouds and PDEs: Validating Shape Adaptability with Deformed Pears

Traditional methods either rely on clean 3D mesh models or require extensive training data. This team opted for a different approach: starting from point cloud data captured by a depth camera and computing the orientation field with the diffusion (heat) equation, a classic partial differential equation (PDE). They mark a few key points on the object’s surface, such as the two ends of a banana, and solve the diffusion equation so that the directional information at these key points “diffuses” across the surface into a smooth orientation field. The process does not require a complete mesh; a point cloud is sufficient. Moreover, they extend the surface orientation field into the full three-dimensional workspace using a Monte Carlo method known as Walk on Spheres, which answers queries at arbitrary positions without discretizing space into a grid. This keeps computation efficient enough for real-time updates.
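
To make the idea concrete, here is a minimal sketch of the diffusion step, assuming a k-nearest-neighbor graph Laplacian as a stand-in for the point-cloud Laplacian used in the paper and a single implicit heat step; it is an illustration of the idea, not the authors’ implementation, and all names are placeholders.

```python
import numpy as np
from scipy.sparse import csr_matrix, identity
from scipy.sparse.linalg import spsolve
from scipy.spatial import cKDTree

def knn_graph_laplacian(points, k=8):
    """Unnormalized graph Laplacian L = D - W with Gaussian edge weights."""
    n = len(points)
    dists, idx = cKDTree(points).query(points, k=k + 1)  # column 0 is the point itself
    sigma = dists[:, 1:].mean()                          # bandwidth from neighbor spacing
    rows = np.repeat(np.arange(n), k)
    W = csr_matrix((np.exp(-(dists[:, 1:].ravel() / sigma) ** 2),
                    (rows, idx[:, 1:].ravel())), shape=(n, n))
    W = 0.5 * (W + W.T)                                  # symmetrize
    D = csr_matrix((np.asarray(W.sum(axis=1)).ravel(),
                    (np.arange(n), np.arange(n))), shape=(n, n))
    return D - W

def diffuse_orientations(points, keypoint_ids, keypoint_dirs, tau=0.1, k=8):
    """Spread unit direction vectors from key points over the cloud via one
    implicit heat step (I + tau * L) U = U0, then renormalize row-wise."""
    n = len(points)
    L = knn_graph_laplacian(points, k)
    U0 = np.zeros((n, 3))
    U0[keypoint_ids] = keypoint_dirs                     # sources at the key points
    A = (identity(n, format="csr") + tau * L).tocsc()
    U = np.column_stack([spsolve(A, U0[:, d]) for d in range(3)])
    return U / (np.linalg.norm(U, axis=1, keepdims=True) + 1e-12)
```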

A rigorous comparative experiment demonstrated the approach’s effectiveness. Using pear models from the YCB dataset, they randomly generated 50 deformed versions: some elongated, some flattened, and some distorted. The robot was tasked with peeling under several coordinate representations, including a single object coordinate system, cylindrical coordinates, spherical coordinates, and multiple local coordinate systems. Their method yielded the smallest standard deviation in motion trajectories, maintaining a periodic peeling pattern in all directions. Interestingly, as the number of local coordinate systems increased, the variance of the baseline methods converged toward that of the continuous orientation field, confirming that the method essentially acts as a continuous version of multiple local coordinate systems.

2. Universal Representation Layer: Integrating Teleoperation, Trajectory Planning, and Reinforcement Learning

The orientation field is not designed for a specific controller but serves as a general intermediate representation layer. The paper outlines three methods of integration. In a teleoperation scenario, when controlling a robotic arm with a 3DConnexion Space Mouse, the input axes directly map to the local coordinate system. When the operator moves along the x-axis, the robot glides along the object’s surface; moving along the z-axis brings it closer to or further away from the surface. The tool’s pose automatically aligns, facilitating intuitive operations. In trajectory optimization, they defined a cost function using the orientation field, allowing the optimizer to plan trajectories that maintain surface distance while avoiding obstacles. Notably, the orientation field enables a “warm-start,” initializing the trajectory along the local coordinate system’s x-axis, resulting in convergence after just one iteration. Without this feature, it typically requires five to six iterations.
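
As an illustration of the teleoperation mapping, the sketch below rotates the raw space-mouse axes through the local frame returned by the orientation field; `query_local_frame` is a hypothetical accessor standing in for the field lookup, and the gain is a placeholder.

```python
import numpy as np

def teleop_velocity(ee_position, mouse_axes, query_local_frame, gain=0.05):
    """Map raw space-mouse axes (3-vector in [-1, 1]) to a world-frame
    end-effector velocity through the local frame at the current position."""
    R = query_local_frame(ee_position)   # 3x3; columns: surface-x, surface-y, normal-z
    # Pushing the mouse along x now slides along the surface;
    # pushing along z approaches or retreats from it.
    return gain * R @ np.asarray(mouse_axes, dtype=float)
```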

The reinforcement learning experiments were particularly intriguing. The team trained a policy on a 2D circle to reach a target while maintaining its distance to the surface, then transferred it zero-shot to a 2D rectangle and to 3D point clouds. Policies trained in the global coordinate system failed to transfer, while those trained in local coordinates transferred seamlessly. This indicates that the geometric scaffolding provided by the orientation field genuinely reduces learning complexity.
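
One plausible way to read this result: the policy only ever observes quantities expressed in the local frame, so the surrounding geometry is abstracted away. The sketch below illustrates such an observation under that assumption; the function names are hypothetical, not from the paper.

```python
import numpy as np

def local_observation(position, goal, query_local_frame, query_distance):
    """Observation for the policy, expressed entirely in the local frame:
    goal offset rotated into local coordinates plus distance to the surface."""
    R = query_local_frame(position)          # 3x3 local frame at the agent
    goal_local = R.T @ (np.asarray(goal) - np.asarray(position))
    return np.concatenate([goal_local, [query_distance(position)]])
```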

3. Diffusion Time Parameter τ: A Knob for Smoothness and Noise Robustness

In real-world scenarios, point cloud data inevitably contains noise, and key point extraction cannot be perfect. The researchers conducted three sets of controlled experiments: topological noise (removing half the point cloud and randomly creating ten 5 mm holes), geometric noise (adding Gaussian noise with a standard deviation of 3 mm to point cloud coordinates), and key point noise (adding 20 mm standard deviation noise to key point locations). Each experiment was repeated 50 times, measuring the root mean square error (RMSE) of generated trajectories against noise-free reference trajectories. The results were as expected: the smoothing properties of the diffusion equation naturally suppress high-frequency noise. A larger diffusion time parameter τ yields a smoother orientation field, enhancing robustness against noise. Short-term diffusion approximates the gradient of geodesic distances, retaining more local geometric details, while long-term diffusion extracts global symmetries of the object, such as the longitudinal symmetry axis of a pear. This parameter can be flexibly adjusted based on task requirements. However, the paper acknowledges a limitation: if the depth camera returns poor data for transparent, semi-transparent, or highly reflective objects, smoothing alone cannot rectify the situation. In such cases, it is necessary to combine other sensors or point cloud completion methods, which is a hardware constraint rather than an algorithmic issue.
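
For illustration, the snippet below reuses the `diffuse_orientations` sketch from Section 1 to contrast a small and a large τ on a toy point cloud, together with the RMSE metric used in the robustness tests. The τ values and toy geometry are placeholders; only the 3 mm noise scale comes from the paper’s setup.

```python
import numpy as np

def trajectory_rmse(reference, noisy):
    """RMSE between two (T, 3) trajectories sampled at matching times,
    as used to compare noisy runs against the noise-free reference."""
    return np.sqrt(np.mean(np.sum((reference - noisy) ** 2, axis=1)))

# Toy cloud: samples on a unit sphere (placeholder geometry).
rng = np.random.default_rng(0)
points = rng.normal(size=(500, 3))
points /= np.linalg.norm(points, axis=1, keepdims=True)
kp_ids = np.array([0, 1])
kp_dirs = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])

# Small tau ~ geodesic-gradient behavior, keeping local detail;
# large tau ~ global structure, averaging out high-frequency noise.
U_detail = diffuse_orientations(points, kp_ids, kp_dirs, tau=0.01)
U_smooth = diffuse_orientations(points, kp_ids, kp_dirs, tau=10.0)

# Geometric-noise condition from the paper: 3 mm Gaussian noise on coordinates.
noisy_points = points + rng.normal(scale=0.003, size=points.shape)
```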

4. Handling Cluttered Scenes: Direct Encoding of Task Constraints with Geometric Primitives

The orientation field is not limited to single objects. The paper illustrates a cluttered scene in which other objects surround a banana, together with a bounding sphere and a plane representing a wall. The field can simultaneously process point clouds, meshes, and geometric primitives (such as spheres, planes, and capsules). Notably, geometric primitives can directly encode task constraints: in a scooping task, a plane constrains the tool to remain horizontal (to prevent spilling), while a line defines the lifting direction. These constraints require no extra parameter tuning; they enter the orientation field’s computation directly, so the resulting motion satisfies them naturally. The paper demonstrates a long-horizon “scoop-lift-carry-pour” task using two bowls from the YCB dataset, with lines and planes imposing the task constraints. An unexpected finding was that clutter can actually speed up computation: in the Walk on Spheres method, random walks inside the enclosed regions formed by multiple objects reach a boundary sooner than in open space. The paper reports that adding a bounding sphere cut computational cost by roughly a factor of 1.5.
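
The grid-free extension rests on the classic Walk on Spheres algorithm: repeatedly jump to a random point on the largest empty sphere around the current position until a boundary is reached, then average the boundary values over many walks. Below is a minimal scalar-valued sketch (a vector field would be handled component-wise); `distance_to_boundary` and `boundary_value` are hypothetical stand-ins for the scene’s distance and surface-field lookups. The clutter speedup falls out naturally, since walks in enclosed regions hit the boundary shell sooner.

```python
import numpy as np

def walk_on_spheres(x, distance_to_boundary, boundary_value,
                    eps=1e-3, n_walks=64, max_steps=200, rng=None):
    """Monte Carlo estimate of a harmonic extension at query point x:
    average the boundary values reached by sphere-jumping random walks."""
    if rng is None:
        rng = np.random.default_rng()
    total = 0.0
    for _ in range(n_walks):
        p = np.array(x, dtype=float)
        for _ in range(max_steps):
            r = distance_to_boundary(p)        # radius of the largest empty sphere
            if r < eps:                        # within the boundary shell: stop
                break
            v = rng.normal(size=3)
            p = p + r * v / np.linalg.norm(v)  # jump uniformly onto that sphere
        total += boundary_value(p)             # read the field at (near) the boundary
    return total / n_walks
```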

The hardware setup comprises a six-degree-of-freedom uFactory Lite 6 robotic arm, an Intel RealSense D415 depth camera, and force/torque sensors from Bota Systems, along with 3D-printed tool, peeler, and probe fixtures. They tested three tasks: slicing, peeling, and tactile coverage. Each task is defined as a “local action primitive,” a short sequence of simple motions expressed in the local coordinate system. Peeling, for instance, is described as a cycle of “slide along the surface, press down, lift up,” a description that applies to any object. Transferring to a new object only requires recomputing the orientation field from the live point cloud, after which a compliant controller tracks the local actions. The paper shows successful transfers across six different objects, including bananas, cucumbers, pears, and cups.
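
A local action primitive of this kind might look like the following sketch: the peeling cycle is written once as offsets in the local frame and unrolled into world-frame waypoints by querying the orientation field at each step. The waypoint magnitudes and `query_local_frame` are illustrative assumptions, not the paper’s values.

```python
import numpy as np

PEEL_CYCLE = [                       # offsets in the local frame (x, y, z)
    np.array([0.00, 0.0, -0.01]),    # press 1 cm toward the surface
    np.array([0.08, 0.0,  0.00]),    # slide 8 cm along the surface
    np.array([0.00, 0.0,  0.02]),    # lift 2 cm away from the surface
]

def peel_waypoints(start, n_cycles, query_local_frame):
    """Unroll the local peeling cycle into world-frame waypoints."""
    p, waypoints = np.array(start, dtype=float), []
    for _ in range(n_cycles):
        for step in PEEL_CYCLE:
            p = p + query_local_frame(p) @ step   # local offset -> world motion
            waypoints.append(p.copy())
    return waypoints
```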

5. Geometry-Driven vs. Data-Driven: Simplifying Task Transfer to Key Point Migration

The Neural Descriptor Field (NDF) method has gained popularity in recent years for learning local object descriptors with neural networks. The philosophies diverge, however: NDF is data-driven, with strong expressive power but a need for training data, while the orientation field is geometry-driven, encoding task inductive biases in key points and propagating them through space via diffusion. The paper argues that since key points can be extracted with straightforward perception pipelines (such as boundary detection), basic model transfer, or manual annotation (there are only a few points), a direct geometric method is a natural choice. It reduces task transfer across objects to the migration of a few key points.

In computer graphics, a technique called Functional Maps transfers functions between approximately isometric surfaces, and it has been applied to grasp transfer. Functional maps have two limitations, though: they handle only open-loop positional trajectories, and the trajectories must remain on the surface. The orientation field, by contrast, can manage continuous interactions involving contact and separation without being confined to the surface; many tasks (like peeling or scooping) begin in the air before making contact.

Performance data appears in the supplementary materials. The most time-consuming step is constructing the Laplacian operator from the point cloud, but this is preprocessing; at runtime, solving the linear systems and running Walk on Spheres sampling are both fast. The paper’s code and data are available on Zenodo, with a link to the GitHub repository included in the records. The authors transparently disclose that they used ChatGPT for language refinement and Claude for organizing code documentation.

The experimental results indicate that the approach already works reliably in the real world. Cucumber peeling, banana slicing, and surface detection on cups all succeed in a single attempt, without repeated adjustment. This shows that the orientation field captures the essence of surface operations: not by memorizing each object’s specific shape, but by understanding the common geometric relationships of “along the surface” and “near the object” across different items.

6. Conclusion and Future Directions

The significance of this work extends beyond merely equipping robots with additional skills. It presents a novel approach: addressing generalization challenges using geometric structures rather than extensive data. In scenarios involving household service robots, agricultural harvesting, and medical assistance, the variety of objects makes it difficult to gather training data for each item. If task transfers can be accomplished with just a few key points, deployment costs will be significantly reduced. Of course, automatic extraction of key points requires further development, but the path forward is already clear. Funding for this research came from the Swiss National Science Foundation’s HORACE project, alongside the European Union’s Horizon Europe projects IntelliMan and SestoSenso. The project names reflect Europe’s substantial investment in robotic manipulation and an increasing emphasis on the role of geometric and physical constraints in learning. This paper serves as a milestone in this direction—demonstrating that purely geometric methods can effectively operate in the real world without relying on end-to-end learning.

Original article by NenPower. If reposted, please credit the source: https://nenpower.com/blog/swiss-team-develops-diffused-orientation-fields-to-enhance-robot-manipulation-of-curved-objects/
