Related papers: Geometry-aware RL for Manipulation of Varying Shapes and Deformable Objects

URL: http://arxiv.org/abs/2502.07005v3
Date: Thu, 13 Feb 2025 12:11:58 GMT
Title: Geometry-aware RL for Manipulation of Varying Shapes and Deformable Objects
Authors: Tai Hoang, Huy Le, Philipp Becker, Vien Anh Ngo, Gerhard Neumann
Abstract summary: Manipulating objects with varying geometries and deformable objects is a major challenge in robotics. In this work, we frame this problem through the lens of a heterogeneous graph that comprises smaller sub-graphs. We present a novel and challenging reinforcement learning benchmark, including rigid insertion of diverse objects.
Score: 14.481805160449282
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Manipulating objects with varying geometries and deformable objects is a major challenge in robotics. Tasks such as insertion with different objects or cloth hanging require precise control and effective modelling of complex dynamics. In this work, we frame this problem through the lens of a heterogeneous graph that comprises smaller sub-graphs, such as actuators and objects, accompanied by different edge types describing their interactions. This graph representation serves as a unified structure for both rigid and deformable objects tasks, and can be extended further to tasks comprising multiple actuators. To evaluate this setup, we present a novel and challenging reinforcement learning benchmark, including rigid insertion of diverse objects, as well as rope and cloth manipulation with multiple end-effectors. These tasks present a large search space, as both the initial and target configurations are uniformly sampled in 3D space. To address this issue, we propose a novel graph-based policy model, dubbed Heterogeneous Equivariant Policy (HEPi), utilizing $SE(3)$ equivariant message passing networks as the main backbone to exploit the geometric symmetry. In addition, by modeling explicit heterogeneity, HEPi can outperform Transformer-based and non-heterogeneous equivariant policies in terms of average returns, sample efficiency, and generalization to unseen objects.
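The abstract's central idea, equivariant message passing over a graph with several edge types, can be illustrated with a small sketch. This is not the HEPi architecture: it swaps in a simplified EGNN-style update (which is equivariant to rotations, translations, and also reflections) and uses untrained random weights. All names here (`hetero_equivariant_layer`, `make_params`, `build_example_edges`, the edge-type labels, and the 2-actuator/4-object layout) are hypothetical and chosen only for illustration.

```python
import numpy as np


def random_rotation(rng):
    # QR of a random matrix yields an orthogonal matrix; flip one column
    # if needed so the determinant is +1 (a proper rotation).
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q


def build_example_edges():
    # Hypothetical layout: nodes 0-1 are actuators, nodes 2-5 form a rope.
    chain_src = np.array([2, 3, 4, 3, 4, 5])
    chain_dst = np.array([3, 4, 5, 2, 3, 4])
    return {
        "object-object": (chain_src, chain_dst),
        "actuator-object": (np.array([0, 1]), np.array([2, 5])),
        "object-actuator": (np.array([2, 5]), np.array([0, 1])),
    }


def make_params(rng, edge_types, feat_dim):
    # One independent weight set per edge type (the heterogeneous part).
    in_dim = 2 * feat_dim + 1
    return {
        t: {
            "w_msg": rng.normal(scale=0.1, size=(in_dim, feat_dim)),
            "w_pos": rng.normal(scale=0.1, size=(feat_dim,)),
        }
        for t in edge_types
    }


def hetero_equivariant_layer(pos, feat, edges, params):
    # pos: (N, 3) node positions, feat: (N, F) invariant node features.
    new_pos = pos.copy()
    new_feat = feat.copy()
    for edge_type, (src, dst) in edges.items():
        p = params[edge_type]
        rel = pos[dst] - pos[src]                        # rotates with the input
        dist2 = np.sum(rel**2, axis=1, keepdims=True)    # invariant scalar
        msg_in = np.concatenate([feat[dst], feat[src], dist2], axis=1)
        msg = np.tanh(msg_in @ p["w_msg"])               # invariant message
        scale = (msg @ p["w_pos"])[:, None]              # invariant gate
        np.add.at(new_feat, dst, msg)                    # sum messages per receiver
        np.add.at(new_pos, dst, rel * scale)             # equivariant position update
    return new_pos, new_feat


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pos, feat = rng.normal(size=(6, 3)), rng.normal(size=(6, 4))
    edges = build_example_edges()
    params = make_params(rng, edges.keys(), feat_dim=4)
    rot, shift = random_rotation(rng), rng.normal(size=3)

    out_pos, _ = hetero_equivariant_layer(pos, feat, edges, params)
    moved_pos, _ = hetero_equivariant_layer(pos @ rot.T + shift, feat, edges, params)
    # Transforming the input should transform the output the same way.
    print("max equivariance error:", np.abs(moved_pos - (out_pos @ rot.T + shift)).max())
```

Because positions enter only through relative vectors and squared distances, rotating and translating the scene before the layer gives the same result as doing so afterwards, which is the symmetry the abstract refers to. Keeping a separate weight set per edge type is one simple way to model heterogeneity; the paper's actual design may differ.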

Related papers

IAAO: Interactive Affordance Learning for Articulated Objects in 3D Environments [56.85804719947]
We present IAAO, a framework that builds an explicit 3D model for intelligent agents to gain understanding of articulated objects in their environment through interaction. We first build hierarchical features and label fields for each object state using 3D Gaussian Splatting (3DGS) by distilling mask features and view-consistent labels from multi-view images. We then perform object- and part-level queries on the 3D Gaussian primitives to identify static and articulated elements, estimating global transformations and local articulation parameters along with affordances.
arXiv Detail & Related papers (2025-04-09T12:36:48Z)
SIGHT: Synthesizing Image-Text Conditioned and Geometry-Guided 3D Hand-Object Trajectories [124.24041272390954]
Modeling hand-object interaction priors holds significant potential to advance robotic and embodied AI systems. We introduce SIGHT, a novel task focused on generating realistic and physically plausible 3D hand-object interaction trajectories from a single image. We propose SIGHT-Fusion, a novel diffusion-based image-text conditioned generative model that tackles this task by retrieving the most similar 3D object mesh from a database.
arXiv Detail & Related papers (2025-03-28T20:53:20Z)
ArtFormer: Controllable Generation of Diverse 3D Articulated Objects [5.320860732053524]
This paper presents a novel framework for modeling and conditional generation of 3D articulated objects. We parameterize an articulated object as a tree of tokens and employ a transformer to generate both the object's high-level geometry code and its kinematic relations. Our approach enables the generation of diverse objects with high-quality geometry and varying number of parts.
arXiv Detail & Related papers (2024-12-10T07:00:05Z)
GREAT: Geometry-Intention Collaborative Inference for Open-Vocabulary 3D Object Affordance Grounding [53.42728468191711]
Open-Vocabulary 3D object affordance grounding aims to anticipate "action possibilities" regions on 3D objects with arbitrary instructions. We propose GREAT (GeometRy-intEntion collAboraTive inference) for Open-Vocabulary 3D Object Affordance Grounding.
arXiv Detail & Related papers (2024-11-29T11:23:15Z)
Deep SE(3)-Equivariant Geometric Reasoning for Precise Placement Tasks [24.721590424515043]
We propose a method for precise relative pose prediction which is provably SE(3)-equivariant. We accomplish this by factoring the problem into learning an SE(3) invariant task-specific representation of the scene. We demonstrate that our method can yield substantially more precise placement predictions in simulated placement tasks.
arXiv Detail & Related papers (2024-04-20T22:16:56Z)
GAMMA: Generalizable Articulation Modeling and Manipulation for Articulated Objects [53.965581080954905]
We propose a novel framework of Generalizable Articulation Modeling and Manipulating for Articulated Objects (GAMMA). GAMMA learns both articulation modeling and grasp pose affordance from diverse articulated objects with different categories. Results show that GAMMA significantly outperforms SOTA articulation modeling and manipulation algorithms in unseen and cross-category articulated objects.
arXiv Detail & Related papers (2023-09-28T08:57:14Z)
Explicit3D: Graph Network with Spatial Inference for Single Image 3D Object Detection [35.85544715234846]
We propose a dynamic sparse graph pipeline named Explicit3D based on object geometry and semantics features. Our experimental results on the SUN RGB-D dataset demonstrate that our Explicit3D achieves better performance balance than the state of the art.
arXiv Detail & Related papers (2023-02-13T16:19:54Z)
A Light Touch Approach to Teaching Transformers Multi-view Geometry [80.35521056416242]
We propose a "light touch" approach to guiding visual Transformers to learn multiple-view geometry. We achieve this by using epipolar lines to guide the Transformer's cross-attention maps. Unlike previous methods, our proposal does not require any camera pose information at test-time.
arXiv Detail & Related papers (2022-11-28T07:54:06Z)
Efficient Representations of Object Geometry for Reinforcement Learning of Interactive Grasping Policies [29.998917158604694]
We present a reinforcement learning framework that learns the interactive grasping of various geometrically distinct real-world objects. Videos of learned interactive policies are available at https://maltemosbach.org/io/geometry_aware_grasping_policies.
arXiv Detail & Related papers (2022-11-20T11:47:33Z)
Generative Category-Level Shape and Pose Estimation with Semantic Primitives [27.692997522812615]
We propose a novel framework for category-level object shape and pose estimation from a single RGB-D image. To handle the intra-category variation, we adopt a semantic primitive representation that encodes diverse shapes into a unified latent space. We show that the proposed method achieves SOTA pose estimation performance and better generalization in the real-world dataset.
arXiv Detail & Related papers (2022-10-03T17:51:54Z)
Geometry-Contrastive Transformer for Generalized 3D Pose Transfer [95.56457218144983]
The intuition of this work is to perceive the geometric inconsistency between the given meshes with the powerful self-attention mechanism. We propose a novel geometry-contrastive Transformer that has an efficient 3D structured perceiving ability to the global geometric inconsistencies. We present a latent isometric regularization module together with a novel semi-synthesized dataset for the cross-dataset 3D pose transfer task.
arXiv Detail & Related papers (2021-12-14T13:14:24Z)
Nothing But Geometric Constraints: A Model-Free Method for Articulated Object Pose Estimation [89.82169646672872]
We propose an unsupervised vision-based system to estimate the joint configurations of the robot arm from a sequence of RGB or RGB-D images without knowing the model a priori. We combine a classical geometric formulation with deep learning and extend the use of epipolar multi-rigid-body constraints to solve this task.
arXiv Detail & Related papers (2020-11-30T20:46:48Z)
Continuous Surface Embeddings [76.86259029442624]
We focus on the task of learning and representing dense correspondences in deformable object categories. We propose a new, learnable image-based representation of dense correspondences. We demonstrate that the proposed approach performs on par or better than the state-of-the-art methods for dense pose estimation for humans.
arXiv Detail & Related papers (2020-11-24T22:52:15Z)

This list is automatically generated from the titles and abstracts of the papers on this site.