GAPartManip: A Large-scale Part-centric Dataset for Material-Agnostic Articulated Object Manipulation
- URL: http://arxiv.org/abs/2411.18276v1
- Date: Wed, 27 Nov 2024 12:11:23 GMT
- Title: GAPartManip: A Large-scale Part-centric Dataset for Material-Agnostic Articulated Object Manipulation
- Authors: Wenbo Cui, Chengyang Zhao, Songlin Wei, Jiazhao Zhang, Haoran Geng, Yaran Chen, He Wang,
- Abstract summary: We introduce a large-scale part-centric dataset for articulated object manipulation.
We integrate it with several state-of-the-art methods for depth estimation and interaction pose prediction.
Our experiments demonstrate that our dataset significantly improves the performance of depth perception and actionable interaction pose prediction.
- Score: 9.593020996636932
- License:
- Abstract: Effectively manipulating articulated objects in household scenarios is a crucial step toward achieving general embodied artificial intelligence. Mainstream research in 3D vision has primarily focused on manipulation through depth perception and pose detection. However, in real-world environments, these methods often face challenges due to imperfect depth perception, such as with transparent lids and reflective handles. Moreover, they generally lack the diversity in part-based interactions required for flexible and adaptable manipulation. To address these challenges, we introduced a large-scale part-centric dataset for articulated object manipulation that features both photo-realistic material randomizations and detailed annotations of part-oriented, scene-level actionable interaction poses. We evaluated the effectiveness of our dataset by integrating it with several state-of-the-art methods for depth estimation and interaction pose prediction. Additionally, we proposed a novel modular framework that delivers superior and robust performance for generalizable articulated object manipulation. Our extensive experiments demonstrate that our dataset significantly improves the performance of depth perception and actionable interaction pose prediction in both simulation and real-world scenarios.
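The abstract only names the stages of the proposed modular framework (depth estimation followed by actionable interaction pose prediction), so the following is a minimal, hypothetical sketch of how such a pipeline could be wired together. All class, function, and parameter names here are assumptions for illustration, not the authors' API, and the learned models are stubbed with trivial placeholders.

```python
# Hypothetical sketch of a modular pipeline in the spirit of the abstract:
# depth restoration -> part-oriented interaction pose prediction -> pick best pose.
from dataclasses import dataclass
import numpy as np

@dataclass
class InteractionPose:
    position: np.ndarray      # 3D contact point in the camera frame
    orientation: np.ndarray   # 3x3 end-effector rotation
    score: float              # predicted actionability score

def restore_depth(rgb: np.ndarray, raw_depth: np.ndarray) -> np.ndarray:
    # Placeholder for a learned depth-estimation model that repairs sensor depth
    # on transparent/reflective parts; here it simply passes the raw depth through.
    return raw_depth

def predict_interaction_poses(rgb: np.ndarray, depth: np.ndarray) -> list[InteractionPose]:
    # Placeholder for a learned, part-oriented pose predictor; here it returns
    # a single dummy pose so the sketch runs end to end.
    return [InteractionPose(position=np.zeros(3), orientation=np.eye(3), score=1.0)]

def select_action(rgb: np.ndarray, raw_depth: np.ndarray) -> InteractionPose:
    # The modular pipeline: restore depth, predict candidate poses, choose the
    # highest-scoring one for execution.
    depth = restore_depth(rgb, raw_depth)
    poses = predict_interaction_poses(rgb, depth)
    return max(poses, key=lambda p: p.score)
```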
Related papers
- EC-Diffuser: Multi-Object Manipulation via Entity-Centric Behavior Generation [25.12999060040265]
Learning to manipulate objects from high-dimensional observations presents significant challenges.
Recent approaches have utilized large-scale offline data to train models from pixel observations.
We propose a novel behavioral cloning (BC) approach that leverages object-centric representations and an entity-centric Transformer.
arXiv Detail & Related papers (2024-12-25T13:50:15Z)
- Efficient Object-centric Representation Learning with Pre-trained Geometric Prior [1.9685736810241874]
We propose a weakly-supervised framework that emphasises geometric understanding and leverages pre-trained vision models to enhance object discovery.
Our method introduces an efficient slot decoder specifically designed for object-centric learning, enabling effective representation of multi-object scenes without requiring explicit depth information.
arXiv Detail & Related papers (2024-12-16T20:01:35Z)
- ManipGPT: Is Affordance Segmentation by Large Vision Models Enough for Articulated Object Manipulation? [17.356760351203715]
This paper introduces ManipGPT, a framework designed to predict optimal interaction areas for articulated objects.
We created a dataset of 9.9k simulated and real images to bridge the sim-to-real gap.
We significantly improved part-level affordance segmentation, adapting the model's in-context segmentation capabilities to robot manipulation scenarios.
arXiv Detail & Related papers (2024-12-13T11:22:01Z)
- AugInsert: Learning Robust Visual-Force Policies via Data Augmentation for Object Assembly Tasks [7.631503105866245]
This paper primarily focuses on learning robust visual-force policies in the context of high-precision object assembly tasks.
We aim to learn contact-rich manipulation policies with multisensory inputs on limited expert data by expanding human demonstrations via online data augmentation.
arXiv Detail & Related papers (2024-10-19T04:19:52Z)
- Articulated Object Manipulation using Online Axis Estimation with SAM2-Based Tracking [59.87033229815062]
Articulated object manipulation requires precise object interaction, where the object's axis must be carefully considered.
Previous research employed interactive perception for manipulating articulated objects, but such open-loop approaches often overlook interaction dynamics.
We present a closed-loop pipeline integrating interactive perception with online axis estimation from segmented 3D point clouds.
arXiv Detail & Related papers (2024-09-24T17:59:56Z)
- Zero-Shot Object-Centric Representation Learning [72.43369950684057]
We study current object-centric methods through the lens of zero-shot generalization.
We introduce a benchmark comprising eight different synthetic and real-world datasets.
We find that training on diverse real-world images improves transferability to unseen scenarios.
arXiv Detail & Related papers (2024-08-17T10:37:07Z)
- Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments demonstrates our method's consistent superiority over existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z)
- Learn to Predict How Humans Manipulate Large-sized Objects from Interactive Motions [82.90906153293585]
We propose a graph neural network, HO-GCN, to fuse motion data and dynamic descriptors for the prediction task.
We show that the proposed network, which consumes dynamic descriptors, achieves state-of-the-art prediction results and generalizes better to unseen objects.
arXiv Detail & Related papers (2022-06-25T09:55:39Z)
- Object Manipulation via Visual Target Localization [64.05939029132394]
Training agents to manipulate objects poses many challenges.
We propose an approach that explores the environment in search for target objects, computes their 3D coordinates once they are located, and then continues to estimate their 3D locations even when the objects are not visible.
Our evaluations show a massive 3x improvement in success rate over a model that has access to the same sensory suite.
arXiv Detail & Related papers (2022-03-15T17:59:01Z)
- VAT-Mart: Learning Visual Action Trajectory Proposals for Manipulating 3D ARTiculated Objects [19.296344218177534]
The space of 3D articulated objects is exceptionally rich in their myriad semantic categories, diverse shape geometry, and complicated part functionality.
Previous works mostly abstract kinematic structure with estimated joint parameters and part poses as the visual representations for manipulating 3D articulated objects.
We propose object-centric actionable visual priors as a novel perception-interaction handshaking point, at which the perception system outputs more actionable guidance than kinematic structure estimation.
arXiv Detail & Related papers (2021-06-28T07:47:31Z)
- Measuring Generalisation to Unseen Viewpoints, Articulations, Shapes and Objects for 3D Hand Pose Estimation under Hand-Object Interaction [137.28465645405655]
HANDS'19 is a challenge to evaluate the abilities of current 3D hand pose estimators (HPEs) to interpolate and extrapolate the poses of a training set.
We show that the accuracy of state-of-the-art methods can drop, and that they fail mostly on poses absent from the training set.
arXiv Detail & Related papers (2020-03-30T19:28:13Z)