GAPartManip: A Large-scale Part-centric Dataset for Material-Agnostic Articulated Object Manipulation
- URL: http://arxiv.org/abs/2411.18276v2
- Date: Fri, 21 Mar 2025 07:52:16 GMT
- Title: GAPartManip: A Large-scale Part-centric Dataset for Material-Agnostic Articulated Object Manipulation
- Authors: Wenbo Cui, Chengyang Zhao, Songlin Wei, Jiazhao Zhang, Haoran Geng, Yaran Chen, Haoran Li, He Wang
- Abstract summary: This paper introduces a large-scale part-centric dataset for articulated object manipulation. It features photo-realistic material randomization and detailed annotations of part-oriented, scene-level actionable interaction poses. We propose a novel modular framework that delivers superior and robust performance for generalizable articulated object manipulation.
- Score: 11.880519765681408
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Effectively manipulating articulated objects in household scenarios is a crucial step toward achieving general embodied artificial intelligence. Mainstream research in 3D vision has primarily focused on manipulation through depth perception and pose detection. However, in real-world environments, these methods often face challenges due to imperfect depth perception, such as with transparent lids and reflective handles. Moreover, they generally lack the diversity in part-based interactions required for flexible and adaptable manipulation. To address these challenges, we introduced a large-scale part-centric dataset for articulated object manipulation that features both photo-realistic material randomization and detailed annotations of part-oriented, scene-level actionable interaction poses. We evaluated the effectiveness of our dataset by integrating it with several state-of-the-art methods for depth estimation and interaction pose prediction. Additionally, we proposed a novel modular framework that delivers superior and robust performance for generalizable articulated object manipulation. Our extensive experiments demonstrate that our dataset significantly improves the performance of depth perception and actionable interaction pose prediction in both simulation and real-world scenarios. More information and demos can be found at: https://pku-epic.github.io/GAPartManip/.
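To make the modular design concrete, here is a minimal, hypothetical sketch of such a two-stage pipeline: a depth-restoration stage that repairs sensor dropouts on transparent or reflective parts, followed by a part-oriented interaction pose predictor. The function names and the naive median-fill strategy are illustrative placeholders, not the paper's actual implementation.

```python
import numpy as np

def restore_depth(rgb: np.ndarray, raw_depth: np.ndarray) -> np.ndarray:
    """Repair depth that sensors drop on transparent/reflective parts.
    A learned model would condition on the RGB image; here we just fill
    invalid pixels with the median of valid ones as a placeholder."""
    depth = raw_depth.copy()
    invalid = ~np.isfinite(depth) | (depth <= 0)
    depth[invalid] = np.median(depth[~invalid])
    return depth

def depth_to_points(depth, fx, fy, cx, cy) -> np.ndarray:
    """Back-project a depth map to a point cloud with pinhole intrinsics."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

def predict_interaction_pose(points: np.ndarray) -> np.ndarray:
    """Placeholder for a part-oriented pose predictor: returns a 4x4 pose
    aimed at the part centroid."""
    pose = np.eye(4)
    pose[:3, 3] = points.mean(axis=0)
    return pose

# The two stages are decoupled, so each can be retrained on the dataset alone.
rgb = np.zeros((480, 640, 3), dtype=np.uint8)
raw_depth = np.random.uniform(0.5, 2.0, (480, 640))
raw_depth[100:200, 100:200] = 0.0  # simulate a transparent-lid dropout
points = depth_to_points(restore_depth(rgb, raw_depth), 600.0, 600.0, 320.0, 240.0)
pose = predict_interaction_pose(points)
```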
Related papers
- IAAO: Interactive Affordance Learning for Articulated Objects in 3D Environments [56.85804719947]
We present IAAO, a framework that builds an explicit 3D model for intelligent agents to gain understanding of articulated objects in their environment through interaction.
We first build hierarchical features and label fields for each object state using 3D Gaussian Splatting (3DGS) by distilling mask features and view-consistent labels from multi-view images.
We then perform object- and part-level queries on the 3D Gaussian primitives to identify static and articulated elements, estimating global transformations and local articulation parameters along with affordances.
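The "estimating global transformations" step can be grounded with a standard rigid-fit building block. Below is a minimal sketch using the Kabsch/Umeyama algorithm on matched 3D points (e.g., centers of corresponding primitives across two object states); IAAO's actual pipeline operates on 3D Gaussian primitives and is more involved.

```python
import numpy as np

def rigid_transform(src: np.ndarray, dst: np.ndarray):
    """Kabsch/Umeyama best fit: find R, t with dst ~= src @ R.T + t."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T  # guard against reflections
    t = dst_c - R @ src_c
    return R, t

# Recover a known rotation/translation applied to matched primitive centers.
rng = np.random.default_rng(0)
src = rng.normal(size=(100, 3))
theta = np.pi / 6
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
dst = src @ R_true.T + np.array([0.1, 0.0, 0.3])
R, t = rigid_transform(src, dst)
assert np.allclose(R, R_true) and np.allclose(t, [0.1, 0.0, 0.3])
```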
arXiv Detail & Related papers (2025-04-09T12:36:48Z)
- ArtGS: Building Interactable Replicas of Complex Articulated Objects via Gaussian Splatting [66.29782808719301]
Building articulated objects is a key challenge in computer vision.
Existing methods often fail to effectively integrate information across different object states.
We introduce ArtGS, a novel approach that leverages 3D Gaussians as a flexible and efficient representation.
arXiv Detail & Related papers (2025-02-26T10:25:32Z)
- V-HOP: Visuo-Haptic 6D Object Pose Tracking [18.984396185797667]
Humans naturally integrate vision and haptics for robust object perception during manipulation.
Prior object pose estimation research has attempted to combine visual and haptic/tactile feedback.
We introduce a new visuo-haptic transformer-based object pose tracker that seamlessly integrates visual and haptic input.
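One generic way to fuse the two modalities in a transformer is to project each to a shared token space and let self-attention mix them, as in the sketch below. The dimensions, projections, and pose head are assumptions for illustration, not V-HOP's actual architecture.

```python
import torch
import torch.nn as nn

d = 128
vis_proj = nn.Linear(512, d)   # e.g., per-patch visual features
hap_proj = nn.Linear(6, d)     # e.g., per-taxel force/torque readings
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True),
    num_layers=2,
)
pose_head = nn.Linear(d, 7)    # translation (3) + quaternion (4)

vis_tokens = vis_proj(torch.randn(1, 16, 512))  # 16 visual tokens
hap_tokens = hap_proj(torch.randn(1, 8, 6))     # 8 haptic tokens
tokens = torch.cat([vis_tokens, hap_tokens], dim=1)
fused = encoder(tokens)                         # attention mixes modalities
pose = pose_head(fused.mean(dim=1))             # pooled -> 6D pose estimate
```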
arXiv Detail & Related papers (2025-02-24T18:59:50Z)
- ManipGPT: Is Affordance Segmentation by Large Vision Models Enough for Articulated Object Manipulation? [17.356760351203715]
This paper introduces ManipGPT, a framework designed to predict optimal interaction areas for articulated objects.
We created a dataset of 9.9k simulated and real images to bridge the sim-to-real gap.
We significantly improved part-level affordance segmentation, adapting the model's in-context segmentation capabilities to robot manipulation scenarios.
arXiv Detail & Related papers (2024-12-13T11:22:01Z)
- AugInsert: Learning Robust Visual-Force Policies via Data Augmentation for Object Assembly Tasks [7.631503105866245]
This paper primarily focuses on learning robust visual-force policies in the context of high-precision object assembly tasks.
We aim to learn contact-rich manipulation policies with multisensory inputs on limited expert data by expanding human demonstrations via online data augmentation.
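As a rough illustration of online multisensory augmentation, the sketch below perturbs a paired RGB image and force-torque (wrench) reading to synthesize extra samples from one demonstration. The specific transforms are assumptions for illustration, not the paper's augmentation set.

```python
import numpy as np

def augment(rgb: np.ndarray, wrench: np.ndarray, rng):
    """Jointly perturb one visual-force sample."""
    rgb_aug = np.clip(rgb.astype(np.float32) + rng.normal(0, 5, rgb.shape), 0, 255)
    scale = rng.uniform(0.9, 1.1)                      # sensor-gain variation
    wrench_aug = wrench * scale + rng.normal(0, 0.05, wrench.shape)
    return rgb_aug.astype(np.uint8), wrench_aug

rng = np.random.default_rng(0)
rgb = np.zeros((224, 224, 3), dtype=np.uint8)
wrench = np.array([0.0, 0.0, 5.0, 0.0, 0.0, 0.1])      # [Fx,Fy,Fz,Tx,Ty,Tz]
rgb_new, wrench_new = augment(rgb, wrench, rng)
```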
arXiv Detail & Related papers (2024-10-19T04:19:52Z)
- Articulated Object Manipulation using Online Axis Estimation with SAM2-Based Tracking [59.87033229815062]
Articulated object manipulation requires precise object interaction, where the object's axis must be carefully considered.
Previous research employed interactive perception for manipulating articulated objects, but such open-loop approaches often overlook the interaction dynamics.
We present a closed-loop pipeline integrating interactive perception with online axis estimation from segmented 3D point clouds.
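The axis-estimation step can be illustrated with standard rigid-motion math: given the moving part's relative rotation between two tracked states (e.g., fit from segmented point clouds), the joint axis direction falls out of the axis-angle decomposition. A minimal sketch of that idea only; the paper's full pipeline also tracks segments with SAM2, localizes the axis position, and closes the loop during execution.

```python
import numpy as np

def axis_angle(R: np.ndarray):
    """Recover the rotation axis (unit vector) and angle from a rotation
    matrix via its skew-symmetric part. Assumes a non-trivial rotation."""
    angle = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    w = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    return w / (2.0 * np.sin(angle)), angle

# Example: a cabinet door rotated 20 degrees about the z-axis.
theta = np.deg2rad(20.0)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
axis, angle = axis_angle(Rz)
assert np.allclose(axis, [0.0, 0.0, 1.0]) and np.isclose(angle, theta)
```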
arXiv Detail & Related papers (2024-09-24T17:59:56Z)
- Zero-Shot Object-Centric Representation Learning [72.43369950684057]
We study current object-centric methods through the lens of zero-shot generalization.
We introduce a benchmark comprising eight different synthetic and real-world datasets.
We find that training on diverse real-world images improves transferability to unseen scenarios.
arXiv Detail & Related papers (2024-08-17T10:37:07Z)
- Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments consistently demonstrates our method's superiority over existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z)
- Learning Manipulation by Predicting Interaction [85.57297574510507]
We propose a general pre-training pipeline that learns Manipulation by Predicting Interaction (MPI).
Experimental results demonstrate that MPI improves over previous state-of-the-art methods by 10% to 64% on real-world robot platforms.
arXiv Detail & Related papers (2024-06-01T13:28:31Z)
- InterTracker: Discovering and Tracking General Objects Interacting with Hands in the Wild [40.489171608114574]
Existing methods rely on frame-based detectors to locate interacting objects.
We propose to leverage hand-object interaction to track interactive objects.
Our proposed method outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2023-08-06T09:09:17Z)
- MetaGraspNet: A Large-Scale Benchmark Dataset for Scene-Aware Ambidextrous Bin Picking via Physics-based Metaverse Synthesis [72.85526892440251]
We introduce MetaGraspNet, a large-scale photo-realistic bin picking dataset constructed via physics-based metaverse synthesis.
The proposed dataset contains 217k RGBD images across 82 different article types, with full annotations for object detection, amodal perception, keypoint detection, manipulation order and ambidextrous grasp labels for a parallel-jaw and vacuum gripper.
We also provide a real dataset consisting of over 2.3k fully annotated high-quality RGBD images, divided into 5 difficulty levels, plus an unseen object set for evaluating different object and layout properties.
arXiv Detail & Related papers (2022-08-08T08:15:34Z)
- Learn to Predict How Humans Manipulate Large-sized Objects from Interactive Motions [82.90906153293585]
We propose a graph neural network, HO-GCN, to fuse motion data and dynamic descriptors for the prediction task.
We show that the proposed network, which consumes dynamic descriptors, achieves state-of-the-art prediction results and generalizes better to unseen objects.
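For intuition, a generic graph-convolution step of the kind such a network stacks might look like the sketch below, with human-joint and object nodes carrying motion features plus dynamic descriptors. The layer is a textbook GCN, not HO-GCN's exact architecture.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One message-passing step: aggregate neighbor features with a
    symmetrically normalized adjacency, then transform."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        a = adj + torch.eye(adj.size(0))          # add self-loops
        d_inv_sqrt = a.sum(dim=1).rsqrt()         # degrees >= 1, so safe
        a_norm = d_inv_sqrt[:, None] * a * d_inv_sqrt[None, :]
        return torch.relu(self.lin(a_norm @ x))

# Nodes: human joints + object keypoints; features: motion + dynamic descriptors.
x = torch.randn(20, 16)                           # 20 nodes, 16-dim features
adj = (torch.rand(20, 20) > 0.8).float()
adj = ((adj + adj.t()) > 0).float()               # symmetrize
out = GCNLayer(16, 32)(x, adj)                    # -> (20, 32) node embeddings
```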
arXiv Detail & Related papers (2022-06-25T09:55:39Z)
- Object Manipulation via Visual Target Localization [64.05939029132394]
Training agents to manipulate objects poses many challenges.
We propose an approach that explores the environment in search for target objects, computes their 3D coordinates once they are located, and then continues to estimate their 3D locations even when the objects are not visible.
Our evaluations show a massive 3x improvement in success rate over a model that has access to the same sensory suite.
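The "keep estimating when not visible" idea reduces to caching the target in a fixed world frame and re-projecting it into each new camera frame. A minimal sketch under that assumption; the class and method names are illustrative.

```python
import numpy as np

class TargetMemory:
    """Store a target's 3D location in the world frame so it can be queried
    in any later camera frame, even once the object leaves the view."""
    def __init__(self):
        self.world_xyz = None

    def update(self, cam_xyz: np.ndarray, T_world_cam: np.ndarray):
        """Record a fresh observation (camera-frame point -> world frame)."""
        self.world_xyz = (T_world_cam @ np.append(cam_xyz, 1.0))[:3]

    def query(self, T_world_cam: np.ndarray) -> np.ndarray:
        """Predict the target's position in the current camera frame."""
        T_cam_world = np.linalg.inv(T_world_cam)
        return (T_cam_world @ np.append(self.world_xyz, 1.0))[:3]

mem = TargetMemory()
mem.update(np.array([0.2, 0.0, 1.5]), np.eye(4))   # camera at world origin
T1 = np.eye(4); T1[:3, 3] = [0.5, 0.0, 0.0]        # camera moved 0.5 m along x
print(mem.query(T1))                               # -> [-0.3, 0.0, 1.5]
```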
arXiv Detail & Related papers (2022-03-15T17:59:01Z)
- Sim2Real Object-Centric Keypoint Detection and Description [40.58367357980036]
Keypoint detection and description play a central role in computer vision.
We propose an object-centric formulation, which additionally requires identifying which object each interest point belongs to.
We develop a sim2real contrastive learning mechanism that can generalize the model trained in simulation to real-world applications.
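A common realization of sim2real contrastive learning is an InfoNCE objective that pulls together simulated and real embeddings of the same keypoint and pushes apart the rest, as in the generic sketch below; this is not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def info_nce(sim_feats, real_feats, temperature=0.07):
    """Row i of each batch is assumed to be the same keypoint seen in
    simulation and in the real world; off-diagonal pairs are negatives."""
    sim_feats = F.normalize(sim_feats, dim=1)
    real_feats = F.normalize(real_feats, dim=1)
    logits = sim_feats @ real_feats.t() / temperature  # (N, N) similarities
    targets = torch.arange(sim_feats.size(0))          # positives on diagonal
    return F.cross_entropy(logits, targets)

loss = info_nce(torch.randn(8, 128), torch.randn(8, 128))
```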
arXiv Detail & Related papers (2022-02-01T15:00:20Z)
- VAT-Mart: Learning Visual Action Trajectory Proposals for Manipulating 3D ARTiculated Objects [19.296344218177534]
The space of 3D articulated objects is exceptionally rich in their myriad semantic categories, diverse shape geometry, and complicated part functionality.
Previous works mostly abstract kinematic structure with estimated joint parameters and part poses as the visual representations for manipulating 3D articulated objects.
We propose object-centric actionable visual priors as a novel perception-interaction handshaking point, where the perception system outputs more actionable guidance than kinematic structure estimation.
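The simplest form of such an actionable prior is a per-point actionability score over the part's point cloud, rather than regressed joint parameters. The sketch below is purely illustrative of that idea; VAT-Mart predicts richer per-point trajectory proposals.

```python
import torch
import torch.nn as nn

# Score every point on a segmented part for how actionable it is.
point_mlp = nn.Sequential(
    nn.Linear(3, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1), nn.Sigmoid(),   # per-point actionability in [0, 1]
)
points = torch.rand(2048, 3)          # a segmented part's point cloud
scores = point_mlp(points).squeeze(-1)
best = points[scores.argmax()]        # candidate interaction point
```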
arXiv Detail & Related papers (2021-06-28T07:47:31Z)
- Hindsight for Foresight: Unsupervised Structured Dynamics Models from Physical Interaction [24.72947291987545]
A key challenge for an agent learning to interact with the world is reasoning about the physical properties of objects.
We propose a novel approach for modeling the dynamics of a robot's interactions directly from unlabeled 3D point clouds and images.
arXiv Detail & Related papers (2020-08-02T11:04:49Z)
- Measuring Generalisation to Unseen Viewpoints, Articulations, Shapes and Objects for 3D Hand Pose Estimation under Hand-Object Interaction [137.28465645405655]
HANDS'19 is a challenge to evaluate the abilities of current 3D hand pose estimators (HPEs) to interpolate and extrapolate the poses of a training set.
We show that the accuracy of state-of-the-art methods can drop, and that they fail mostly on poses absent from the training set.
arXiv Detail & Related papers (2020-03-30T19:28:13Z)