HACMan: Learning Hybrid Actor-Critic Maps for 6D Non-Prehensile Manipulation
- URL: http://arxiv.org/abs/2305.03942v5
- Date: Mon, 15 Jul 2024 03:49:48 GMT
- Title: HACMan: Learning Hybrid Actor-Critic Maps for 6D Non-Prehensile Manipulation
- Authors: Wenxuan Zhou, Bowen Jiang, Fan Yang, Chris Paxton, David Held,
- Abstract summary: We introduce Hybrid Actor-Critic Maps for Manipulation (HACMan), a reinforcement learning approach for 6D non-prehensile manipulation of objects.
We evaluate HACMan on a 6D object pose alignment task in both simulation and in the real world.
Compared to alternative action representations, HACMan achieves a success rate more than three times higher than the best baseline.
- Score: 29.01984677695523
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Manipulating objects without grasping them is an essential component of human dexterity, referred to as non-prehensile manipulation. Non-prehensile manipulation may enable more complex interactions with the objects, but also presents challenges in reasoning about gripper-object interactions. In this work, we introduce Hybrid Actor-Critic Maps for Manipulation (HACMan), a reinforcement learning approach for 6D non-prehensile manipulation of objects using point cloud observations. HACMan proposes a temporally-abstracted and spatially-grounded object-centric action representation that consists of selecting a contact location from the object point cloud and a set of motion parameters describing how the robot will move after making contact. We modify an existing off-policy RL algorithm to learn in this hybrid discrete-continuous action representation. We evaluate HACMan on a 6D object pose alignment task in both simulation and in the real world. On the hardest version of our task, with randomized initial poses, randomized 6D goals, and diverse object categories, our policy demonstrates strong generalization to unseen object categories without a performance drop, achieving an 89% success rate on unseen objects in simulation and 50% success rate with zero-shot transfer in the real world. Compared to alternative action representations, HACMan achieves a success rate more than three times higher than the best baseline. With zero-shot sim2real transfer, our policy can successfully manipulate unseen objects in the real world for challenging non-planar goals, using dynamic and contact-rich non-prehensile skills. Videos can be found on the project website: https://hacman-2023.github.io.
Related papers
- PickScan: Object discovery and reconstruction from handheld interactions [99.99566882133179]
We develop an interaction-guided and class-agnostic method to reconstruct 3D representations of scenes.
Our main contribution is a novel approach to detecting user-object interactions and extracting the masks of manipulated objects.
Compared to Co-Fusion, the only comparable interaction-based and class-agnostic baseline, this corresponds to a reduction in chamfer distance of 73%.
arXiv Detail & Related papers (2024-11-17T23:09:08Z) - Articulated Object Manipulation using Online Axis Estimation with SAM2-Based Tracking [59.87033229815062]
Articulated object manipulation requires precise object interaction, where the object's axis must be carefully considered.
Previous research employed interactive perception for manipulating articulated objects, but typically, open-loop approaches often suffer from overlooking the interaction dynamics.
We present a closed-loop pipeline integrating interactive perception with online axis estimation from segmented 3D point clouds.
arXiv Detail & Related papers (2024-09-24T17:59:56Z) - Uncertainty-aware Active Learning of NeRF-based Object Models for Robot Manipulators using Visual and Re-orientation Actions [8.059133373836913]
This paper presents an approach that enables a robot to rapidly learn the complete 3D model of a given object for manipulation in unfamiliar orientations.
We use an ensemble of partially constructed NeRF models to quantify model uncertainty to determine the next action.
Our approach determines when and how to grasp and re-orient an object given its partial NeRF model and re-estimates the object pose to rectify misalignments introduced during the interaction.
arXiv Detail & Related papers (2024-04-02T10:15:06Z) - RPMArt: Towards Robust Perception and Manipulation for Articulated Objects [56.73978941406907]
We propose a framework towards Robust Perception and Manipulation for Articulated Objects ( RPMArt)
RPMArt learns to estimate the articulation parameters and manipulate the articulation part from the noisy point cloud.
We introduce an articulation-aware classification scheme to enhance its ability for sim-to-real transfer.
arXiv Detail & Related papers (2024-03-24T05:55:39Z) - ROAM: Robust and Object-Aware Motion Generation Using Neural Pose
Descriptors [73.26004792375556]
This paper shows that robustness and generalisation to novel scene objects in 3D object-aware character synthesis can be achieved by training a motion model with as few as one reference object.
We leverage an implicit feature representation trained on object-only datasets, which encodes an SE(3)-equivariant descriptor field around the object.
We demonstrate substantial improvements in 3D virtual character motion and interaction quality and robustness to scenarios with unseen objects.
arXiv Detail & Related papers (2023-08-24T17:59:51Z) - H-SAUR: Hypothesize, Simulate, Act, Update, and Repeat for Understanding
Object Articulations from Interactions [62.510951695174604]
"Hypothesize, Simulate, Act, Update, and Repeat" (H-SAUR) is a probabilistic generative framework that generates hypotheses about how objects articulate given input observations.
We show that the proposed model significantly outperforms the current state-of-the-art articulated object manipulation framework.
We further improve the test-time efficiency of H-SAUR by integrating a learned prior from learning-based vision models.
arXiv Detail & Related papers (2022-10-22T18:39:33Z) - Object Manipulation via Visual Target Localization [64.05939029132394]
Training agents to manipulate objects, poses many challenges.
We propose an approach that explores the environment in search for target objects, computes their 3D coordinates once they are located, and then continues to estimate their 3D locations even when the objects are not visible.
Our evaluations show a massive 3x improvement in success rate over a model that has access to the same sensory suite.
arXiv Detail & Related papers (2022-03-15T17:59:01Z) - Hindsight for Foresight: Unsupervised Structured Dynamics Models from
Physical Interaction [24.72947291987545]
Key challenge for an agent learning to interact with the world is to reason about physical properties of objects.
We propose a novel approach for modeling the dynamics of a robot's interactions directly from unlabeled 3D point clouds and images.
arXiv Detail & Related papers (2020-08-02T11:04:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.