PartManip: Learning Cross-Category Generalizable Part Manipulation
Policy from Point Cloud Observations
- URL: http://arxiv.org/abs/2303.16958v1
- Date: Wed, 29 Mar 2023 18:29:30 GMT
- Authors: Haoran Geng, Ziming Li, Yiran Geng, Jiayi Chen, Hao Dong, He Wang
- Abstract summary: We build the first large-scale, part-based cross-category object manipulation benchmark, PartManip.
We train a state-based expert with our proposed part-based canonicalization and part-aware rewards, and then distill the knowledge to a vision-based student.
For cross-category generalization, we introduce domain adversarial learning for domain-invariant feature extraction.
- Score: 12.552149411655355
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning a generalizable object manipulation policy is vital for an embodied
agent to work in complex real-world scenes. Parts, as the shared components in
different object categories, have the potential to increase the generalization
ability of the manipulation policy and achieve cross-category object
manipulation. In this work, we build the first large-scale, part-based
cross-category object manipulation benchmark, PartManip, which is composed of
11 object categories, 494 objects, and 1432 tasks in 6 task classes. Compared
to previous work, our benchmark is also more diverse and realistic: it contains
more objects and uses sparse-view point clouds as input, without oracle
information such as part segmentation. To tackle the difficulties of vision-based
policy learning, we first train a state-based expert with our proposed
part-based canonicalization and part-aware rewards, and then distill the
knowledge to a vision-based student. We also find an expressive backbone is
essential to overcome the large diversity of different objects. For
cross-category generalization, we introduce domain adversarial learning for
domain-invariant feature extraction. Extensive experiments in simulation show
that our learned policy can outperform other methods by a large margin,
especially on unseen object categories. We also demonstrate our method can
successfully manipulate novel objects in the real world.
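The abstract credits cross-category generalization to domain adversarial learning for domain-invariant feature extraction. The standard building block of that technique (as in DANN-style training) is a gradient reversal layer placed between the feature extractor and a domain classifier. Below is a minimal, framework-free sketch of that layer; the class and parameter names are illustrative and are not taken from the PartManip codebase.

```python
import numpy as np

class GradientReversal:
    """Identity in the forward pass; scales gradients by -lam in the backward pass.

    Inserted between a feature extractor and a domain (here: object-category)
    classifier, it makes the extractor *maximize* the classifier's loss,
    pushing the learned features toward domain invariance.
    """

    def __init__(self, lam: float = 1.0):
        self.lam = lam  # trade-off weight for the adversarial signal

    def forward(self, x: np.ndarray) -> np.ndarray:
        # Features pass through unchanged.
        return x

    def backward(self, grad_output: np.ndarray) -> np.ndarray:
        # Gradient is reversed and scaled before reaching the extractor.
        return -self.lam * grad_output

# Tiny usage example.
grl = GradientReversal(lam=0.5)
features = np.array([1.0, -2.0, 3.0])
assert np.allclose(grl.forward(features), features)
assert np.allclose(grl.backward(np.ones(3)), [-0.5, -0.5, -0.5])
```

In a full pipeline this layer would sit in front of a small category classifier trained on the extractor's features, while the main manipulation head is trained normally; the reversed gradient discourages the extractor from encoding category-specific cues.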
Related papers
- Entity-Centric Reinforcement Learning for Object Manipulation from Pixels [22.104757862869526]
Reinforcement Learning (RL) offers a general approach to learn object manipulation.
In practice, domains with more than a few objects are difficult for RL agents due to the curse of dimensionality.
We propose a structured approach for visual RL that is suitable for representing multiple objects and their interaction.
arXiv Detail & Related papers (2024-04-01T16:25:08Z)
- Self-Supervised Visual Representation Learning with Semantic Grouping [50.14703605659837]
We tackle the problem of learning visual representations from unlabeled scene-centric data.
We propose contrastive learning from data-driven semantic slots, namely SlotCon, for joint semantic grouping and representation learning.
arXiv Detail & Related papers (2022-05-30T17:50:59Z)
- Self-Supervised Learning of Object Parts for Semantic Segmentation [7.99536002595393]
We argue that self-supervised learning of object parts is a solution to this issue.
Our method surpasses the state-of-the-art on three semantic segmentation benchmarks by margins ranging from 17% down to 3%.
arXiv Detail & Related papers (2022-04-27T17:55:17Z)
- Learning Generalizable Dexterous Manipulation from Human Grasp Affordance [11.060931225148936]
Dexterous manipulation with a multi-finger hand is one of the most challenging problems in robotics.
Recent progress in imitation learning has largely improved the sample efficiency compared to Reinforcement Learning.
We propose to learn dexterous manipulation using large-scale demonstrations with diverse 3D objects in a category.
arXiv Detail & Related papers (2022-04-05T16:26:22Z)
- PartAfford: Part-level Affordance Discovery from 3D Objects [113.91774531972855]
We present a new task of part-level affordance discovery (PartAfford).
Given only the affordance labels per object, the machine is tasked to (i) decompose 3D shapes into parts and (ii) discover how each part corresponds to a certain affordance category.
We propose a novel learning framework for PartAfford, which discovers part-level representations by leveraging only the affordance set supervision and geometric primitive regularization.
arXiv Detail & Related papers (2022-02-28T02:58:36Z)
- Unsupervised Part Discovery from Contrastive Reconstruction [90.88501867321573]
The goal of self-supervised visual representation learning is to learn strong, transferable image representations.
We propose an unsupervised approach to object part discovery and segmentation.
Our method yields semantic parts consistent across fine-grained but visually distinct categories.
arXiv Detail & Related papers (2021-11-11T17:59:42Z)
- Generalization in Dexterous Manipulation via Geometry-Aware Multi-Task Learning [108.08083976908195]
We show that policies learned by existing reinforcement learning algorithms can in fact be generalist.
We show that a single generalist policy can perform in-hand manipulation of over 100 geometrically-diverse real-world objects.
Interestingly, we find that multi-task learning with object point cloud representations not only generalizes better but even outperforms single-object specialist policies.
arXiv Detail & Related papers (2021-11-04T17:59:56Z)
- Learning Open-World Object Proposals without Learning to Classify [110.30191531975804]
We propose a classification-free Object Localization Network (OLN) which estimates the objectness of each region purely by how well the location and shape of a region overlaps with any ground-truth object.
This simple strategy learns generalizable objectness and outperforms existing proposals on cross-category generalization.
arXiv Detail & Related papers (2021-08-15T14:36:02Z)
- Act the Part: Learning Interaction Strategies for Articulated Object Part Discovery [18.331607910407183]
We introduce Act the Part (AtP) to learn how to interact with articulated objects to discover and segment their pieces.
Our experiments show AtP learns efficient strategies for part discovery, can generalize to unseen categories, and is capable of conditional reasoning for the task.
arXiv Detail & Related papers (2021-05-03T17:48:29Z)
- Learning visual policies for building 3D shape categories [130.7718618259183]
Previous work in this domain often assembles particular instances of objects from known sets of primitives.
We learn a visual policy to assemble other instances of the same category.
Our visual assembly policies are trained with no real images and reach up to 95% success rate when evaluated on a real robot.
arXiv Detail & Related papers (2020-04-15T17:29:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.