Learning Generalizable Manipulation Policies with Object-Centric 3D
Representations
- URL: http://arxiv.org/abs/2310.14386v1
- Date: Sun, 22 Oct 2023 18:51:45 GMT
- Title: Learning Generalizable Manipulation Policies with Object-Centric 3D
Representations
- Authors: Yifeng Zhu, Zhenyu Jiang, Peter Stone, Yuke Zhu
- Abstract summary: GROOT is an imitation learning method for learning robust policies with object-centric and 3D priors.
It builds policies that generalize beyond their initial training conditions for vision-based manipulation.
GROOT's performance excels in generalization over background changes, camera viewpoint shifts, and the presence of new object instances.
- Score: 65.55352131167213
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce GROOT, an imitation learning method for learning robust policies
with object-centric and 3D priors. GROOT builds policies that generalize beyond
their initial training conditions for vision-based manipulation. It constructs
object-centric 3D representations that are robust to background changes and
camera viewpoints, and it reasons over these representations using a
transformer-based policy. Furthermore, we introduce a segmentation
correspondence model that
allows policies to generalize to new objects at test time. Through
comprehensive experiments, we validate the robustness of GROOT policies against
perceptual variations in simulated and real-world environments. GROOT's
performance excels in generalization over background changes, camera viewpoint
shifts, and the presence of new object instances, whereas both state-of-the-art
end-to-end learning methods and object proposal-based approaches fall short. We
also extensively evaluate GROOT policies on real robots, where we demonstrate
their efficacy under drastic changes in setup. More videos and model details
can be found in the appendix and the project website:
https://ut-austin-rpl.github.io/GROOT .
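As a rough illustration of the pipeline the abstract describes (segmented object point clouds turned into per-object tokens, with a transformer-based policy reasoning over them), here is a minimal sketch assuming PyTorch. The module names, dimensions, token layout, and pooling choice are illustrative assumptions, not GROOT's actual implementation; in the method itself the object point clouds would come from segmentation masks back-projected with depth, and the segmentation correspondence model would supply masks for unseen object instances at test time.

```python
# Minimal sketch of an object-centric 3D policy (illustrative assumptions only).
import torch
import torch.nn as nn


class ObjectPointEncoder(nn.Module):
    """PointNet-style encoder: shared per-point MLP followed by max pooling."""

    def __init__(self, dim=128):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, dim))

    def forward(self, points):              # (batch, objects, points, 3)
        feats = self.mlp(points)             # per-point features
        return feats.max(dim=2).values       # (batch, objects, dim) object tokens


class ObjectCentricPolicy(nn.Module):
    """Transformer over object tokens plus one proprioception token."""

    def __init__(self, dim=128, proprio_dim=8, action_dim=7):
        super().__init__()
        self.point_encoder = ObjectPointEncoder(dim)
        self.proprio_proj = nn.Linear(proprio_dim, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.action_head = nn.Linear(dim, action_dim)

    def forward(self, object_points, proprio):
        # object_points: (batch, objects, points, 3); proprio: (batch, proprio_dim)
        tokens = self.point_encoder(object_points)
        proprio_token = self.proprio_proj(proprio).unsqueeze(1)
        context = self.transformer(torch.cat([tokens, proprio_token], dim=1))
        return self.action_head(context[:, -1])  # read the action off the proprio token


# Usage: two segmented objects with 256 points each, 8-D proprioception.
policy = ObjectCentricPolicy()
action = policy(torch.randn(1, 2, 256, 3), torch.randn(1, 8))
print(action.shape)  # torch.Size([1, 7])
```

Because the representation is built from object segments rather than the full image, background pixels never enter the policy, which is one way such a design can stay robust to background and viewpoint changes.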
Related papers
- HACMan++: Spatially-Grounded Motion Primitives for Manipulation [28.411361363637006]
We introduce spatially-grounded parameterized motion primitives in our method HACMan++.
By grounding the primitives on a spatial location in the environment, our method is able to effectively generalize across object shape and pose variations.
Our approach significantly outperforms existing methods, particularly in complex scenarios demanding both high-level sequential reasoning and object generalization.
arXiv Detail & Related papers (2024-07-11T15:10:14Z)
- Learning Manipulation by Predicting Interaction [85.57297574510507]
We propose a general pre-training pipeline that learns Manipulation by Predicting the Interaction.
Experimental results demonstrate that MPI improves on the previous state of the art by 10% to 64% on real-world robot platforms.
arXiv Detail & Related papers (2024-06-01T13:28:31Z)
- ObjectCompose: Evaluating Resilience of Vision-Based Models on Object-to-Background Compositional Changes [64.57705752579207]
We evaluate the resilience of vision-based models against diverse object-to-background context variations.
We harness the generative capabilities of text-to-image, image-to-text, and image-to-segment models to automatically generate object-to-background changes.
arXiv Detail & Related papers (2024-03-07T17:48:48Z)
- Graphical Object-Centric Actor-Critic [55.2480439325792]
We propose a novel object-centric reinforcement learning algorithm combining actor-critic and model-based approaches.
We use a transformer encoder to extract object representations and graph neural networks to approximate the dynamics of an environment.
Our algorithm outperforms the state-of-the-art model-free actor-critic algorithm in a visually complex 3D robotic environment and in a 2D environment with compositional structure. (A rough sketch of the transformer-plus-GNN structure appears after this entry.)
arXiv Detail & Related papers (2023-10-26T06:05:12Z)
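The summary above names two ingredients: a transformer encoder that produces object representations and a graph neural network that approximates environment dynamics. Below is a minimal sketch of how those two pieces could fit together, assuming PyTorch and a fully connected object graph; all module names and dimensions are assumptions rather than the paper's architecture.

```python
# Illustrative sketch only: transformer-encoded object slots and one
# message-passing step over a fully connected object graph that predicts
# next-step object states given an action.
import torch
import torch.nn as nn


class ObjectDynamicsModel(nn.Module):
    def __init__(self, obj_dim=64, action_dim=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=obj_dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)  # object representations
        self.edge_mlp = nn.Sequential(nn.Linear(2 * obj_dim, obj_dim), nn.ReLU())
        self.node_mlp = nn.Sequential(
            nn.Linear(2 * obj_dim + action_dim, obj_dim), nn.ReLU(),
            nn.Linear(obj_dim, obj_dim),
        )

    def forward(self, obj_feats, action):    # (B, N, D), (B, A)
        slots = self.encoder(obj_feats)      # contextualised object slots
        B, N, D = slots.shape
        # Messages between every pair of objects, summed per receiving node.
        src = slots.unsqueeze(2).expand(B, N, N, D)
        dst = slots.unsqueeze(1).expand(B, N, N, D)
        messages = self.edge_mlp(torch.cat([src, dst], dim=-1)).sum(dim=2)  # (B, N, D)
        act = action.unsqueeze(1).expand(B, N, action.shape[-1])
        return self.node_mlp(torch.cat([slots, messages, act], dim=-1))  # predicted next slots


# Usage: five object slots of dimension 64 and a 4-D action.
model = ObjectDynamicsModel()
next_slots = model(torch.randn(2, 5, 64), torch.randn(2, 4))
print(next_slots.shape)  # torch.Size([2, 5, 64])
```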
- Localizing Active Objects from Egocentric Vision with Symbolic World Knowledge [62.981429762309226]
The ability to actively ground task instructions from an egocentric view is crucial for AI agents to accomplish tasks or assist humans virtually.
We propose to improve phrase grounding models' ability to localize active objects by learning the role of objects undergoing change and extracting them accurately from the instructions.
We evaluate our framework on Ego4D and Epic-Kitchens datasets.
arXiv Detail & Related papers (2023-10-23T16:14:05Z)
- Transferring Foundation Models for Generalizable Robotic Manipulation [82.12754319808197]
We propose a novel paradigm that effectively leverages language-reasoning segmentation masks generated by internet-scale foundation models.
Our approach can effectively and robustly perceive object pose and enable sample-efficient generalization learning.
Demos can be found in our submitted video, and more comprehensive ones can be found in link1 or link2.
arXiv Detail & Related papers (2023-06-09T07:22:12Z)
- Efficient Representations of Object Geometry for Reinforcement Learning of Interactive Grasping Policies [29.998917158604694]
We present a reinforcement learning framework that learns the interactive grasping of various geometrically distinct real-world objects.
Videos of learned interactive policies are available at https://maltemosbach.org/io/geometry_aware_grasping_policies.
arXiv Detail & Related papers (2022-11-20T11:47:33Z)
- A System for General In-Hand Object Re-Orientation [23.538271727475525]
We present a model-free framework that can learn to reorient objects with both the hand facing upwards and downwards.
We demonstrate the capability of reorienting over 2000 geometrically different objects in both cases.
arXiv Detail & Related papers (2021-11-04T17:47:39Z)
- SoftGym: Benchmarking Deep Reinforcement Learning for Deformable Object Manipulation [15.477950393687836]
We present SoftGym, a set of open-source simulated benchmarks for manipulating deformable objects.
We evaluate a variety of algorithms on these tasks and highlight challenges for reinforcement learning algorithms.
arXiv Detail & Related papers (2020-11-14T03:46:59Z)