Learning Generalizable Manipulation Policies with Object-Centric 3D
Representations
- URL: http://arxiv.org/abs/2310.14386v1
- Date: Sun, 22 Oct 2023 18:51:45 GMT
- Title: Learning Generalizable Manipulation Policies with Object-Centric 3D
Representations
- Authors: Yifeng Zhu, Zhenyu Jiang, Peter Stone, Yuke Zhu
- Abstract summary: GROOT is an imitation learning method for learning robust policies with object-centric and 3D priors.
It builds policies that generalize beyond their initial training conditions for vision-based manipulation.
GROOT's performance excels in generalization over background changes, camera viewpoint shifts, and the presence of new object instances.
- Score: 65.55352131167213
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce GROOT, an imitation learning method for learning robust policies
with object-centric and 3D priors. GROOT builds policies that generalize beyond
their initial training conditions for vision-based manipulation. It constructs
object-centric 3D representations that are robust to background changes and
camera viewpoints, and it reasons over these representations using a
transformer-based policy. Furthermore, we introduce a segmentation
correspondence model that
allows policies to generalize to new objects at test time. Through
comprehensive experiments, we validate the robustness of GROOT policies against
perceptual variations in simulated and real-world environments. GROOT's
performance excels in generalization over background changes, camera viewpoint
shifts, and the presence of new object instances, whereas both state-of-the-art
end-to-end learning methods and object proposal-based approaches fall short. We
also extensively evaluate GROOT policies on real robots, where we demonstrate
their efficacy under drastic changes in setup. More videos and model details
can be found in the appendix and the project website:
https://ut-austin-rpl.github.io/GROOT .
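As a rough illustration of the pipeline the abstract describes (segmented object point clouds turned into per-object tokens, with a transformer-based policy reasoning over them), here is a minimal sketch assuming PyTorch. The module names, dimensions, token layout, and pooling choice are illustrative assumptions, not GROOT's actual implementation; in the method itself the object point clouds would come from segmentation masks back-projected with depth, and the segmentation correspondence model would supply masks for unseen object instances at test time.

```python
# Minimal sketch of an object-centric 3D policy (illustrative assumptions only).
import torch
import torch.nn as nn


class ObjectPointEncoder(nn.Module):
    """PointNet-style encoder: shared per-point MLP followed by max pooling."""

    def __init__(self, dim=128):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, dim))

    def forward(self, points):              # (batch, objects, points, 3)
        feats = self.mlp(points)             # per-point features
        return feats.max(dim=2).values       # (batch, objects, dim) object tokens


class ObjectCentricPolicy(nn.Module):
    """Transformer over object tokens plus one proprioception token."""

    def __init__(self, dim=128, proprio_dim=8, action_dim=7):
        super().__init__()
        self.point_encoder = ObjectPointEncoder(dim)
        self.proprio_proj = nn.Linear(proprio_dim, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.action_head = nn.Linear(dim, action_dim)

    def forward(self, object_points, proprio):
        # object_points: (batch, objects, points, 3); proprio: (batch, proprio_dim)
        tokens = self.point_encoder(object_points)
        proprio_token = self.proprio_proj(proprio).unsqueeze(1)
        context = self.transformer(torch.cat([tokens, proprio_token], dim=1))
        return self.action_head(context[:, -1])  # read the action off the proprio token


# Usage: two segmented objects with 256 points each, 8-D proprioception.
policy = ObjectCentricPolicy()
action = policy(torch.randn(1, 2, 256, 3), torch.randn(1, 8))
print(action.shape)  # torch.Size([1, 7])
```

Because the representation is built from object segments rather than the full image, background pixels never enter the policy, which is one way such a design can stay robust to background and viewpoint changes.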
Related papers
- HACMan++: Spatially-Grounded Motion Primitives for Manipulation [28.411361363637006]
We introduce spatially-grounded parameterized motion primitives in our method HACMan++.
By grounding the primitives on a spatial location in the environment, our method is able to effectively generalize across object shape and pose variations.
Our approach significantly outperforms existing methods, particularly in complex scenarios demanding both high-level sequential reasoning and object generalization.
arXiv Detail & Related papers (2024-07-11T15:10:14Z)
- Learning Manipulation by Predicting Interaction [85.57297574510507]
We propose a general pre-training pipeline that learns Manipulation by Predicting the Interaction.
Experimental results demonstrate that MPI improves on the previous state of the art by 10% to 64% on real-world robot platforms.
arXiv Detail & Related papers (2024-06-01T13:28:31Z)
- ObjectCompose: Evaluating Resilience of Vision-Based Models on Object-to-Background Compositional Changes [64.57705752579207]
We evaluate the resilience of vision-based models against diverse object-to-background context variations.
We harness the generative capabilities of text-to-image, image-to-text, and image-to-segment models to automatically generate object-to-background changes.
arXiv Detail & Related papers (2024-03-07T17:48:48Z)
- Graphical Object-Centric Actor-Critic [55.2480439325792]
We propose a novel object-centric reinforcement learning algorithm combining actor-critic and model-based approaches.
We use a transformer encoder to extract object representations and graph neural networks to approximate the dynamics of an environment.
Our algorithm outperforms the state-of-the-art model-free actor-critic algorithm in a visually complex 3D robotic environment and in a 2D environment with compositional structure. (A rough sketch of the transformer-plus-GNN structure appears after this entry.)
arXiv Detail & Related papers (2023-10-26T06:05:12Z)
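The summary above names two ingredients: a transformer encoder that produces object representations and a graph neural network that approximates environment dynamics. Below is a minimal sketch of how those two pieces could fit together, assuming PyTorch and a fully connected object graph; all module names and dimensions are assumptions rather than the paper's architecture.

```python
# Illustrative sketch only: transformer-encoded object slots and one
# message-passing step over a fully connected object graph that predicts
# next-step object states given an action.
import torch
import torch.nn as nn


class ObjectDynamicsModel(nn.Module):
    def __init__(self, obj_dim=64, action_dim=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=obj_dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)  # object representations
        self.edge_mlp = nn.Sequential(nn.Linear(2 * obj_dim, obj_dim), nn.ReLU())
        self.node_mlp = nn.Sequential(
            nn.Linear(2 * obj_dim + action_dim, obj_dim), nn.ReLU(),
            nn.Linear(obj_dim, obj_dim),
        )

    def forward(self, obj_feats, action):    # (B, N, D), (B, A)
        slots = self.encoder(obj_feats)      # contextualised object slots
        B, N, D = slots.shape
        # Messages between every pair of objects, summed per receiving node.
        src = slots.unsqueeze(2).expand(B, N, N, D)
        dst = slots.unsqueeze(1).expand(B, N, N, D)
        messages = self.edge_mlp(torch.cat([src, dst], dim=-1)).sum(dim=2)  # (B, N, D)
        act = action.unsqueeze(1).expand(B, N, action.shape[-1])
        return self.node_mlp(torch.cat([slots, messages, act], dim=-1))  # predicted next slots


# Usage: five object slots of dimension 64 and a 4-D action.
model = ObjectDynamicsModel()
next_slots = model(torch.randn(2, 5, 64), torch.randn(2, 4))
print(next_slots.shape)  # torch.Size([2, 5, 64])
```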
- Localizing Active Objects from Egocentric Vision with Symbolic World Knowledge [62.981429762309226]
The ability to actively ground task instructions from an egocentric view is crucial for AI agents to accomplish tasks or assist humans virtually.
We propose to improve phrase grounding models' ability to localize active objects by learning the role of objects undergoing change and extracting them accurately from the instructions.
We evaluate our framework on Ego4D and Epic-Kitchens datasets.
arXiv Detail & Related papers (2023-10-23T16:14:05Z)
- Transferring Foundation Models for Generalizable Robotic Manipulation [82.12754319808197]
We propose a novel paradigm that effectively leverages language-reasoning segmentation masks generated by internet-scale foundation models.
Our approach can effectively and robustly perceive object pose and enable sample-efficient generalization learning.
Demos can be found in our submitted video, and more comprehensive ones can be found in link1 or link2.
arXiv Detail & Related papers (2023-06-09T07:22:12Z)
- Efficient Representations of Object Geometry for Reinforcement Learning of Interactive Grasping Policies [29.998917158604694]
We present a reinforcement learning framework that learns the interactive grasping of various geometrically distinct real-world objects.
Videos of learned interactive policies are available at https://maltemosbach.org/io/geometry_aware_grasping_policies.
arXiv Detail & Related papers (2022-11-20T11:47:33Z)
- A System for General In-Hand Object Re-Orientation [23.538271727475525]
We present a model-free framework that can learn to reorient objects with both the hand facing upwards and downwards.
We demonstrate the capability of reorienting over 2000 geometrically different objects in both cases.
arXiv Detail & Related papers (2021-11-04T17:47:39Z)
- SoftGym: Benchmarking Deep Reinforcement Learning for Deformable Object Manipulation [15.477950393687836]
We present SoftGym, a set of open-source simulated benchmarks for manipulating deformable objects.
We evaluate a variety of algorithms on these tasks and highlight challenges for reinforcement learning algorithms.
arXiv Detail & Related papers (2020-11-14T03:46:59Z)