Imagination Policy: Using Generative Point Cloud Models for Learning Manipulation Policies
- URL: http://arxiv.org/abs/2406.11740v2
- Date: Sat, 30 Nov 2024 17:40:26 GMT
- Title: Imagination Policy: Using Generative Point Cloud Models for Learning Manipulation Policies
- Authors: Haojie Huang, Karl Schmeckpeper, Dian Wang, Ondrej Biza, Yaoyao Qian, Haotian Liu, Mingxi Jia, Robert Platt, Robin Walters
- Abstract summary: We propose Imagination Policy, a novel multi-task key-frame policy network for solving high-precision pick and place tasks.
Instead of learning actions directly, Imagination Policy generates point clouds to imagine desired states, which are then translated to actions using rigid action estimation.
- Score: 25.760946763103483
- Abstract: Humans can imagine goal states during planning and perform actions to match those goals. In this work, we propose Imagination Policy, a novel multi-task key-frame policy network for solving high-precision pick and place tasks. Instead of learning actions directly, Imagination Policy generates point clouds to imagine desired states, which are then translated to actions using rigid action estimation. This transforms action inference into a local generative task. We leverage pick and place symmetries underlying the tasks in the generation process and achieve extremely high sample efficiency and generalizability to unseen configurations. Finally, we demonstrate state-of-the-art performance across various tasks on the RLBench benchmark compared with several strong baselines and validate our approach on a real robot.
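The abstract's "rigid action estimation" step, turning an imagined point cloud into an action, can be illustrated with a least-squares rigid alignment between paired points. The sketch below is a planar (2D) simplification written only for illustration; it is not the authors' implementation. It assumes known point-to-point correspondences between the observed and imagined clouds, and the function names `estimate_rigid_2d` and `apply_rigid_2d` are hypothetical. The paper itself works with 3D point clouds, where the analogous step would estimate a full 3D rotation and translation.

```python
import math


def apply_rigid_2d(points, theta, t):
    """Rotate each (x, y) point by theta (radians), then translate by t."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + t[0], s * x + c * y + t[1]) for x, y in points]


def estimate_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping src onto dst.

    Assumes src[i] corresponds to dst[i] (an illustrative assumption).
    """
    n = len(src)
    if n != len(dst) or n < 2:
        raise ValueError("need at least two paired points")

    # Centroids of both point sets.
    cx_s = sum(p[0] for p in src) / n
    cy_s = sum(p[1] for p in src) / n
    cx_d = sum(p[0] for p in dst) / n
    cy_d = sum(p[1] for p in dst) / n

    # Accumulate dot and cross products of the centered pairs.
    s_dot = 0.0
    s_cross = 0.0
    for (xs, ys), (xd, yd) in zip(src, dst):
        ax, ay = xs - cx_s, ys - cy_s
        bx, by = xd - cx_d, yd - cy_d
        s_dot += ax * bx + ay * by
        s_cross += ax * by - ay * bx

    # The angle that maximizes alignment of the centered sets.
    theta = math.atan2(s_cross, s_dot)

    # Translation carries the rotated source centroid onto the target centroid.
    c, s = math.cos(theta), math.sin(theta)
    tx = cx_d - (c * cx_s - s * cy_s)
    ty = cy_d - (s * cx_s + c * cy_s)
    return theta, (tx, ty)


# Hypothetical example: the "imagined" cloud is the observed object
# rotated by 90 degrees and shifted; the estimator recovers that motion.
observed = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
imagined = apply_rigid_2d(observed, math.pi / 2, (3.0, -1.0))
theta, t = estimate_rigid_2d(observed, imagined)
```

In this toy setting the recovered `theta` and `t` play the role of the pick or place action: the motion that carries the object from its observed pose to the imagined one.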
Related papers
- A Real-to-Sim-to-Real Approach to Robotic Manipulation with VLM-Generated Iterative Keypoint Rewards [29.923942622540356]
We introduce Iterative Keypoint Reward (IKER), a Python-based reward function that serves as a dynamic task specification.
We reconstruct real-world scenes in simulation and use the generated rewards to train reinforcement learning policies.
The results highlight IKER's effectiveness in enabling robots to perform multi-step tasks in dynamic environments.
arXiv Detail & Related papers (2025-02-12T18:57:22Z)
- Keypoint Abstraction using Large Models for Object-Relative Imitation Learning [78.92043196054071]
Generalization to novel object configurations and instances across diverse tasks and environments is a critical challenge in robotics.
Keypoint-based representations have proven effective as a succinct representation for capturing essential object features.
We propose KALM, a framework that leverages large pre-trained vision-language models to automatically generate task-relevant and cross-instance consistent keypoints.
arXiv Detail & Related papers (2024-10-30T17:37:31Z)
- Flex: End-to-End Text-Instructed Visual Navigation with Foundation Models [59.892436892964376]
We investigate the minimal data requirements and architectural adaptations necessary to achieve robust closed-loop performance with vision-based control policies.
Our findings are synthesized in Flex (Fly-lexically), a framework that uses pre-trained Vision Language Models (VLMs) as frozen patch-wise feature extractors.
We demonstrate the effectiveness of this approach on quadrotor fly-to-target tasks, where agents trained via behavior cloning successfully generalize to real-world scenes.
arXiv Detail & Related papers (2024-10-16T19:59:31Z)
- MATCH POLICY: A Simple Pipeline from Point Cloud Registration to Manipulation Policies [25.512068008948603]
MATCH POLICY is a pipeline for solving high-precision pick and place tasks.
It recasts action inference as a point cloud registration task.
It achieves extremely high sample efficiency and generalizability to unseen configurations.
arXiv Detail & Related papers (2024-09-23T20:09:43Z)
- Reparameterized Policy Learning for Multimodal Trajectory Optimization [61.13228961771765]
We investigate the challenge of parametrizing policies for reinforcement learning in high-dimensional continuous action spaces.
We propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories.
We present a practical model-based RL method, which leverages the multimodal policy parameterization and learned world model.
arXiv Detail & Related papers (2023-07-20T09:05:46Z)
- Planning to Practice: Efficient Online Fine-Tuning by Composing Goals in Latent Space [76.46113138484947]
General-purpose robots require diverse repertoires of behaviors to complete challenging tasks in real-world unstructured environments.
To address this issue, goal-conditioned reinforcement learning aims to acquire policies that can reach goals for a wide range of tasks on command.
We propose Planning to Practice, a method that makes it practical to train goal-conditioned policies for long-horizon tasks.
arXiv Detail & Related papers (2022-05-17T06:58:17Z)
- Learning Sensorimotor Primitives of Sequential Manipulation Tasks from Visual Demonstrations [13.864448233719598]
This paper describes a new neural network-based framework for simultaneously learning low-level and high-level policies.
A key feature of the proposed approach is that the policies are learned directly from raw videos of task demonstrations.
Empirical results on object manipulation tasks with a robotic arm show that the proposed network can efficiently learn from real visual demonstrations to perform the tasks.
arXiv Detail & Related papers (2022-03-08T01:36:48Z)
- Affordance Learning from Play for Sample-Efficient Policy Learning [30.701546777177555]
We use a self-supervised visual affordance model from human teleoperated play data to enable efficient policy learning and motion planning.
We combine model-based planning with model-free deep reinforcement learning to learn policies that favor the same object regions favored by people.
We find that our policies train 4x faster than the baselines and generalize better to novel objects because our visual affordance model can anticipate their affordance regions.
arXiv Detail & Related papers (2022-03-01T11:00:35Z)
- Distilling a Hierarchical Policy for Planning and Control via Representation and Reinforcement Learning [18.415568038071306]
We present a hierarchical planning and control framework that enables an agent to perform various tasks and adapt to a new task flexibly.
Rather than learning an individual policy for each task, the proposed framework, DISH, distills a hierarchical policy from a set of tasks by representation and reinforcement learning.
arXiv Detail & Related papers (2020-11-16T23:58:49Z)
- Goal-Aware Prediction: Learning to Model What Matters [105.43098326577434]
One of the fundamental challenges in using a learned forward dynamics model is the mismatch between the objective of the learned model and that of the downstream planner or policy.
We propose to direct prediction towards task relevant information, enabling the model to be aware of the current task and encouraging it to only model relevant quantities of the state space.
We find that our method more effectively models the relevant parts of the scene conditioned on the goal, and as a result outperforms standard task-agnostic dynamics models and model-free reinforcement learning.
arXiv Detail & Related papers (2020-07-14T16:42:59Z)
- Goal-Conditioned End-to-End Visuomotor Control for Versatile Skill Primitives [89.34229413345541]
We propose a conditioning scheme which avoids pitfalls by learning the controller and its conditioning in an end-to-end manner.
Our model predicts complex action sequences based directly on a dynamic image representation of the robot motion.
We report significant improvements in task success over representative MPC and IL baselines.
arXiv Detail & Related papers (2020-03-19T15:04:37Z)