Distilling Motion Planner Augmented Policies into Visual Control
Policies for Robot Manipulation
- URL: http://arxiv.org/abs/2111.06383v1
- Date: Thu, 11 Nov 2021 18:52:00 GMT
- Title: Distilling Motion Planner Augmented Policies into Visual Control
Policies for Robot Manipulation
- Authors: I-Chun Arthur Liu and Shagun Uppal and Gaurav S. Sukhatme and Joseph
J. Lim and Peter Englert and Youngwoon Lee
- Abstract summary: We propose to distill a state-based motion planner augmented policy to a visual control policy.
We evaluate our method on three manipulation tasks in obstructed environments.
Our framework is highly sample-efficient and outperforms the state-of-the-art algorithms.
- Score: 26.47544415550067
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning complex manipulation tasks in realistic, obstructed environments is
a challenging problem due to hard exploration in the presence of obstacles and
high-dimensional visual observations. Prior work tackles the exploration
problem by integrating motion planning and reinforcement learning. However, the
motion planner augmented policy requires access to state information, which is
often not available in real-world settings. To this end, we propose to
distill a state-based motion planner augmented policy to a visual control
policy via (1) visual behavioral cloning to remove the motion planner
dependency along with its jittery motion, and (2) vision-based reinforcement
learning with the guidance of the smoothed trajectories from the behavioral
cloning agent. We evaluate our method on three manipulation tasks in obstructed
environments and compare it against various reinforcement learning and
imitation learning baselines. The results demonstrate that our framework is
highly sample-efficient and outperforms the state-of-the-art algorithms.
Moreover, coupled with domain randomization, our policy is capable of zero-shot
transfer to unseen environment settings with distractors. Code and videos are
available at https://clvrai.com/mopa-pd
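The two-stage pipeline above lends itself to a compact sketch. Below is a minimal, illustrative Python/PyTorch rendering of (1) the visual behavioral cloning step and (2) a BC-regularized actor update for the vision-based RL step; the network shapes, the guidance coefficient alpha, and all names are assumptions rather than the authors' implementation.
```python
import torch
import torch.nn as nn

class VisualPolicy(nn.Module):
    """Maps an RGB observation to a continuous action."""
    def __init__(self, action_dim=7):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.LazyLinear(action_dim)  # infers flattened feature size

    def forward(self, image):
        return torch.tanh(self.head(self.encoder(image)))

# Stage 1: visual behavioral cloning. Rollouts of the state-based
# planner-augmented expert provide (image, action) pairs; regressing onto
# the (smoothed) expert actions removes the planner dependency and its jitter.
def bc_loss(policy, images, expert_actions):
    return ((policy(images) - expert_actions) ** 2).mean()

# Stage 2: vision-based RL warm-started from the BC agent. One plausible
# guidance scheme (an assumption here) regularizes the actor toward the BC
# agent's smoothed actions while maximizing the learned Q-value.
def guided_actor_loss(q_values, policy_actions, bc_actions, alpha=0.1):
    bc_term = ((policy_actions - bc_actions) ** 2).sum(dim=-1)
    return (-q_values + alpha * bc_term).mean()
```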
Related papers
- RoboKoop: Efficient Control Conditioned Representations from Visual Input in Robotics using Koopman Operator [14.77553682217217]
We introduce a Contrastive Spectral Koopman Embedding network that allows us to learn efficient linearized visual representations from the agent's visual data in a high dimensional latent space.
Our method enhances stability and control in gradient dynamics over time, significantly outperforming existing approaches.
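A rough sketch of the linear-latent-dynamics idea behind this summary, assuming a plain MSE prediction objective (the actual method operates on visual input and adds a contrastive spectral objective); class names and layer sizes are illustrative:
```python
import torch
import torch.nn as nn

class KoopmanEmbedding(nn.Module):
    """Encoder whose latent dynamics are forced to be linear under K."""
    def __init__(self, obs_dim, latent_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )
        # Learned Koopman operator: a linear map advancing the latent one step.
        self.K = nn.Linear(latent_dim, latent_dim, bias=False)

    def prediction_loss(self, obs_t, obs_t1):
        z_t, z_t1 = self.encoder(obs_t), self.encoder(obs_t1)
        # Force linear latent dynamics: z_{t+1} should match K z_t.
        return ((self.K(z_t) - z_t1) ** 2).mean()
```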
arXiv Detail & Related papers (2024-09-04T22:14:59Z)
- Nonprehensile Planar Manipulation through Reinforcement Learning with Multimodal Categorical Exploration [8.343657309038285]
Reinforcement learning is a powerful framework for developing controllers for such nonprehensile manipulation tasks.
We propose a multimodal exploration approach through categorical distributions, which enables us to train planar pushing RL policies.
We show that the learned policies are robust to external disturbances and observation noise, and scale to tasks with multiple pushers.
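A minimal sketch of categorical exploration for continuous control: each action dimension is discretized into bins, and the policy samples from a per-dimension categorical distribution, which can place probability mass on several distinct pushing directions at once, unlike a unimodal Gaussian. Bin counts and ranges are assumptions:
```python
import torch
import torch.nn as nn

class CategoricalPolicy(nn.Module):
    def __init__(self, obs_dim, action_dim=2, bins=11, low=-1.0, high=1.0):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                 nn.Linear(128, action_dim * bins))
        self.register_buffer("bin_values", torch.linspace(low, high, bins))
        self.action_dim, self.bins = action_dim, bins

    def sample(self, obs):
        logits = self.net(obs).view(-1, self.action_dim, self.bins)
        # One categorical per action dimension; sampled bin indices are
        # mapped back to continuous action values.
        idx = torch.distributions.Categorical(logits=logits).sample()
        return self.bin_values[idx]
```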
arXiv Detail & Related papers (2023-08-04T16:55:00Z)
- Latent Exploration for Reinforcement Learning [87.42776741119653]
In Reinforcement Learning, agents learn policies by exploring and interacting with the environment.
We propose LATent TIme-Correlated Exploration (Lattice), a method to inject temporally-correlated noise into the latent state of the policy network.
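A sketch of the temporally-correlated latent noise idea, using an AR(1), Ornstein-Uhlenbeck-style process as the correlated noise source; Lattice's actual construction is richer than this, so treat the exact noise form as an assumption:
```python
import torch
import torch.nn as nn

class LatentNoisePolicy(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=64, rho=0.9, sigma=0.1):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
        self.head = nn.Linear(hidden, act_dim)
        self.rho, self.sigma = rho, sigma
        self.noise = torch.zeros(hidden)

    def act(self, obs):
        # Temporally-correlated perturbation: each step's noise remembers
        # the previous step's, yielding smoother, more persistent exploration.
        self.noise = self.rho * self.noise + self.sigma * torch.randn_like(self.noise)
        latent = self.body(obs) + self.noise
        return self.head(latent)
```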
arXiv Detail & Related papers (2023-05-31T17:40:43Z)
- Learning Deep Sensorimotor Policies for Vision-based Autonomous Drone Racing [52.50284630866713]
Existing systems often require hand-engineered components for state estimation, planning, and control.
This paper tackles the vision-based autonomous-drone-racing problem by learning deep sensorimotor policies.
arXiv Detail & Related papers (2022-10-26T19:03:17Z)
- DeXtreme: Transfer of Agile In-hand Manipulation from Simulation to Reality [64.51295032956118]
We train a policy that can perform robust dexterous manipulation on an anthropomorphic robot hand.
Our work reaffirms the possibilities of sim-to-real transfer for dexterous manipulation in diverse kinds of hardware and simulator setups.
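Sim-to-real transfer at this level typically leans on aggressive domain randomization, resampling physics and visual parameters every episode so the policy cannot overfit to one simulator configuration. A generic sketch; these particular parameters and ranges are assumptions, not DeXtreme's recipe:
```python
import random

def sample_domain_params():
    # Drawn fresh each episode; the policy must succeed across the whole range.
    return {
        "friction":    random.uniform(0.5, 1.5),
        "object_mass": random.uniform(0.03, 0.30),  # kg
        "motor_gain":  random.uniform(0.8, 1.2),
        "light_level": random.uniform(0.3, 1.0),
    }
```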
arXiv Detail & Related papers (2022-10-25T01:51:36Z)
- Action-Conditioned Contrastive Policy Pretraining [39.13710045468429]
Deep visuomotor policy learning achieves promising results in control tasks such as robotic manipulation and autonomous driving.
However, it requires a huge number of online interactions with the training environment, which limits its real-world applicability.
In this work, we aim to pretrain policy representations for driving tasks using hours-long uncurated YouTube videos.
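A hedged sketch of action-conditioned contrastive pretraining: frame embeddings whose (pseudo-)action labels agree are pulled together under an InfoNCE-style objective. The pairing rule, masking, and temperature here are assumptions, not the paper's exact recipe:
```python
import torch
import torch.nn.functional as F

def action_conditioned_info_nce(z_anchor, z_other, same_action, tau=0.1):
    """z_anchor, z_other: (N, D) embeddings; same_action: (N, N) bool mask
    marking pairs whose pseudo-action labels agree."""
    z_a = F.normalize(z_anchor, dim=1)
    z_o = F.normalize(z_other, dim=1)
    logits = z_a @ z_o.t() / tau                          # cosine similarities
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    mask = same_action.float()
    # Maximize log-likelihood of the action-consistent pairs.
    return -(log_prob * mask).sum() / mask.sum().clamp(min=1.0)
```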
arXiv Detail & Related papers (2022-04-05T17:58:22Z)
- Learning Pneumatic Non-Prehensile Manipulation with a Mobile Blower [30.032847855193864]
The blowing controller must continually adapt to unexpected changes caused by its own actions.
We introduce a multi-frequency version of the spatial action maps framework.
This allows for efficient learning of vision-based policies that effectively combine high-level planning and low-level closed-loop control.
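A sketch of the spatial action maps idea this builds on: a fully-convolutional network predicts one Q-value per pixel of an overhead map, and the agent moves toward the argmax pixel. The multi-frequency extension (running such maps at several control rates) is only gestured at here; sizes and names are illustrative:
```python
import torch
import torch.nn as nn

class SpatialActionMap(nn.Module):
    def __init__(self, in_channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 1),   # one Q-value per map pixel
        )

    def select_action(self, state_map):
        q = self.net(state_map)                 # (1, 1, H, W)
        flat_idx = q.flatten().argmax()
        w = q.shape[-1]
        return divmod(flat_idx.item(), w)       # target (row, col) pixel
```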
arXiv Detail & Related papers (2022-04-05T17:55:58Z)
- Learning Robust Policy against Disturbance in Transition Dynamics via State-Conservative Policy Optimization [63.75188254377202]
Deep reinforcement learning algorithms can perform poorly in real-world tasks due to discrepancy between source and target environments.
We propose State-Conservative Policy Optimization (SCPO), a novel model-free actor-critic algorithm that learns robust policies without modeling the disturbance in advance.
Experiments in several robot control tasks demonstrate that SCPO learns robust policies against the disturbance in transition dynamics.
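A sketch of the state-conservative idea: before updating the actor, find a worst-case neighbor of each observed state with a gradient step and train against it. The one-step sign-gradient form and radius below are assumptions; SCPO's exact inner maximization may differ:
```python
import torch

def worst_case_state(actor_loss_fn, state, eps=0.05):
    """Return a perturbed state inside an L-infinity ball of radius eps
    that (approximately) maximizes the actor loss."""
    s = state.clone().requires_grad_(True)
    loss = actor_loss_fn(s)
    loss.backward()
    with torch.no_grad():
        # Move the state in the direction that most increases the loss.
        s_adv = s + eps * s.grad.sign()
    return s_adv.detach()
```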
arXiv Detail & Related papers (2021-12-20T13:13:05Z)
- Human-in-the-Loop Imitation Learning using Remote Teleoperation [72.2847988686463]
We build a data collection system tailored to 6-DoF manipulation settings.
We develop an algorithm to train the policy iteratively on new data collected by the system.
We demonstrate that agents trained on data collected by our intervention-based system and algorithm outperform agents trained on an equivalent number of samples collected by non-interventional demonstrators.
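A pseudocode-style sketch of one intervention-based collection round: the policy acts until the remote operator takes over, and the operator's corrective actions are what gets added to the dataset. All names, and the simplified env.step signature, are placeholders rather than the authors' system:
```python
def hitl_round(env, policy, human, dataset, horizon=500):
    obs = env.reset()
    for _ in range(horizon):
        if human.wants_control(obs):
            action = human.act(obs)          # corrective teleoperation
            dataset.append((obs, action))    # interventions become training data
        else:
            action = policy.act(obs)
        obs, done = env.step(action)
        if done:
            break
```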
arXiv Detail & Related papers (2020-12-12T05:30:35Z)
- Neural Dynamic Policies for End-to-End Sensorimotor Learning [51.24542903398335]
The current dominant paradigm in sensorimotor control, whether imitation or reinforcement learning, is to train policies directly in raw action spaces.
We propose Neural Dynamic Policies (NDPs) that make predictions in trajectory distribution space.
NDPs outperform the prior state-of-the-art in terms of either efficiency or performance across several robotic control tasks.
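A sketch of predicting in trajectory space: instead of emitting raw actions, the network emits parameters of a second-order attractor system (in the spirit of dynamic movement primitives) that is integrated into a smooth trajectory. The forcing-function basis and constants below are simplifications, not the NDP formulation:
```python
import numpy as np

def rollout_attractor(goal, weights, y0, dt=0.01, steps=100, alpha=25.0):
    """Integrate y'' = alpha * (beta * (goal - y) - y') + forcing(weights)."""
    beta = alpha / 4.0
    y, yd = float(y0), 0.0
    traj = []
    for t in range(steps):
        # Crude sinusoidal basis standing in for the usual learned forcing term.
        basis = np.sin(np.arange(len(weights)) * t * dt)
        ydd = alpha * (beta * (goal - y) - yd) + float(np.dot(weights, basis))
        yd += ydd * dt
        y += yd * dt
        traj.append(y)
    return np.array(traj)
```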
arXiv Detail & Related papers (2020-12-04T18:59:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.