Tube-NeRF: Efficient Imitation Learning of Visuomotor Policies from MPC
using Tube-Guided Data Augmentation and NeRFs
- URL: http://arxiv.org/abs/2311.14153v2
- Date: Mon, 26 Feb 2024 16:10:00 GMT
- Title: Tube-NeRF: Efficient Imitation Learning of Visuomotor Policies from MPC
using Tube-Guided Data Augmentation and NeRFs
- Authors: Andrea Tagliabue, Jonathan P. How
- Abstract summary: Imitation learning (IL) can train computationally-efficient sensorimotor policies from a resource-intensive Model Predictive Controller (MPC)
We propose a data augmentation (DA) strategy that enables efficient learning of vision-based policies.
We show an 80-fold increase in demonstration efficiency and a 50% reduction in training time over current IL methods.
- Score: 42.220568722735095
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Imitation learning (IL) can train computationally-efficient sensorimotor
policies from a resource-intensive Model Predictive Controller (MPC), but it
often requires many samples, leading to long training times or limited
robustness. To address these issues, we combine IL with a variant of robust MPC
that accounts for process and sensing uncertainties, and we design a data
augmentation (DA) strategy that enables efficient learning of vision-based
policies. The proposed DA method, named Tube-NeRF, leverages Neural Radiance
Fields (NeRFs) to generate novel synthetic images, and uses properties of the
robust MPC (the tube) to select relevant views and to efficiently compute the
corresponding actions. We tailor our approach to the task of localization and
trajectory tracking on a multirotor, by learning a visuomotor policy that
generates control actions using images from the onboard camera as the only
source of horizontal position. Numerical evaluations show an 80-fold increase in
demonstration efficiency and a 50% reduction in training time over current IL
methods. Additionally, our policies successfully transfer to a real multirotor,
achieving low tracking errors despite large disturbances, with an onboard
inference time of only 1.5 ms.
Video: https://youtu.be/_W5z33ZK1m4
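The abstract outlines how the tube of the robust MPC guides data augmentation: states are sampled inside the tube around each point of a demonstration, a NeRF renders the onboard view each sampled state would produce, and the corresponding action comes from the ancillary (tube) controller rather than from re-solving the MPC. A minimal Python sketch of that loop is below; the renderer, camera model, gain matrix, and tube geometry are hypothetical placeholders, not the paper's implementation.

```python
"""Minimal sketch of tube-guided data augmentation with a NeRF renderer.
All quantities below are illustrative stand-ins, not the paper's code."""
import numpy as np

rng = np.random.default_rng(0)

def render_nerf(camera_pose):
    """Placeholder for a NeRF render call; returns a synthetic onboard image."""
    return rng.random((64, 64, 3))  # stand-in for a rendered RGB frame

def camera_pose_from_state(x):
    """Map a multirotor state to an onboard-camera pose (fixed mount assumed)."""
    return x[:3]  # position only, for illustration

# One demonstration from the robust tube MPC: nominal states and inputs,
# the ancillary feedback gain K, and the tube half-widths around the nominal.
x_bar = np.zeros((100, 12))          # nominal state trajectory (placeholder)
u_bar = np.zeros((100, 4))           # nominal inputs (placeholder)
K = np.zeros((4, 12))                # ancillary controller gain (placeholder)
tube_half_width = 0.2 * np.ones(12)  # axis-aligned tube cross-section (assumed)

dataset = []
samples_per_step = 8  # extra (image, action) pairs per demonstrated timestep
for t in range(len(x_bar)):
    for _ in range(samples_per_step):
        # Sample a perturbed state inside the tube cross-section at time t.
        x = x_bar[t] + rng.uniform(-tube_half_width, tube_half_width)
        # Render the view the onboard camera would see from that state.
        img = render_nerf(camera_pose_from_state(x))
        # The ancillary control law gives the action without re-solving the MPC.
        u = u_bar[t] + K @ (x - x_bar[t])
        dataset.append((img, x, u))

print(f"augmented dataset size: {len(dataset)} pairs from one demonstration")
```

The key design point the abstract describes is that each augmented action is obtained in closed form from the tube controller, which is what makes the augmentation cheap compared to querying the MPC for every synthetic view.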
Related papers
- SHIRE: Enhancing Sample Efficiency using Human Intuition in REinforcement Learning [11.304750795377657]
We propose SHIRE, a framework for encoding human intuition using Probabilistic Graphical Models (PGMs)
SHIRE achieves 25-78% sample efficiency gains across the environments we evaluate at negligible overhead cost.
arXiv Detail & Related papers (2024-09-16T04:46:22Z) - Distilling Reinforcement Learning Policies for Interpretable Robot Locomotion: Gradient Boosting Machines and Symbolic Regression [53.33734159983431]
This paper introduces a novel approach to distill neural RL policies into more interpretable forms.
We train expert neural network policies using RL and distill them into (i) GBMs, (ii) EBMs, and (iii) symbolic policies.
arXiv Detail & Related papers (2024-03-21T11:54:45Z) - Output Feedback Tube MPC-Guided Data Augmentation for Robust, Efficient
Sensorimotor Policy Learning [49.05174527668836]
Imitation learning (IL) can generate computationally efficient sensorimotor policies from demonstrations provided by computationally expensive model-based sensing and control algorithms.
In this work, we combine IL with an output feedback robust tube model predictive controller to co-generate demonstrations and a data augmentation strategy to efficiently learn neural network-based sensorimotor policies.
We numerically demonstrate that our method can learn a robust visuomotor policy from a single demonstration, a two-order-of-magnitude improvement in demonstration efficiency compared to existing IL methods.
arXiv Detail & Related papers (2022-10-18T19:59:17Z) - Robust, High-Rate Trajectory Tracking on Insect-Scale Soft-Actuated
Aerial Robots with Deep-Learned Tube MPC [0.0]
We present an approach for agile and computationally efficient trajectory tracking on the MIT SoftFly, a sub-gram MAV (0.7 grams)
Our strategy employs a cascaded control scheme, where an adaptive attitude controller is combined with a neural network policy trained to imitate a trajectory tracking robust tube model predictive controller (RTMPC)
We experimentally evaluate our approach, achieving position Root Mean Square Errors lower than 1.8 cm even in the more challenging maneuvers, obtaining a 60% reduction in maximum position error compared to our previous work, and demonstrating robustness to large external disturbances.
arXiv Detail & Related papers (2022-09-20T21:30:16Z) - Evaluating model-based planning and planner amortization for continuous
control [79.49319308600228]
We take a hybrid approach, combining model predictive control (MPC) with a learned model and model-free policy learning.
We find that well-tuned model-free agents are strong baselines even for high DoF control problems.
We show that it is possible to distill a model-based planner into a policy that amortizes the planning without any loss of performance.
arXiv Detail & Related papers (2021-10-07T12:00:40Z) - Demonstration-Efficient Guided Policy Search via Imitation of Robust
Tube MPC [36.3065978427856]
We propose a strategy to compress a computationally expensive Model Predictive Controller (MPC) into a more computationally efficient representation based on a deep neural network and Imitation Learning (IL)
By generating a Robust Tube variant (RTMPC) of the MPC and leveraging properties from the tube, we introduce a data augmentation method that enables high demonstration-efficiency.
Our method outperforms strategies commonly employed in IL, such as DAgger and Domain Randomization, in terms of demonstration-efficiency and robustness to perturbations unseen during training.
arXiv Detail & Related papers (2021-09-21T01:50:19Z) - A Framework for Efficient Robotic Manipulation [79.10407063260473]
We show that, given only 10 demonstrations, a single robotic arm can learn sparse-reward manipulation policies from pixels.
arXiv Detail & Related papers (2020-12-14T22:18:39Z) - Hyperparameter Auto-tuning in Self-Supervised Robotic Learning [12.193817049957733]
Insufficient learning (due to convergence to local optima) results in under-performing policies whilst redundant learning wastes time and resources.
We propose an auto-tuning technique based on the Evidence Lower Bound (ELBO) for self-supervised reinforcement learning.
Our method can auto-tune online and yields the best performance at a fraction of the time and computational resources.
arXiv Detail & Related papers (2020-10-16T08:58:24Z) - AWAC: Accelerating Online Reinforcement Learning with Offline Datasets [84.94748183816547]
We show that our method, advantage weighted actor critic (AWAC), enables rapid learning of skills with a combination of prior demonstration data and online experience.
Our results show that incorporating prior data can reduce the time required to learn a range of robotic skills to practical time-scales.
arXiv Detail & Related papers (2020-06-16T17:54:41Z)