Demonstration-Efficient Guided Policy Search via Imitation of Robust
Tube MPC
- URL: http://arxiv.org/abs/2109.09910v1
- Date: Tue, 21 Sep 2021 01:50:19 GMT
- Title: Demonstration-Efficient Guided Policy Search via Imitation of Robust
Tube MPC
- Authors: Andrea Tagliabue, Dong-Ki Kim, Michael Everett, Jonathan P. How
- Abstract summary: We propose a strategy to compress a computationally expensive Model Predictive Controller (MPC) into a more computationally efficient representation based on a deep neural network and Imitation Learning (IL).
By generating a Robust Tube variant (RTMPC) of the MPC and leveraging properties from the tube, we introduce a data augmentation method that enables high demonstration-efficiency.
Our method outperforms strategies commonly employed in IL, such as DAgger and Domain Randomization, in terms of demonstration-efficiency and robustness to perturbations unseen during training.
- Score: 36.3065978427856
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a demonstration-efficient strategy to compress a computationally
expensive Model Predictive Controller (MPC) into a more computationally
efficient representation based on a deep neural network and Imitation Learning
(IL). By generating a Robust Tube variant (RTMPC) of the MPC and leveraging
properties from the tube, we introduce a data augmentation method that enables
high demonstration-efficiency and compensates for the distribution
shifts typically encountered in IL. Our approach opens the possibility of
zero-shot transfer from a single demonstration collected in a nominal domain,
such as a simulation or a robot in a lab/controlled environment, to a domain
with bounded model errors/perturbations. Numerical and experimental evaluations
performed on a trajectory tracking MPC for a quadrotor show that our method
outperforms strategies commonly employed in IL, such as DAgger and Domain
Randomization, in terms of demonstration-efficiency and robustness to
perturbations unseen during training.
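The tube-guided data augmentation described above can be illustrated as follows: extra states are sampled inside the robust tube around the nominal demonstration, and each sample is labeled with the ancillary controller's action u = u_nom + K(x - x_nom). This is a minimal sketch under assumed simplifications (a box-shaped tube cross-section and a known ancillary gain K); the function names are hypothetical, not the paper's implementation.

```python
import random

def tube_augment(demo, K, tube_halfwidth, samples_per_step=10, seed=0):
    """Expand one (state, action) demonstration into many training pairs
    by sampling states inside the robust tube and labeling each with the
    ancillary controller u = u_nom + K (x - x_nom)."""
    rng = random.Random(seed)
    augmented = []
    for x_nom, u_nom in demo:
        for _ in range(samples_per_step):
            # Sample a deviation inside the (box-approximated) tube section.
            delta = [rng.uniform(-w, w) for w in tube_halfwidth]
            x = [xn + d for xn, d in zip(x_nom, delta)]
            # Ancillary feedback supplies the action label for the sampled state.
            u = u_nom + sum(k * d for k, d in zip(K, delta))
            augmented.append((x, u))
    return augmented

# One short demonstration (state = [position, velocity], scalar action)
# expands into many labeled samples without re-querying the expensive MPC:
demo = [([0.0, 0.0], 0.0), ([0.1, 0.2], -0.05)]
data = tube_augment(demo, K=[-1.0, -1.7], tube_halfwidth=[0.05, 0.05])
print(len(data))  # 20 augmented pairs from 2 demonstration steps
```

Because every sampled state gets a consistent closed-loop label, a single demonstration can cover the neighborhood that perturbations would otherwise push the learner into, which is the source of the claimed demonstration efficiency.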
Related papers
- Tube-NeRF: Efficient Imitation Learning of Visuomotor Policies from MPC
using Tube-Guided Data Augmentation and NeRFs [42.220568722735095]
Imitation learning (IL) can train computationally-efficient sensorimotor policies from a resource-intensive Model Predictive Controller (MPC).
We propose a data augmentation (DA) strategy that enables efficient learning of vision-based policies.
We show an 80-fold increase in demonstration efficiency and a 50% reduction in training time over current IL methods.
arXiv Detail & Related papers (2023-11-23T18:54:25Z)
- Unsupervised Discovery of Interpretable Directions in h-space of
Pre-trained Diffusion Models [63.1637853118899]
We propose the first unsupervised and learning-based method to identify interpretable directions in h-space of pre-trained diffusion models.
We employ a shift control module that works on h-space of pre-trained diffusion models to manipulate a sample into a shifted version of itself.
By jointly optimizing them, the model will spontaneously discover disentangled and interpretable directions.
arXiv Detail & Related papers (2023-10-15T18:44:30Z)
- Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level
Stability and High-Level Behavior [51.60683890503293]
We propose a theoretical framework for studying behavior cloning of complex expert demonstrations using generative modeling.
We show that pure supervised cloning can generate trajectories matching the per-time step distribution of arbitrary expert trajectories.
arXiv Detail & Related papers (2023-07-27T04:27:26Z)
- Value function estimation using conditional diffusion models for control [62.27184818047923]
We propose a simple algorithm called Diffused Value Function (DVF).
It learns a joint multi-step model of the environment-robot interaction dynamics using a diffusion model.
We show how DVF can be used to efficiently capture the state visitation measure for multiple controllers.
arXiv Detail & Related papers (2023-06-09T18:40:55Z)
- GAN-MPC: Training Model Predictive Controllers with Parameterized Cost
Functions using Demonstrations from Non-identical Experts [14.291720751625585]
We propose a generative adversarial network (GAN) to minimize the Jensen-Shannon divergence between the state-trajectory distributions of the demonstrator and the imitator.
We evaluate our approach on a variety of simulated robotics tasks of DeepMind Control suite.
arXiv Detail & Related papers (2023-05-30T15:15:30Z)
- Learning Sampling Distributions for Model Predictive Control [36.82905770866734]
Sampling-based methods have become a cornerstone of contemporary approaches to Model Predictive Control (MPC).
We propose to carry out all operations in the latent space, allowing us to take full advantage of the learned distribution.
Specifically, we frame the learning problem as bi-level optimization and show how to train the controller with backpropagation-through-time.
arXiv Detail & Related papers (2022-12-05T20:35:36Z)
- Output Feedback Tube MPC-Guided Data Augmentation for Robust, Efficient
Sensorimotor Policy Learning [49.05174527668836]
Imitation learning (IL) can generate computationally efficient sensorimotor policies from demonstrations provided by computationally expensive model-based sensing and control algorithms.
In this work, we combine IL with an output feedback robust tube model predictive controller to co-generate demonstrations and a data augmentation strategy to efficiently learn neural network-based sensorimotor policies.
We numerically demonstrate that our method can learn a robust visuomotor policy from a single demonstration, a two-orders-of-magnitude improvement in demonstration efficiency compared to existing IL methods.
arXiv Detail & Related papers (2022-10-18T19:59:17Z)
- Model Predictive Control via On-Policy Imitation Learning [28.96122879515294]
We develop new sample complexity results and performance guarantees for data-driven Model Predictive Control.
Our algorithm uses the structure of constrained linear MPC, and our analysis uses the properties of the explicit MPC solution to theoretically bound the number of online MPC trajectories needed to achieve optimal performance.
arXiv Detail & Related papers (2022-10-17T16:06:06Z)
- DEALIO: Data-Efficient Adversarial Learning for Imitation from
Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
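Several of the methods above, including the DAgger baseline in the main abstract and the on-policy imitation work, rest on the same idea: roll out the current learner and relabel every visited state with the expert's (here, the MPC's) action. A minimal DAgger-style sketch under toy assumptions (scalar integrator dynamics, a proportional stand-in for the expensive expert MPC; all names are hypothetical):

```python
def expert_mpc(x):
    # Stand-in for an expensive MPC query: proportional control u = -0.5 x.
    return -0.5 * x

def rollout(actor, x0=1.0, horizon=5):
    # Simulate scalar integrator dynamics x' = x + u, recording visited states.
    xs, x = [], x0
    for _ in range(horizon):
        xs.append(x)
        x = x + actor(x)
    return xs

def fit_policy(dataset):
    # Closed-form least-squares fit of a linear policy u = k * x.
    num = sum(x * u for x, u in dataset)
    den = sum(x * x for x, u in dataset)
    k = num / den
    return lambda x: k * x

def dagger(n_iters=3):
    # Each iteration rolls out the current policy (the expert on the first
    # pass) and relabels every visited state with the expert's action,
    # growing an aggregated dataset that covers the learner's own states.
    dataset, policy = [], None
    for _ in range(n_iters):
        actor = expert_mpc if policy is None else policy
        dataset += [(s, expert_mpc(s)) for s in rollout(actor)]
        policy = fit_policy(dataset)
    return policy

policy = dagger()
print(policy(2.0))  # recovers the expert gain: -1.0
```

The tube-guided augmentation of the main paper targets the same distribution-shift problem, but generates the on-policy-like data offline from the tube geometry instead of repeatedly querying the expert during rollouts.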
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.