Output Feedback Tube MPC-Guided Data Augmentation for Robust, Efficient
Sensorimotor Policy Learning
- URL: http://arxiv.org/abs/2210.10127v1
- Date: Tue, 18 Oct 2022 19:59:17 GMT
- Authors: Andrea Tagliabue, Jonathan P. How
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Imitation learning (IL) can generate computationally efficient sensorimotor
policies from demonstrations provided by computationally expensive model-based
sensing and control algorithms. However, commonly employed IL methods are often
data-inefficient, requiring the collection of a large number of demonstrations
and producing policies with limited robustness to uncertainties. In this work,
we combine IL with an output feedback robust tube model predictive controller
(RTMPC) to co-generate demonstrations and a data augmentation strategy to
efficiently learn neural network-based sensorimotor policies. Thanks to the
augmented data, we reduce the computation time and the number of demonstrations
needed by IL, while providing robustness to sensing and process uncertainty. We
tailor our approach to the task of learning a trajectory tracking visuomotor
policy for an aerial robot, leveraging a 3D mesh of the environment as part of
the data augmentation process. We numerically demonstrate that our method can
learn a robust visuomotor policy from a single demonstration -- a
two-order-of-magnitude improvement in demonstration efficiency over existing
IL methods.
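The core augmentation idea in the abstract can be sketched as follows: states sampled inside the robust tube around the expert's nominal trajectory are labeled with the ancillary feedback law u = u_bar + K (x - x_bar), so a single expensive MPC demonstration yields many extra (state, action) training pairs. The gain K, the box-shaped tube cross-section, and all dimensions below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def tube_augment(nominal_xs, nominal_us, K, tube_radius, n_samples, rng=None):
    """Generate extra (state, action) pairs from one demonstration.

    For each nominal state x_bar with nominal input u_bar, sample states
    inside a box-shaped tube cross-section of half-width `tube_radius`
    and label them with the ancillary law u = u_bar + K @ (x - x_bar).
    """
    rng = np.random.default_rng(rng)
    states, actions = [], []
    for x_bar, u_bar in zip(nominal_xs, nominal_us):
        for _ in range(n_samples):
            x = x_bar + rng.uniform(-tube_radius, tube_radius, size=x_bar.shape)
            u = u_bar + K @ (x - x_bar)
            states.append(x)
            actions.append(u)
    return np.array(states), np.array(actions)

# Toy usage: 2-state, 1-input system, 5-step demonstration.
T, nx, nu = 5, 2, 1
xs = np.zeros((T, nx))            # nominal trajectory (placeholder)
us = np.zeros((T, nu))            # nominal inputs (placeholder)
K = np.array([[-1.0, -0.5]])      # illustrative ancillary gain
X, U = tube_augment(xs, us, K, tube_radius=0.1, n_samples=10, rng=0)
print(X.shape, U.shape)           # (50, 2) (50, 1)
```

In the paper's vision-based setting, the sampled states are additionally rendered into synthetic camera views (via a 3D mesh of the environment) before being paired with the ancillary action; the sketch above only covers the state-space labeling step.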
Related papers
- AdaDemo: Data-Efficient Demonstration Expansion for Generalist Robotic Agent [75.91274222142079]
In this study, we aim to scale up demonstrations in a data-efficient way to facilitate the learning of generalist robotic agents.
AdaDemo is a framework designed to improve multi-task policy learning by actively and continually expanding the demonstration dataset.
arXiv Detail & Related papers (2024-04-11T01:59:29Z)
- Tube-NeRF: Efficient Imitation Learning of Visuomotor Policies from MPC using Tube-Guided Data Augmentation and NeRFs [42.220568722735095]
Imitation learning (IL) can train computationally efficient sensorimotor policies from a resource-intensive Model Predictive Controller (MPC).
We propose a data augmentation (DA) strategy that enables efficient learning of vision-based policies.
We show an 80-fold increase in demonstration efficiency and a 50% reduction in training time over current IL methods.
arXiv Detail & Related papers (2023-11-23T18:54:25Z) - Value function estimation using conditional diffusion models for control [62.27184818047923]
We propose a simple algorithm called Diffused Value Function (DVF).
It learns a joint multi-step model of the environment-robot interaction dynamics using a diffusion model.
We show how DVF can be used to efficiently capture the state visitation measure for multiple controllers.
arXiv Detail & Related papers (2023-06-09T18:40:55Z) - MoDem: Accelerating Visual Model-Based Reinforcement Learning with
Demonstrations [36.44386146801296]
Poor sample efficiency continues to be the primary challenge for deployment of deep Reinforcement Learning (RL) algorithms for real-world applications.
We find that leveraging just a handful of demonstrations can dramatically improve the sample-efficiency of model-based RL.
We empirically study three complex visuo-motor control domains and find that our method is 150%-250% more successful in completing sparse reward tasks.
arXiv Detail & Related papers (2022-12-12T04:28:50Z)
- Data efficient reinforcement learning and adaptive optimal perimeter control of network traffic dynamics [0.0]
This work proposes an integral reinforcement learning (IRL) based approach to learning the macroscopic traffic dynamics for adaptive optimal perimeter control.
To reduce the sampling complexity and use the available data more efficiently, the experience replay (ER) technique is introduced to the IRL algorithm.
The convergence of the IRL-based algorithms and the stability of the controlled traffic dynamics are proven via the Lyapunov theory.
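The experience replay technique mentioned in this entry can be sketched generically: transitions are stored in a fixed-capacity buffer and reused in random minibatches, so each collected sample contributes to many learning updates. The buffer and batch sizes below are illustrative; the entry's IRL-specific update rule is not reproduced here.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity transition buffer, sampled uniformly at random."""
    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)  # oldest entries evicted first

    def push(self, state, action, reward, next_state):
        self.buf.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform sampling without replacement from the stored transitions.
        return random.sample(self.buf, min(batch_size, len(self.buf)))

# Toy usage: store a short rollout, then draw a minibatch.
buffer = ReplayBuffer(capacity=1000)
for t in range(100):
    buffer.push(t, 0.0, -1.0, t + 1)
batch = buffer.sample(32)
print(len(batch))  # 32
```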
arXiv Detail & Related papers (2022-09-13T04:28:49Z)
- Demonstration-Efficient Guided Policy Search via Imitation of Robust Tube MPC [36.3065978427856]
We propose a strategy to compress a computationally expensive Model Predictive Controller (MPC) into a more computationally efficient representation based on a deep neural network and Imitation Learning (IL).
By generating a Robust Tube variant (RTMPC) of the MPC and leveraging properties from the tube, we introduce a data augmentation method that enables high demonstration-efficiency.
Our method outperforms strategies commonly employed in IL, such as DAgger and Domain Randomization, in terms of demonstration-efficiency and robustness to perturbations unseen during training.
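For contrast with the tube-guided approach, the DAgger baseline named above repeatedly rolls out the learner and queries the expert on every visited state, which is what makes it demonstration-hungry. A minimal sketch with a hypothetical toy environment, expert, and learner (none of these are from the papers listed here):

```python
def dagger(env, expert, learner, n_iters, horizon):
    """Dataset Aggregation: roll out the learner, label the visited
    states with the expert, and retrain on all data collected so far."""
    states, actions = [], []
    for _ in range(n_iters):
        x = env.reset()
        for _ in range(horizon):
            states.append(x)
            actions.append(expert(x))   # expert label on the visited state
            x = env.step(learner(x))    # but the learner chooses the motion
        learner.fit(states, actions)    # retrain on the aggregated dataset
    return learner

# Toy 1-D setup: scalar integrator, stabilizing expert u = -x,
# linear learner u = k * x fitted by least squares.
class ToyEnv:
    def reset(self):
        self.x = 1.0
        return self.x
    def step(self, u):
        self.x += u
        return self.x

class LinearLearner:
    def __init__(self):
        self.k = 0.0
    def __call__(self, x):
        return self.k * x
    def fit(self, xs, us):
        den = sum(x * x for x in xs)
        self.k = sum(x * u for x, u in zip(xs, us)) / den if den else 0.0

learner = dagger(ToyEnv(), lambda x: -x, LinearLearner(), n_iters=3, horizon=10)
print(round(learner.k, 6))  # -1.0
```

Note that each DAgger iteration requires fresh expert queries on newly visited states; the tube-based augmentation avoids this by generating the extra labels analytically from a single demonstration.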
arXiv Detail & Related papers (2021-09-21T01:50:19Z)
- DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
- A Framework for Efficient Robotic Manipulation [79.10407063260473]
We show that, given only 10 demonstrations, a single robotic arm can learn sparse-reward manipulation policies from pixels.
arXiv Detail & Related papers (2020-12-14T22:18:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.