Model Predictive Control via On-Policy Imitation Learning
- URL: http://arxiv.org/abs/2210.09206v1
- Date: Mon, 17 Oct 2022 16:06:06 GMT
- Title: Model Predictive Control via On-Policy Imitation Learning
- Authors: Kwangjun Ahn, Zakaria Mhammedi, Horia Mania, Zhang-Wei Hong, Ali
Jadbabaie
- Abstract summary: We develop new sample complexity results and performance guarantees for data-driven Model Predictive Control.
Our algorithm uses the structure of constrained linear MPC, and our analysis uses the properties of the explicit MPC solution to theoretically bound the number of online MPC trajectories needed to achieve optimal performance.
- Score: 28.96122879515294
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we leverage the rapid advances in imitation learning, a topic
of intense recent focus in the Reinforcement Learning (RL) literature, to
develop new sample complexity results and performance guarantees for
data-driven Model Predictive Control (MPC) for constrained linear systems. In
its simplest form, imitation learning is an approach that tries to learn an
expert policy by querying samples from an expert. Recent approaches to
data-driven MPC have used the simplest form of imitation learning known as
behavior cloning to learn controllers that mimic the performance of MPC by
online sampling of the trajectories of the closed-loop MPC system. Behavior
cloning, however, is known to be data-inefficient and to suffer from
distribution shift. As an alternative, we develop a variant of the
forward training algorithm which is an on-policy imitation learning method
proposed by Ross et al. (2010). Our algorithm uses the structure of constrained
linear MPC, and our analysis uses the properties of the explicit MPC solution
to theoretically bound the number of online MPC trajectories needed to achieve
optimal performance. We validate our results through simulations and show that
the forward training algorithm is indeed superior to behavior cloning when
applied to MPC.
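To make the contrast concrete, below is a minimal sketch of the forward training loop applied to a constrained linear MPC expert. Every specific choice in it is an illustrative assumption rather than the paper's setup: a toy double-integrator model, an MPC expert solved with cvxpy, a linear least-squares learner, and arbitrary horizon and sample-size values. The structure is what matters: the policy for time step t is fit on states reached by rolling out the policies already learned for steps 0 through t-1, and the MPC expert is queried online only at those states.

```python
import numpy as np
import cvxpy as cp

# Toy constrained double integrator (illustrative, not the paper's benchmark).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), 0.1 * np.eye(1)
u_max, H = 1.0, 10                              # input bound and MPC horizon

def mpc_expert(x0):
    """Solve the constrained finite-horizon problem and return the first input."""
    x = cp.Variable((2, H + 1))
    u = cp.Variable((1, H))
    cost, cons = 0, [x[:, 0] == x0]
    for k in range(H):
        cost += cp.quad_form(x[:, k], Q) + cp.quad_form(u[:, k], R)
        cons += [x[:, k + 1] == A @ x[:, k] + B @ u[:, k],
                 cp.abs(u[:, k]) <= u_max]
    cp.Problem(cp.Minimize(cost + cp.quad_form(x[:, H], Q)), cons).solve()
    return u[:, 0].value

# Forward training: one policy per time step, trained on the state distribution
# that the already-learned policies actually induce.
T, n_rollouts = 15, 30                          # task horizon and rollouts per step
rng = np.random.default_rng(0)
policies = []                                   # policies[t]: state -> input at time t

for t in range(T):
    states, labels = [], []
    for _ in range(n_rollouts):
        x = rng.uniform(-2.0, 2.0, size=2)      # sample an initial condition
        for s in range(t):                      # roll forward with the learned policies
            x = A @ x + B @ policies[s](x)
        states.append(x)
        labels.append(mpc_expert(x))            # query the MPC expert only at time t
    S, U = np.array(states), np.array(labels)
    W, *_ = np.linalg.lstsq(S, U, rcond=None)   # fit a simple linear policy u = W^T x
    policies.append(lambda x, W=W: np.clip(x @ W, -u_max, u_max))
```

Behavior cloning would instead fit a single policy to state-action pairs collected along the expert's own closed-loop trajectories, so its training data never reflects the learner's mistakes; the on-policy loop above is precisely what removes that distribution shift.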
Related papers
- Statistically Efficient Variance Reduction with Double Policy Estimation for Off-Policy Evaluation in Sequence-Modeled Reinforcement Learning [53.97273491846883]
We propose DPE: an RL algorithm that blends offline sequence modeling and offline reinforcement learning with Double Policy Estimation.
We validate our method in multiple tasks of OpenAI Gym with D4RL benchmarks.
arXiv Detail & Related papers (2023-08-28T20:46:07Z)
- Model-based adaptation for sample efficient transfer in reinforcement learning control of parameter-varying systems [1.8799681615947088]
We leverage ideas from model-based control to address the sample efficiency problem of reinforcement learning algorithms.
We demonstrate that our approach is more sample-efficient than fine-tuning with reinforcement learning alone.
arXiv Detail & Related papers (2023-05-20T10:11:09Z)
- Learning-based MPC from Big Data Using Reinforcement Learning [1.3124513975412255]
This paper presents an approach for learning Model Predictive Control (MPC) schemes directly from data using Reinforcement Learning (RL) methods.
We propose to tackle this issue by using tools from RL to learn a parameterized MPC scheme directly from data in an offline fashion.
Our approach derives an MPC scheme without having to solve it over the collected dataset, thereby eliminating the computational complexity of existing techniques for big data.
arXiv Detail & Related papers (2023-01-04T15:39:34Z)
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
- On Effective Scheduling of Model-based Reinforcement Learning [53.027698625496015]
We propose a framework named AutoMBPO to automatically schedule the real data ratio.
In this paper, we first theoretically analyze the role of real data in policy training, which suggests that gradually increasing the ratio of real data yields better performance.
arXiv Detail & Related papers (2021-11-16T15:24:59Z)
- Demonstration-Efficient Guided Policy Search via Imitation of Robust Tube MPC [36.3065978427856]
We propose a strategy to compress a computationally expensive Model Predictive Controller (MPC) into a more computationally efficient representation based on a deep neural network and Imitation Learning (IL).
By generating a Robust Tube variant (RTMPC) of the MPC and leveraging properties from the tube, we introduce a data augmentation method that enables high demonstration-efficiency.
Our method outperforms strategies commonly employed in IL, such as DAgger and Domain Randomization, in terms of demonstration-efficiency and robustness to perturbations unseen during training.
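The summary above does not spell out the augmentation mechanism, but the tube structure suggests one: states sampled inside the tube around a demonstrated state can be labeled with the ancillary feedback law instead of re-solving the MPC. The snippet below is only a rough illustration of that idea under assumed details (a box-shaped tube cross-section and a fixed linear ancillary gain K_fb), not the paper's exact procedure.

```python
import numpy as np

def tube_augment(x_nom, u_nom, K_fb, half_width, n_samples, rng):
    """Sample states inside a box-shaped tube cross-section around a nominal
    state and label them with the ancillary law u = u_nom + K_fb (x - x_nom)."""
    dx = rng.uniform(-half_width, half_width, size=(n_samples, x_nom.size))
    return x_nom + dx, u_nom + dx @ K_fb.T

rng = np.random.default_rng(0)
x_nom = np.array([1.0, 0.5])        # one nominal state from an RTMPC demonstration
u_nom = np.array([-0.2])            # its nominal input
K_fb = np.array([[-0.8, -1.2]])     # illustrative ancillary (tube) feedback gain
extra_x, extra_u = tube_augment(x_nom, u_nom, K_fb, half_width=0.1, n_samples=32, rng=rng)
```

Each demonstrated state then yields many labeled pairs for imitation learning, which is where the demonstration-efficiency comes from.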
arXiv Detail & Related papers (2021-09-21T01:50:19Z)
- DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
- Imitation Learning from MPC for Quadrupedal Multi-Gait Control [63.617157490920505]
We present a learning algorithm for training a single policy that imitates multiple gaits of a walking robot.
We use and extend MPC-Net, which is an Imitation Learning approach guided by Model Predictive Control.
We validate our approach on hardware and show that a single learned policy can replace its teacher to control multiple gaits.
arXiv Detail & Related papers (2021-03-26T08:48:53Z)
- On Training and Evaluation of Neural Network Approaches for Model Predictive Control [9.8918553325509]
This paper presents a framework for training and evaluating Model Predictive Control (MPC) implemented using constrained neural networks.
The motivation is to replace real-time optimization in safety critical feedback control systems with learnt mappings in the form of neural networks with optimization layers.
arXiv Detail & Related papers (2020-05-08T15:37:55Z)
- Information Theoretic Model Predictive Q-Learning [64.74041985237105]
We present a novel theoretical connection between information theoretic MPC and entropy regularized RL.
We develop a Q-learning algorithm that can leverage biased models.
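One way to read this combination is MPPI-style sampling-based MPC whose finite rollouts are closed with a learned terminal value, so that a biased model only needs to be accurate over a short horizon. The sketch below is merely an illustration of that structure; the toy dynamics, the quadratic stand-in for the learned value, and all hyperparameters are assumptions, not details taken from the paper.

```python
import numpy as np

def dynamics(x, u):
    # Toy double integrator, used only to make the sketch runnable.
    return np.array([x[0] + 0.1 * x[1], x[1] + 0.1 * u[0]])

def running_cost(x, u):
    return x @ x + 0.1 * (u @ u)

def terminal_value(x):
    # Stand-in for a learned value/Q function at the end of the rollout.
    return 10.0 * (x @ x)

def mppi_action(x0, H=20, K=256, lam=1.0, sigma=0.5, seed=0):
    """One MPPI step: sample K control sequences, weight their costs by
    exp(-cost / lam), and return the weighted first control."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma, size=(K, H, 1))
    costs = np.zeros(K)
    for k in range(K):
        x = x0.copy()
        for t in range(H):
            costs[k] += running_cost(x, noise[k, t])
            x = dynamics(x, noise[k, t])
        costs[k] += terminal_value(x)       # learned value closes the short horizon
    w = np.exp(-(costs - costs.min()) / lam)
    w /= w.sum()
    return np.tensordot(w, noise[:, 0, :], axes=1)

print(mppi_action(np.array([1.0, 0.0])))
```

The temperature lam is the knob that, in the information-theoretic view, roughly plays the role of the entropy-regularization strength connecting this update to soft-RL objectives.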
arXiv Detail & Related papers (2019-12-31T00:29:22Z)