Learning High-Level Policies for Model Predictive Control
- URL: http://arxiv.org/abs/2007.10284v2
- Date: Sun, 9 May 2021 16:47:53 GMT
- Title: Learning High-Level Policies for Model Predictive Control
- Authors: Yunlong Song, Davide Scaramuzza
- Abstract summary: Model Predictive Control (MPC) provides robust solutions to robot control tasks.
We propose a self-supervised learning algorithm for learning a neural network high-level policy.
We show that our approach can handle situations that are difficult for standard MPC.
- Score: 54.00297896763184
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The combination of policy search and deep neural networks holds the promise
of automating a variety of decision-making tasks. Model Predictive Control
(MPC) provides robust solutions to robot control tasks by making use of a
dynamical model of the system and solving an optimization problem online over a
short planning horizon. In this work, we leverage probabilistic decision-making
approaches and the generalization capability of artificial neural networks to
the powerful online optimization by learning a deep high-level policy for the
MPC (High-MPC). Conditioning on robot's local observations, the trained neural
network policy is capable of adaptively selecting high-level decision variables
for the low-level MPC controller, which then generates optimal control commands
for the robot. First, we formulate the search of high-level decision variables
for MPC as a policy search problem, specifically, a probabilistic inference
problem. The problem can be solved in a closed-form solution. Second, we
propose a self-supervised learning algorithm for learning a neural network
high-level policy, which is useful for online hyperparameter adaptations in
highly dynamic environments. We demonstrate the importance of incorporating the
online adaption into autonomous robots by using the proposed method to solve a
challenging control problem, where the task is to control a simulated quadrotor
to fly through a swinging gate. We show that our approach can handle situations
that are difficult for standard MPC.
Related papers
- Dropout MPC: An Ensemble Neural MPC Approach for Systems with Learned Dynamics [0.0]
We propose a novel sampling-based ensemble neural MPC algorithm that employs the Monte-Carlo dropout technique on the learned system model.
The method aims in general at uncertain systems with complex dynamics, where models derived from first principles are hard to infer.
arXiv Detail & Related papers (2024-06-04T17:15:25Z) - Pontryagin Optimal Control via Neural Networks [19.546571122359534]
We integrate Neural Networks with the Pontryagin's Maximum Principle (PMP), and propose a sample efficient framework NN-PMP-Gradient.
The resulting controller can be implemented for systems with unknown and complex dynamics.
Compared with the widely applied model-free and model-based reinforcement learning (RL) algorithms, our NN-PMP-Gradient achieves higher sample-efficiency and performance in terms of control objectives.
arXiv Detail & Related papers (2022-12-30T06:47:03Z) - Efficient Domain Coverage for Vehicles with Second-Order Dynamics via
Multi-Agent Reinforcement Learning [9.939081691797858]
We present a reinforcement learning (RL) approach for the multi-agent efficient domain coverage problem involving agents with second-order dynamics.
Our proposed network architecture includes the incorporation of LSTM and self-attention, which allows the trained policy to adapt to a variable number of agents.
arXiv Detail & Related papers (2022-11-11T01:59:12Z) - Learning Robust Policy against Disturbance in Transition Dynamics via
State-Conservative Policy Optimization [63.75188254377202]
Deep reinforcement learning algorithms can perform poorly in real-world tasks due to discrepancy between source and target environments.
We propose a novel model-free actor-critic algorithm to learn robust policies without modeling the disturbance in advance.
Experiments in several robot control tasks demonstrate that SCPO learns robust policies against the disturbance in transition dynamics.
arXiv Detail & Related papers (2021-12-20T13:13:05Z) - Policy Search for Model Predictive Control with Application to Agile
Drone Flight [56.24908013905407]
We propose a policy-search-for-model-predictive-control framework for MPC.
Specifically, we formulate the MPC as a parameterized controller, where the hard-to-optimize decision variables are represented as high-level policies.
Experiments show that our controller achieves robust and real-time control performance in both simulation and the real world.
arXiv Detail & Related papers (2021-12-07T17:39:24Z) - Adversarially Regularized Policy Learning Guided by Trajectory
Optimization [31.122262331980153]
We propose adVErsarially Regularized pOlicy learNIng guided by trajeCtory optimizAtion (VERONICA) for learning smooth control policies.
Our proposed approach improves the sample efficiency of neural policy learning and enhances the robustness of the policy against various types of disturbances.
arXiv Detail & Related papers (2021-09-16T00:02:11Z) - Deep Policy Dynamic Programming for Vehicle Routing Problems [89.96386273895985]
We propose Deep Policy Dynamic Programming (D PDP) to combine the strengths of learned neurals with those of dynamic programming algorithms.
D PDP prioritizes and restricts the DP state space using a policy derived from a deep neural network, which is trained to predict edges from example solutions.
We evaluate our framework on the travelling salesman problem (TSP) and the vehicle routing problem (VRP) and show that the neural policy improves the performance of (restricted) DP algorithms.
arXiv Detail & Related papers (2021-02-23T15:33:57Z) - Non-stationary Online Learning with Memory and Non-stochastic Control [71.14503310914799]
We study the problem of Online Convex Optimization (OCO) with memory, which allows loss functions to depend on past decisions.
In this paper, we introduce dynamic policy regret as the performance measure to design algorithms robust to non-stationary environments.
We propose a novel algorithm for OCO with memory that provably enjoys an optimal dynamic policy regret in terms of time horizon, non-stationarity measure, and memory length.
arXiv Detail & Related papers (2021-02-07T09:45:15Z) - Reconfigurable Intelligent Surface Assisted Mobile Edge Computing with
Heterogeneous Learning Tasks [53.1636151439562]
Mobile edge computing (MEC) provides a natural platform for AI applications.
We present an infrastructure to perform machine learning tasks at an MEC with the assistance of a reconfigurable intelligent surface (RIS)
Specifically, we minimize the learning error of all participating users by jointly optimizing transmit power of mobile users, beamforming vectors of the base station, and the phase-shift matrix of the RIS.
arXiv Detail & Related papers (2020-12-25T07:08:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.