Related papers: Learning High-Level Policies for Model Predictive Control

Learning High-Level Policies for Model Predictive Control

URL: http://arxiv.org/abs/2007.10284v2
Date: Sun, 9 May 2021 16:47:53 GMT
Title: Learning High-Level Policies for Model Predictive Control
Authors: Yunlong Song, Davide Scaramuzza
Abstract summary: Model Predictive Control (MPC) provides robust solutions to robot control tasks. We propose a self-supervised learning algorithm for learning a neural network high-level policy. We show that our approach can handle situations that are difficult for standard MPC.
Score: 54.00297896763184
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The combination of policy search and deep neural networks holds the promise of automating a variety of decision-making tasks. Model Predictive Control (MPC) provides robust solutions to robot control tasks by making use of a dynamical model of the system and solving an optimization problem online over a short planning horizon. In this work, we leverage probabilistic decision-making approaches and the generalization capability of artificial neural networks to the powerful online optimization by learning a deep high-level policy for the MPC (High-MPC). Conditioning on robot's local observations, the trained neural network policy is capable of adaptively selecting high-level decision variables for the low-level MPC controller, which then generates optimal control commands for the robot. First, we formulate the search of high-level decision variables for MPC as a policy search problem, specifically, a probabilistic inference problem. The problem can be solved in a closed-form solution. Second, we propose a self-supervised learning algorithm for learning a neural network high-level policy, which is useful for online hyperparameter adaptations in highly dynamic environments. We demonstrate the importance of incorporating the online adaption into autonomous robots by using the proposed method to solve a challenging control problem, where the task is to control a simulated quadrotor to fly through a swinging gate. We show that our approach can handle situations that are difficult for standard MPC.

Related papers

Integrating Reinforcement Learning and Model Predictive Control with Applications to Microgrids [14.389086937116582]
This work proposes an approach that integrates reinforcement learning and model predictive control (MPC) to solve finite-horizon optimal control problems efficiently. Our approach aims to mitigate this issue by decoupling the decision on the discrete variables from the decision on the continuous variables. In the proposed approach, reinforcement learning determines the discrete decision variables and simplifies the online problem of the MPC controller from a mixed-integer linear program to a linear program.
arXiv Detail & Related papers (2024-09-17T15:17:16Z)
Dropout MPC: An Ensemble Neural MPC Approach for Systems with Learned Dynamics [0.0]
We propose a novel sampling-based ensemble neural MPC algorithm that employs the Monte-Carlo dropout technique on the learned system model. The method aims in general at uncertain systems with complex dynamics, where models derived from first principles are hard to infer.
arXiv Detail & Related papers (2024-06-04T17:15:25Z)
Pontryagin Optimal Control via Neural Networks [19.546571122359534]
We integrate Neural Networks with the Pontryagin's Maximum Principle (PMP), and propose a sample efficient framework NN-PMP-Gradient. The resulting controller can be implemented for systems with unknown and complex dynamics. Compared with the widely applied model-free and model-based reinforcement learning (RL) algorithms, our NN-PMP-Gradient achieves higher sample-efficiency and performance in terms of control objectives.
arXiv Detail & Related papers (2022-12-30T06:47:03Z)
Efficient Domain Coverage for Vehicles with Second-Order Dynamics via Multi-Agent Reinforcement Learning [9.939081691797858]
We present a reinforcement learning (RL) approach for the multi-agent efficient domain coverage problem involving agents with second-order dynamics. Our proposed network architecture includes the incorporation of LSTM and self-attention, which allows the trained policy to adapt to a variable number of agents.
arXiv Detail & Related papers (2022-11-11T01:59:12Z)
Learning Robust Policy against Disturbance in Transition Dynamics via State-Conservative Policy Optimization [63.75188254377202]
Deep reinforcement learning algorithms can perform poorly in real-world tasks due to discrepancy between source and target environments. We propose a novel model-free actor-critic algorithm to learn robust policies without modeling the disturbance in advance. Experiments in several robot control tasks demonstrate that SCPO learns robust policies against the disturbance in transition dynamics.
arXiv Detail & Related papers (2021-12-20T13:13:05Z)
Policy Search for Model Predictive Control with Application to Agile Drone Flight [56.24908013905407]
We propose a policy-search-for-model-predictive-control framework for MPC. Specifically, we formulate the MPC as a parameterized controller, where the hard-to-optimize decision variables are represented as high-level policies. Experiments show that our controller achieves robust and real-time control performance in both simulation and the real world.
arXiv Detail & Related papers (2021-12-07T17:39:24Z)
Adversarially Regularized Policy Learning Guided by Trajectory Optimization [31.122262331980153]
We propose adVErsarially Regularized pOlicy learNIng guided by trajeCtory optimizAtion (VERONICA) for learning smooth control policies. Our proposed approach improves the sample efficiency of neural policy learning and enhances the robustness of the policy against various types of disturbances.
arXiv Detail & Related papers (2021-09-16T00:02:11Z)
Deep Policy Dynamic Programming for Vehicle Routing Problems [89.96386273895985]
We propose Deep Policy Dynamic Programming (D PDP) to combine the strengths of learned neurals with those of dynamic programming algorithms. D PDP prioritizes and restricts the DP state space using a policy derived from a deep neural network, which is trained to predict edges from example solutions. We evaluate our framework on the travelling salesman problem (TSP) and the vehicle routing problem (VRP) and show that the neural policy improves the performance of (restricted) DP algorithms.
arXiv Detail & Related papers (2021-02-23T15:33:57Z)
Non-stationary Online Learning with Memory and Non-stochastic Control [71.14503310914799]
We study the problem of Online Convex Optimization (OCO) with memory, which allows loss functions to depend on past decisions. In this paper, we introduce dynamic policy regret as the performance measure to design algorithms robust to non-stationary environments. We propose a novel algorithm for OCO with memory that provably enjoys an optimal dynamic policy regret in terms of time horizon, non-stationarity measure, and memory length.
arXiv Detail & Related papers (2021-02-07T09:45:15Z)
Reconfigurable Intelligent Surface Assisted Mobile Edge Computing with Heterogeneous Learning Tasks [53.1636151439562]
Mobile edge computing (MEC) provides a natural platform for AI applications. We present an infrastructure to perform machine learning tasks at an MEC with the assistance of a reconfigurable intelligent surface (RIS) Specifically, we minimize the learning error of all participating users by jointly optimizing transmit power of mobile users, beamforming vectors of the base station, and the phase-shift matrix of the RIS.
arXiv Detail & Related papers (2020-12-25T07:08:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.