Deep Model Predictive Optimization
- URL: http://arxiv.org/abs/2310.04590v1
- Date: Fri, 6 Oct 2023 21:11:52 GMT
- Title: Deep Model Predictive Optimization
- Authors: Jacob Sacks, Rwik Rana, Kevin Huang, Alex Spitzer, Guanya Shi, Byron Boots
- Abstract summary: A major challenge in robotics is to design robust policies which enable complex and agile behaviors in the real world.
We propose Deep Model Predictive Optimization (DMPO), which learns the inner-loop of an MPC optimization algorithm directly via experience.
DMPO can outperform the best MPC algorithm by up to 27% with fewer samples, and an end-to-end policy trained with MFRL by 19%.
- Score: 21.22047409735362
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A major challenge in robotics is to design robust policies which enable
complex and agile behaviors in the real world. On one end of the spectrum, we
have model-free reinforcement learning (MFRL), which is incredibly flexible and
general but often results in brittle policies. In contrast, model predictive
control (MPC) continually re-plans at each time step to remain robust to
perturbations and model inaccuracies. However, despite its real-world
successes, MPC often under-performs the optimal strategy. This is due to model
quality, myopic behavior from short planning horizons, and approximations due
to computational constraints. And even with a perfect model and enough compute,
MPC can get stuck in bad local optima, depending heavily on the quality of the
optimization algorithm. To this end, we propose Deep Model Predictive
Optimization (DMPO), which learns the inner-loop of an MPC optimization
algorithm directly via experience, specifically tailored to the needs of the
control problem. We evaluate DMPO on a real quadrotor agile trajectory tracking
task, on which it improves performance over a baseline MPC algorithm for a
given computational budget. It can outperform the best MPC algorithm by up to
27% with fewer samples, and an end-to-end policy trained with MFRL by 19%.
Moreover, because DMPO requires fewer samples, it can also achieve these
benefits with 4.3X less memory. When we subject the quadrotor to turbulent wind
fields with an attached drag plate, DMPO can adapt zero-shot while still
outperforming all baselines. Additional results can be found at
https://tinyurl.com/mr2ywmnw.
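The abstract does not include code, but the core idea is concrete enough to sketch. Below is a minimal sampling-based MPC loop in which the inner-loop update rule is an interchangeable function: classic MPPI uses the exponential-weighting update shown, and DMPO's proposal is to replace that hand-designed rule with a small learned (e.g., recurrent) network trained from experience. The dynamics, costs, and all names here are toy placeholders, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): a sampling-based MPC loop where the
# inner-loop update rule is a swappable function. Classic MPPI uses the
# exponential-weighting update below; DMPO's idea is to replace that
# hand-designed update with a learned network trained end-to-end.
import numpy as np

H, K, dt = 20, 64, 0.05          # horizon, number of sampled rollouts, timestep

def step(x, u):
    """Toy double-integrator dynamics: x = [position, velocity]."""
    return np.array([x[0] + dt * x[1], x[1] + dt * u])

def rollout_cost(x0, controls, target=1.0):
    """Tracking cost of one candidate control sequence."""
    x, c = x0, 0.0
    for u in controls:
        x = step(x, u)
        c += (x[0] - target) ** 2 + 1e-3 * u ** 2
    return c

def mppi_update(mean, noise, costs, temperature=1.0):
    """Hand-designed inner-loop update (the part DMPO learns instead)."""
    w = np.exp(-(costs - costs.min()) / temperature)
    w /= w.sum()
    return mean + (w[:, None] * noise).sum(axis=0)

def mpc_step(x0, mean, update_fn, sigma=0.5):
    noise = sigma * np.random.randn(K, H)          # sample perturbations
    costs = np.array([rollout_cost(x0, mean + n) for n in noise])
    mean = update_fn(mean, noise, costs)           # inner-loop optimizer step
    return mean[0], np.roll(mean, -1)              # execute first action, warm-start

x, mean = np.array([0.0, 0.0]), np.zeros(H)
for t in range(100):
    u, mean = mpc_step(x, mean, update_fn=mppi_update)
    x = step(x, u)
print("final position (target 1.0):", x[0])
```

In DMPO, `mppi_update` would be replaced by a network whose parameters are trained with reinforcement learning, so the update rule itself adapts to the control problem.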
Related papers
- Transformer-based Model Predictive Control: Trajectory Optimization via Sequence Modeling [16.112708478263745]
We present a unified framework that combines the main strengths of optimization-based and learning-based methods.
Our approach embeds high-capacity, transformer-based neural network models within the optimization process.
Compared to purely optimization-based approaches, results show that our approach can improve performance by up to 75%.
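As a rough illustration of one common way to embed a learned model in a trajectory optimizer (an assumed pattern; the paper's exact architecture may differ), the sketch below has a placeholder network propose an initial control sequence that a local optimizer then refines:
```python
# Hedged sketch: a placeholder "network" proposes a trajectory, and a simple
# finite-difference local optimizer polishes it. All names are hypothetical.
import numpy as np

dt, H = 0.05, 20

def cost(x0, u, target=1.0):
    """Tracking cost of a control sequence under toy double-integrator dynamics."""
    x, c = x0.copy(), 0.0
    for ui in u:
        x = np.array([x[0] + dt * x[1], x[1] + dt * ui])
        c += (x[0] - target) ** 2 + 1e-3 * ui ** 2
    return c

def warm_start(x0):
    """Placeholder for the learned (e.g., transformer) trajectory proposal."""
    return np.zeros(H)            # a trained model would predict a good guess

def refine(x0, u, iters=30, lr=0.5, eps=1e-4):
    """Local optimizer that refines the network's proposal."""
    for _ in range(iters):
        g = np.array([(cost(x0, u + eps * e) - cost(x0, u - eps * e)) / (2 * eps)
                      for e in np.eye(len(u))])
        u = u - lr * g
    return u

x0 = np.array([0.0, 0.0])
u = refine(x0, warm_start(x0))
print("refined cost:", cost(x0, u))
```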
arXiv Detail & Related papers (2024-10-31T13:23:10Z)
- Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration [87.53543137162488]
We propose an easy-to-implement online reinforcement learning (online RL) framework called MEX.
MEX integrates estimation and planning components while automatically balancing exploration and exploitation.
It can outperform baselines by a stable margin in various MuJoCo environments with sparse rewards.
arXiv Detail & Related papers (2023-05-29T17:25:26Z)
- Policy Search for Model Predictive Control with Application to Agile Drone Flight [56.24908013905407]
We propose a policy-search-for-model-predictive-control framework that merges learning and control.
Specifically, we formulate the MPC as a parameterized controller, where the hard-to-optimize decision variables are represented as high-level policies.
Experiments show that our controller achieves robust and real-time control performance in both simulation and the real world.
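A hedged sketch of the general pattern (not the paper's system): the hard-to-optimize MPC decision variable is produced by a search distribution that is improved from episodic returns, here with a tiny cross-entropy method and a toy stand-in for the MPC episode:
```python
# Hedged sketch: a high-level decision variable z configures the MPC, and
# episodic policy search tunes it. run_mpc_episode is a toy placeholder.
import numpy as np

def run_mpc_episode(z):
    """Placeholder: run the MPC controller configured by decision variable z
    and return the episode reward. Here, a toy objective peaked at z = 2."""
    return -(z - 2.0) ** 2 + 0.1 * np.random.randn()

mu, sigma = 0.0, 2.0                                  # search distribution over z
for it in range(30):
    zs = mu + sigma * np.random.randn(32)             # sample candidate variables
    rewards = np.array([run_mpc_episode(z) for z in zs])
    elite = zs[np.argsort(rewards)[-8:]]              # keep the best candidates
    mu, sigma = elite.mean(), elite.std() + 1e-3      # refit the distribution
print("learned decision variable:", round(mu, 2))     # approaches 2.0
```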
arXiv Detail & Related papers (2021-12-07T17:39:24Z)
- On Effective Scheduling of Model-based Reinforcement Learning [53.027698625496015]
In this paper, we first theoretically analyze the role of real data in policy training, which suggests that gradually increasing the ratio of real data yields better performance.
We then propose a framework named AutoMBPO to automatically schedule the real data ratio.
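A minimal sketch of the scheduling idea alone (AutoMBPO itself learns the schedule; the linear ramp and all names below are assumptions):
```python
# Hedged sketch: mix real and model-generated transitions in each training
# batch, with the real-data ratio increasing over training as the analysis
# suggests. Buffers hold stand-in transition tuples.
import numpy as np

def real_ratio(step, total_steps, start=0.1, end=0.9):
    """Linearly anneal the fraction of real data from start to end."""
    return start + (end - start) * step / total_steps

def sample_batch(real_buf, model_buf, step, total_steps, batch=8):
    r = real_ratio(step, total_steps)
    n_real = int(round(r * batch))
    reals = real_buf[np.random.randint(len(real_buf), size=n_real)]
    fakes = model_buf[np.random.randint(len(model_buf), size=batch - n_real)]
    return np.concatenate([reals, fakes])

real_buf = np.ones((100, 4))      # real transitions (marked with 1s)
model_buf = np.zeros((1000, 4))   # model-generated transitions (marked with 0s)
for step in [0, 500, 999]:
    b = sample_batch(real_buf, model_buf, step, 1000)
    print(f"step {step}: real fraction in batch = {b[:, 0].mean():.2f}")
```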
arXiv Detail & Related papers (2021-11-16T15:24:59Z)
- Learning Model Predictive Controllers for Real-Time Ride-Hailing Vehicle Relocation and Pricing Decisions [15.80796896560034]
Large-scale ride-hailing systems often combine real-time routing at the individual request level with a macroscopic Model Predictive Control (MPC) optimization for dynamic pricing and vehicle relocation.
This paper addresses these computational challenges by learning the MPC optimization.
The resulting machine-learning model then serves as an optimization proxy, predicting the MPC's optimal solutions.
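A minimal sketch of the optimization-proxy pattern, with the expensive MPC solve replaced by a toy stand-in (the actual system and model class in the paper will differ):
```python
# Hedged sketch: solve the slow MPC offline on many states, fit a supervised
# model on (state -> optimal decision) pairs, then use the fast proxy online.
import numpy as np

def slow_mpc(state):
    """Placeholder for the expensive MPC solve; here, a known linear map."""
    return state @ np.array([1.5, -0.5])

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))                   # sampled system states
y = np.array([slow_mpc(s) for s in X])          # offline MPC solutions

w, *_ = np.linalg.lstsq(X, y, rcond=None)       # fit the proxy (least squares)

state = np.array([0.3, -1.2])
print("proxy decision:", state @ w)             # fast online prediction
print("true MPC decision:", slow_mpc(state))    # matches up to fitting error
```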
arXiv Detail & Related papers (2021-11-05T00:52:15Z)
- Neural Predictive Control for the Optimization of Smart Grid Flexibility Schedules [0.0]
Model predictive control (MPC) is a method for formulating the optimal scheduling problem for grid flexibilities mathematically.
MPC methods promise accurate results for time-constrained grid optimization, but they are inherently limited by the calculation time needed for large and complex power system models.
A Neural Predictive Control scheme is proposed to learn optimal control policies for linear and nonlinear power systems through imitation.
arXiv Detail & Related papers (2021-08-19T15:12:35Z)
- Optimal Cost Design for Model Predictive Control [30.86835688868485]
Many robotics domains use model predictive control (MPC) for planning, which sets a reduced time horizon, performs optimization, and replans at every step.
In this work, we challenge the common assumption that the cost we optimize using MPC should be the same as the ground-truth cost for the task (plus a terminal cost).
We propose a zeroth-order trajectory-based approach that enables us to design optimal costs for an MPC planning robot in continuous MDPs.
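A hedged sketch of the underlying idea (not the paper's algorithm): tune the planner's internal cost weights by zeroth-order search so that the resulting closed-loop behavior scores well under the ground-truth cost; the closed-loop evaluation below is a toy surrogate:
```python
# Hedged sketch: zeroth-order (gradient-free) search over the *internal* cost
# weights the planner optimizes, scored by the ground-truth task cost.
import numpy as np

def closed_loop_true_cost(theta):
    """Placeholder: run MPC with internal cost weights theta and return the
    ground-truth cost of the resulting trajectory. Here, a toy surrogate whose
    best internal weights differ from the 'obvious' ground-truth ones."""
    return (theta[0] - 0.7) ** 2 + (theta[1] - 1.3) ** 2

theta, step = np.array([1.0, 1.0]), 0.2          # start from the ground-truth weights
best = closed_loop_true_cost(theta)
for _ in range(200):                             # simple zeroth-order random search
    cand = theta + step * np.random.randn(2)
    c = closed_loop_true_cost(cand)
    if c < best:
        theta, best = cand, c
print("designed internal cost weights:", np.round(theta, 2))
```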
arXiv Detail & Related papers (2021-04-23T00:00:58Z)
- Covert Model Poisoning Against Federated Learning: Algorithm Design and Optimization [76.51980153902774]
Federated learning (FL) is vulnerable to external model poisoning (MP) attacks on FL models during parameter transmission.
In this paper, we propose effective MP algorithms to combat state-of-the-art defensive aggregation mechanisms.
Our experimental results demonstrate that the proposed covert model poisoning (CMP) algorithms are effective and substantially outperform existing attack mechanisms.
arXiv Detail & Related papers (2021-01-28T03:28:18Z)
- Blending MPC & Value Function Approximation for Efficient Reinforcement Learning [42.429730406277315]
Model-Predictive Control (MPC) is a powerful tool for controlling complex, real-world systems.
We present a framework for improving on MPC with model-free reinforcement learning (RL).
We show that our approach can obtain performance comparable with MPC with access to true dynamics.
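One standard way to realize this blending (the general pattern; the paper's exact formulation may differ) is to append a learned value estimate as a terminal cost, which makes a deliberately short planning horizon less myopic:
```python
# Hedged sketch: short-horizon MPC whose rollout score is corrected by a
# learned critic V(s) at the terminal state. The critic here is a crude
# hand-made placeholder, not a trained network.
import numpy as np

dt, H = 0.05, 5                                   # deliberately short horizon

def step(x, u):
    return np.array([x[0] + dt * x[1], x[1] + dt * u])

def learned_value(x, target=1.0):
    """Placeholder for a model-free value function V(s)."""
    return -10.0 * (x[0] - target) ** 2

def blended_cost(x0, controls):
    x, c = x0, 0.0
    for u in controls:
        x = step(x, u)
        c += (x[0] - 1.0) ** 2 + 1e-3 * u ** 2
    return c - learned_value(x)                   # terminal value corrects myopia

x0 = np.array([0.0, 0.0])
candidates = [np.full(H, u) for u in np.linspace(-2, 2, 41)]
best = min(candidates, key=lambda u: blended_cost(x0, u))
print("first action chosen by blended MPC:", best[0])
```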
arXiv Detail & Related papers (2020-12-10T11:32:01Z)
- Mixed Strategies for Robust Optimization of Unknown Objectives [93.8672371143881]
We consider robust optimization problems, where the goal is to optimize an unknown objective function against the worst-case realization of an uncertain parameter.
We design a novel sample-efficient algorithm GP-MRO, which sequentially learns about the unknown objective from noisy point evaluations.
GP-MRO seeks to discover a robust and randomized mixed strategy that maximizes the worst-case expected objective value.
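A minimal sketch of the mixed-strategy component only (GP-MRO additionally learns the unknown objective with Gaussian processes, which is omitted here): with discrete designs and uncertain parameters, multiplicative-weights updates find a distribution over designs that maximizes the worst-case expected payoff:
```python
# Hedged sketch: solve max_p min_theta E_{x~p}[F(x, theta)] over a small
# payoff matrix with multiplicative-weights updates against the worst case.
import numpy as np

F = np.array([[1.0, 0.0],        # payoff F[x, theta]; no single x is robust,
              [0.0, 1.0]])       # but mixing 50/50 guarantees value 0.5

p = np.ones(2) / 2
for _ in range(500):
    worst = np.argmin(p @ F)             # adversary picks the worst-case theta
    p *= np.exp(0.05 * F[:, worst])      # upweight designs that do well there
    p /= p.sum()
print("mixed strategy:", np.round(p, 2),
      "worst-case value:", round((p @ F).min(), 2))
```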
arXiv Detail & Related papers (2020-02-28T09:28:17Z)
- Information Theoretic Model Predictive Q-Learning [64.74041985237105]
We present a novel theoretical connection between information theoretic MPC and entropy regularized RL.
We develop a Q-learning algorithm that can leverage biased models.
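The connection can be stated compactly: in information-theoretic MPC (e.g., MPPI), sampled rollouts are reweighted by exp(-cost / lambda), and the resulting free energy is an entropy-regularized (soft) value, which is what ties this MPC family to soft Q-learning. A toy numerical sketch with assumed notation (not the paper's code):
```python
# Hedged sketch: the free energy -lambda * log E[exp(-cost / lambda)] acts as
# a soft (entropy-regularized) value over sampled rollout costs, and the
# normalized exponential weights are the optimal importance weights in MPPI.
import numpy as np

lam = 1.0
costs = np.random.rand(1000) * 10.0              # stand-in rollout costs

soft_value = -lam * np.log(np.mean(np.exp(-costs / lam)))   # free energy
weights = np.exp(-costs / lam)
weights /= weights.sum()                          # optimal importance weights

print("soft (entropy-regularized) value:", round(soft_value, 2))
print("hard minimum cost:", round(costs.min(), 2))  # soft_value -> min as lam -> 0
```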
arXiv Detail & Related papers (2019-12-31T00:29:22Z)