Towards an Adaptable and Generalizable Optimization Engine in Decision
and Control: A Meta Reinforcement Learning Approach
- URL: http://arxiv.org/abs/2401.02508v1
- Date: Thu, 4 Jan 2024 19:41:33 GMT
- Title: Towards an Adaptable and Generalizable Optimization Engine in Decision
and Control: A Meta Reinforcement Learning Approach
- Authors: Sungwook Yang, Chaoying Pei, Ran Dai, Chuangchuang Sun
- Abstract summary: We propose to learn an optimizer based on meta-reinforcement learning (RL) to update MPC controllers.
This optimizer needs no expert demonstrations and enables fast adaptation when deployed in unseen control tasks.
- Score: 6.302621910090619
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sampling-based model predictive control (MPC) has found significant success in optimal control problems with non-smooth system dynamics and cost functions. Many machine learning-based works have proposed to improve MPC by a) learning or fine-tuning the dynamics/cost function, or b) learning to optimize the update of the MPC controllers. For the latter, imitation learning-based optimizers are trained to update the MPC controller by mimicking expert demonstrations, which, however, are expensive or even unavailable. More significantly, many sequential decision-making problems take place in non-stationary environments, requiring an optimizer that is adaptable and generalizable enough to update the MPC controller for solving different tasks. To address these issues, we propose to learn an optimizer based on meta-reinforcement learning (RL) to update the controllers. This optimizer needs no expert demonstrations and enables fast adaptation (e.g., few-shot) when deployed on unseen control tasks. Experimental results validate the effectiveness of the learned optimizer with regard to fast adaptation.
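As a concrete illustration, here is a minimal Python sketch of the approach as described: an MPPI-style sampling-based MPC step whose update rule is parameterized by meta-learned quantities (`theta`) instead of hand-tuned constants. The function names, the choice of temperature and step size as the learned parameters, and the omitted meta-training loop are all assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def rollout_cost(x0, controls, dynamics, cost):
    """Simulate one sampled control sequence and accumulate its
    (possibly non-smooth) cost along the trajectory."""
    x, total = x0, 0.0
    for u in controls:
        x = dynamics(x, u)
        total += cost(x, u)
    return total

def learned_update(mean, samples, costs, theta):
    """The learned optimizer: map sampled rollouts to new distribution
    parameters. Here theta (meta-learned) sets the softmax temperature and
    step size that a hand-tuned MPPI update would fix."""
    temperature, step = np.exp(theta)  # positive by construction
    w = np.exp(-(costs - costs.min()) / temperature)
    w /= w.sum()
    weighted_mean = (w[:, None, None] * samples).sum(axis=0)
    return mean + step * (weighted_mean - mean)

def mpc_step(x0, mean, std, theta, dynamics, cost, n_samples=64):
    """One sampling-based MPC iteration driven by the learned optimizer.
    mean has shape (horizon, control_dim)."""
    samples = mean + np.random.randn(n_samples, *mean.shape) * std
    costs = np.array([rollout_cost(x0, u, dynamics, cost) for u in samples])
    return learned_update(mean, samples, costs, theta)
```

In the meta-RL setting, `theta` would be adapted from a handful of rollouts on a new task (few-shot) and then used to update the controller online.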
Related papers
- Unlearning as multi-task optimization: A normalized gradient difference approach with an adaptive learning rate [105.86576388991713]
We introduce a normalized gradient difference (NGDiff) algorithm, enabling us to have better control over the trade-off between the objectives.
We provide a theoretical analysis and empirically demonstrate the superior performance of NGDiff among state-of-the-art unlearning methods on the TOFU and MUSE datasets.
arXiv Detail & Related papers (2024-10-29T14:41:44Z)
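A hedged sketch of the normalized-gradient-difference idea summarized above: rescale the gradients of the two competing objectives (retaining vs. forgetting) to comparable norms before combining them. The unit-norm rescaling and the fixed `lr` here stand in for NGDiff's actual normalization and adaptive learning rate, which the summary does not fully specify.

```python
import numpy as np

def ngdiff_step(params, grad_retain, grad_forget, lr=1e-3, eps=1e-12):
    """Normalized gradient difference: rescale both objectives' gradients
    to unit norm before combining, so neither the retain task nor the
    forget task dominates the update direction."""
    gr = grad_retain / (np.linalg.norm(grad_retain) + eps)
    gf = grad_forget / (np.linalg.norm(grad_forget) + eps)
    return params - lr * (gr - gf)  # descend retain loss, ascend forget loss
```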
- A Safe Reinforcement Learning driven Weights-varying Model Predictive Control for Autonomous Vehicle Motion Control [2.07180164747172]
We propose a novel approach to determine the optimal cost function parameters of Model Predictive Control (MPC).
We conceive an RL agent that does not learn in a continuous space but instead proactively anticipates upcoming control tasks.
arXiv Detail & Related papers (2024-02-04T22:09:28Z)
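A rough sketch of the weights-varying idea above under stated assumptions: an RL agent chooses, per anticipated control task, among a small catalog of MPC cost-weight sets. The catalog values, the bandit-style update, and the `context_id` abstraction are illustrative, not the paper's design.

```python
import numpy as np

# Small catalog of MPC cost-weight sets (tracking / actuation / smoothness);
# the paper draws these from a Pareto-optimal set, the values here are made up.
WEIGHT_SETS = np.array([
    [1.0, 0.1, 0.01],  # aggressive tracking
    [0.5, 0.5, 0.05],  # balanced
    [0.2, 1.0, 0.10],  # smooth, comfort-oriented
])

def choose_weights(q_table, context_id, epsilon=0.1):
    """Epsilon-greedy choice of the weight set for the anticipated task."""
    if np.random.rand() < epsilon:
        return np.random.randint(len(WEIGHT_SETS))
    return int(np.argmax(q_table[context_id]))

def q_update(q_table, context_id, action, reward, alpha=0.1):
    """Bandit-style value update from the closed-loop tracking reward."""
    q_table[context_id, action] += alpha * (reward - q_table[context_id, action])
```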
- Learning to Optimize in Model Predictive Control [36.82905770866734]
Sampling-based Model Predictive Control (MPC) is a flexible control framework that can reason about non-smooth dynamics and cost functions.
We show that learning to optimize can be particularly useful in sampling-based MPC, where we often wish to minimize the number of samples.
We show that we can contend with this noise by learning how to update the control distribution more effectively and make better use of the few samples that we have.
arXiv Detail & Related papers (2022-12-05T21:20:10Z)
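For reference, a sketch of the fixed information-theoretic update that sampling-based MPC typically uses; "learning to optimize" in this sense means replacing this hand-designed rule with a learned map from sampled costs to new distribution parameters, so that fewer, noisier samples are used more effectively. This baseline form is standard MPPI, not the paper's learned optimizer.

```python
import numpy as np

def mppi_weights(costs, temperature=1.0):
    """Standard information-theoretic weighting of sampled rollouts."""
    z = -(costs - costs.min()) / temperature
    w = np.exp(z)
    return w / w.sum()

def update_mean(samples, costs, temperature=1.0):
    """Fixed-rule update of the control distribution's mean: a cost-weighted
    average of the samples. A learned optimizer would replace this map."""
    w = mppi_weights(costs, temperature)
    return (w[:, None] * samples).sum(axis=0)  # samples: (n_samples, dim)
```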
- Offline Supervised Learning V.S. Online Direct Policy Optimization: A Comparative Study and A Unified Training Paradigm for Neural Network-Based Optimal Feedback Control [7.242569453287703]
We first conduct a comparative study of two prevalent approaches: offline supervised learning and online direct policy optimization.
Our results underscore the superiority of offline supervised learning in terms of both optimality and training time.
We propose the Pre-train and Fine-tune strategy as a unified training paradigm for optimal feedback control.
arXiv Detail & Related papers (2022-11-29T05:07:13Z)
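A minimal sketch of a Pre-train and Fine-tune pipeline consistent with the summary above: supervised regression onto optimal state-action pairs, then direct policy optimization on the closed-loop return. The linear policy and the two-point gradient estimator are illustrative assumptions.

```python
import numpy as np

def pretrain(policy_params, states, optimal_controls, lr=1e-2, epochs=100):
    """Offline supervised learning: regress a (here linear) policy onto
    optimal state-action pairs from an open-loop optimal control solver."""
    for _ in range(epochs):
        pred = states @ policy_params
        grad = states.T @ (pred - optimal_controls) / len(states)
        policy_params -= lr * grad
    return policy_params

def finetune(policy_params, rollout_return, n_iters=50, sigma=0.05, lr=1e-2):
    """Online direct policy optimization from the pre-trained weights,
    using a simple two-point gradient estimate of the closed-loop return."""
    for _ in range(n_iters):
        delta = np.random.randn(*policy_params.shape)
        g = (rollout_return(policy_params + sigma * delta)
             - rollout_return(policy_params - sigma * delta)) / (2 * sigma)
        policy_params += lr * g * delta  # ascend the return
    return policy_params
```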
- Improving the Performance of Robust Control through Event-Triggered Learning [74.57758188038375]
We propose an event-triggered learning algorithm that decides when to learn in the face of uncertainty in the LQR problem.
We demonstrate improved performance over a robust controller baseline in a numerical example.
arXiv Detail & Related papers (2022-07-28T17:36:37Z)
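An illustrative event trigger in the spirit of this summary: keep the robust controller running and learn (re-identify the model) only when the observed closed-loop cost drifts from the nominal prediction by more than a tolerance. The moving-average test below is an assumption, not the paper's statistical criterion.

```python
import numpy as np
from collections import deque

class EventTrigger:
    """Trigger learning only when the observed closed-loop cost deviates
    from the cost the nominal LQR model predicts by more than a tolerance."""
    def __init__(self, expected_cost, tolerance, window=50):
        self.expected = expected_cost
        self.tolerance = tolerance
        self.history = deque(maxlen=window)

    def observe(self, stage_cost):
        """Returns True when data collection / model re-identification
        should start; otherwise the robust controller keeps running."""
        self.history.append(stage_cost)
        return abs(np.mean(self.history) - self.expected) > self.tolerance
```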
- Policy Search for Model Predictive Control with Application to Agile Drone Flight [56.24908013905407]
We propose a policy-search-for-model-predictive-control framework.
Specifically, we formulate the MPC as a parameterized controller, where the hard-to-optimize decision variables are represented as high-level policies.
Experiments show that our controller achieves robust and real-time control performance in both simulation and the real world.
arXiv Detail & Related papers (2021-12-07T17:39:24Z)
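A minimal sketch of the decomposition described above, assuming the learned high-level policy outputs the hard-to-optimize decision variables (e.g., a traversal time) while a user-supplied `solve_mpc` routine solves the remaining low-level problem; both names are placeholders.

```python
import numpy as np

def high_level_policy(obs, theta):
    """Learned policy producing the hard-to-optimize MPC decision variables,
    e.g., the desired traversal time of the next gate (linear for brevity)."""
    return theta @ obs

def mpc_with_policy(obs, theta, solve_mpc):
    """MPC as a parameterized controller: the policy fixes the high-level
    variables z, and the MPC solves the remaining trajectory problem."""
    z = high_level_policy(obs, theta)
    return solve_mpc(obs, z)  # returns the low-level control input
```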
- Optimization of the Model Predictive Control Meta-Parameters Through Reinforcement Learning [1.4069478981641936]
We propose a novel framework in which any parameter of the control algorithm can be jointly tuned using reinforcement learning (RL).
We demonstrate our framework on the inverted pendulum control task, reducing the total time of the control system by 36% while also improving the control performance by 18.4% over the best-performing MPC baseline.
arXiv Detail & Related papers (2021-11-07T18:33:22Z)
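One plausible encoding of jointly tuned MPC meta-parameters, assuming the RL action sets the prediction horizon, the optimizer iteration budget, and whether to recompute the plan this step; the specific parameters and ranges are illustrative, and the reward would trade control performance against computation time.

```python
import numpy as np

def decode_meta_parameters(meta_action):
    """Decode an RL action into jointly tuned MPC meta-parameters:
    prediction horizon, optimizer iteration budget, and a recompute flag.
    The reward would be -(control cost + weight * computation time)."""
    horizon = int(np.clip(meta_action[0], 5, 50))
    n_iterations = int(np.clip(meta_action[1], 1, 10))
    recompute = bool(meta_action[2] > 0.0)  # skip replanning to save time
    return horizon, n_iterations, recompute
```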
- Meta-Learning with Adaptive Hyperparameters [55.182841228303225]
We focus on a complementary factor in the MAML framework: inner-loop optimization (or fast adaptation).
We propose a new weight update rule that greatly enhances the fast adaptation process.
arXiv Detail & Related papers (2020-10-31T08:05:34Z)
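A sketch of an adaptive inner-loop (fast adaptation) update in the MAML setting, following the update form the summary points to: a per-step learning rate and weight decay generated from the current gradient/weight state rather than fixed. The toy generator stands in for the meta-learned network.

```python
import numpy as np

def adaptive_inner_update(theta, grad, hyper_net):
    """MAML-style fast adaptation with generated hyperparameters: a small
    network maps the current (gradient, weight) state to a per-step
    learning rate alpha and weight decay beta, replacing a fixed inner lr."""
    alpha, beta = hyper_net(grad, theta)
    return beta * theta - alpha * grad

def toy_hyper_net(grad, theta):
    """Stand-in for the meta-learned generator: scale-aware constants."""
    alpha = 0.01 / (1.0 + np.linalg.norm(grad))
    beta = 1.0 - 1e-4
    return alpha, beta
```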
- Anticipating the Long-Term Effect of Online Learning in Control [75.6527644813815]
AntLer is a design algorithm for learning-based control laws that anticipates learning.
We show that AntLer approximates an optimal solution arbitrarily accurately with probability one.
arXiv Detail & Related papers (2020-07-24T07:00:14Z)
- Information Theoretic Model Predictive Q-Learning [64.74041985237105]
We present a novel theoretical connection between information theoretic MPC and entropy regularized RL.
We develop a Q-learning algorithm that can leverage biased models.
arXiv Detail & Related papers (2019-12-31T00:29:22Z)
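A small sketch of the soft (entropy-regularized) backup that connects information-theoretic MPC with RL: the bootstrap value is a log-sum-exp over action samples rather than a hard max. The sampled-action form and the target below are assumptions consistent with the summary, not the paper's exact algorithm.

```python
import numpy as np

def soft_value(q_samples, temperature=1.0):
    """Entropy-regularized (soft) value: a log-sum-exp over sampled actions.
    As temperature -> 0 this approaches the hard max of standard Q-learning."""
    z = q_samples / temperature
    return temperature * (np.log(np.mean(np.exp(z - z.max()))) + z.max())

def mpq_target(reward, next_q_samples, gamma=0.99, temperature=1.0):
    """Q-learning target whose bootstrap uses the soft value, mirroring the
    information-theoretic MPC update; a biased dynamics model can supply
    the imagined next-state action values."""
    return reward + gamma * soft_value(next_q_samples, temperature)
```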
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.