A Safe Reinforcement Learning driven Weights-varying Model Predictive
Control for Autonomous Vehicle Motion Control
- URL: http://arxiv.org/abs/2402.02624v1
- Date: Sun, 4 Feb 2024 22:09:28 GMT
- Title: A Safe Reinforcement Learning driven Weights-varying Model Predictive
Control for Autonomous Vehicle Motion Control
- Authors: Baha Zarrouki, Marios Spanakakis and Johannes Betz
- Abstract summary: We propose a novel approach to determine the optimal cost function parameters of Model Predictive Control (MPC).
We conceive an RL agent that does not learn in a continuous space but proactively anticipates upcoming control tasks.
- Score: 2.07180164747172
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Determining the optimal cost function parameters of Model Predictive Control
(MPC) to optimize multiple control objectives is a challenging and
time-consuming task. Multiobjective Bayesian Optimization (BO) techniques solve
this problem by determining a Pareto optimal parameter set for an MPC with
static weights. However, a single parameter set may not deliver optimal
closed-loop control performance when the context of the MPC's operating
conditions changes during operation, which calls for adapting the cost
function weights at runtime. Deep Reinforcement Learning (RL) algorithms can
automatically learn context-dependent optimal parameter sets and dynamically
adapt them for a Weights-varying MPC (WMPC). However, learning cost function weights
from scratch in a continuous action space may lead to unsafe operating states.
To solve this, we propose a novel approach limiting the RL actions within a
safe learning space representing a catalog of pre-optimized BO Pareto-optimal
weight sets. We conceive an RL agent that does not learn in a continuous space
but proactively anticipates upcoming control tasks and chooses, depending on the
context, the best discrete action, each action corresponding to a single set of
Pareto-optimal weights. Hence, even an untrained RL agent guarantees safe and
optimal performance. Experimental results demonstrate that an untrained RL-WMPC
already shows Pareto-optimal closed-loop behavior, and that training the RL-WMPC
yields performance beyond the Pareto front.
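To make the selection mechanism concrete, the following is a minimal sketch of restricting the RL action space to a catalog of pre-optimized Pareto-optimal weight sets. It is not the authors' implementation: the catalog values, the tabular Q-learning agent, and the `run_mpc_step` placeholder are illustrative assumptions standing in for the BO-generated catalog, the deep RL agent, and the weights-varying MPC.

```python
import numpy as np

# Hypothetical catalog of Pareto-optimal MPC weight sets, assumed to be
# pre-computed offline by multiobjective Bayesian optimization.
# Each entry trades off tracking accuracy against control effort.
PARETO_CATALOG = np.array([
    [10.0, 0.1],   # aggressive tracking
    [5.0,  0.5],
    [2.0,  1.0],
    [1.0,  2.0],   # smooth / comfort-oriented
])

class DiscreteWeightSelector:
    """Tabular Q-learning agent that picks a catalog index per context.

    Because every action maps to a pre-optimized weight set, even an
    untrained (random) agent only ever applies safe, Pareto-optimal weights.
    """

    def __init__(self, n_contexts, n_actions, eps=0.1, lr=0.2, gamma=0.95):
        self.q = np.zeros((n_contexts, n_actions))
        self.eps, self.lr, self.gamma = eps, lr, gamma

    def act(self, context):
        if np.random.rand() < self.eps:
            return np.random.randint(self.q.shape[1])
        return int(np.argmax(self.q[context]))

    def update(self, c, a, reward, c_next):
        target = reward + self.gamma * self.q[c_next].max()
        self.q[c, a] += self.lr * (target - self.q[c, a])

def run_mpc_step(weights, context):
    """Placeholder for one closed-loop MPC step; returns a scalar cost.

    In the paper this would be the weights-varying MPC applied to the
    vehicle; here it is a toy function of the weights and context."""
    tracking_w, effort_w = weights
    # Pretend higher-index contexts (e.g. tighter curves) need aggressive tracking.
    return abs(tracking_w - (context + 1) * 2.5) + 0.1 * effort_w

if __name__ == "__main__":
    agent = DiscreteWeightSelector(n_contexts=4, n_actions=len(PARETO_CATALOG))
    context = 0
    for step in range(2000):
        action = agent.act(context)
        cost = run_mpc_step(PARETO_CATALOG[action], context)
        next_context = np.random.randint(4)  # upcoming control task
        agent.update(context, action, reward=-cost, c_next=next_context)
        context = next_context
    print(agent.q.argmax(axis=1))  # learned context -> weight-set mapping
```

Because every discrete action indexes a pre-optimized weight set, exploration never applies weights outside the safe catalog, which is the safety property the abstract emphasizes.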
Related papers
- Stability-informed Bayesian Optimization for MPC Cost Function Learning [5.643541009427271]
This work explores closed-loop learning for predictive control parameters under imperfect information.
We employ constrained Bayesian optimization to learn a model predictive controller's (MPC) cost function parametrized as a feedforward neural network.
We extend this framework with stability constraints on the learned controller parameters, exploiting the optimal value function of the underlying MPC as a Lyapunov candidate.
arXiv Detail & Related papers (2024-04-18T13:49:09Z)
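As a rough illustration of the closed-loop learning scheme summarized above, the sketch below replaces constrained Bayesian optimization with a plain random search over candidate cost parameters and accepts only candidates that pass a stability check; `closed_loop_cost` and `lyapunov_decrease` are toy placeholders, not the paper's neural-network parametrization or its value-function-based Lyapunov certificate.

```python
import numpy as np

rng = np.random.default_rng(0)

def closed_loop_cost(theta):
    """Placeholder for the closed-loop performance of an MPC whose cost
    function is parameterized by theta (e.g. feedforward-network weights)."""
    return np.sum((theta - 0.3) ** 2) + 0.05 * rng.normal()

def lyapunov_decrease(theta):
    """Stand-in stability check: the MPC optimal value function, used as a
    Lyapunov candidate, would have to decrease along the closed loop."""
    return np.all(theta > -1.0) and np.all(theta < 1.0)  # toy feasibility box

def constrained_search(n_iters=50, dim=4):
    """Simplified random-search stand-in for constrained Bayesian optimization:
    only candidates that satisfy the stability constraint are accepted."""
    best_theta, best_cost = None, np.inf
    for _ in range(n_iters):
        theta = rng.uniform(-1.5, 1.5, size=dim)  # candidate cost parameters
        if not lyapunov_decrease(theta):
            continue  # reject candidates violating the stability constraint
        cost = closed_loop_cost(theta)
        if cost < best_cost:
            best_theta, best_cost = theta, cost
    return best_theta, best_cost

if __name__ == "__main__":
    theta, cost = constrained_search()
    print("best feasible parameters:", theta, "cost:", cost)
```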
- Towards an Adaptable and Generalizable Optimization Engine in Decision and Control: A Meta Reinforcement Learning Approach [6.302621910090619]
We propose to learn how to update MPC controllers based on meta-reinforcement learning (RL).
This does not require expert demonstrations and enables fast adaptation when deployed on unseen control tasks.
arXiv Detail & Related papers (2024-01-04T19:41:33Z)
- Deep Model Predictive Optimization [21.22047409735362]
A major challenge in robotics is to design robust policies which enable complex and agile behaviors in the real world.
We propose Deep Model Predictive Optimization (DMPO), which learns the inner-loop of an MPC optimization algorithm directly via experience.
DMPO can outperform the best MPC algorithm by up to 27% while using fewer samples, and can outperform an end-to-end policy trained with model-free RL (MFRL) by 19%.
arXiv Detail & Related papers (2023-10-06T21:11:52Z)
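A minimal, hypothetical sketch of the inner-loop structure DMPO learns: sampled plans and their rollout costs are mapped to an updated mean plan. Here the "learned" update is a fixed softmax-weighted average with a single temperature parameter; in the paper this mapping is a network trained from experience, and `rollout_cost` is only a toy stand-in for real dynamics.

```python
import numpy as np

rng = np.random.default_rng(1)
HORIZON, N_SAMPLES, ACTION_DIM = 10, 64, 1

def rollout_cost(plan):
    """Toy surrogate for rolling out an action plan through the dynamics."""
    return np.sum(plan ** 2) + np.sum(np.abs(np.diff(plan, axis=0)))

def learned_update(mean_plan, samples, costs, params):
    """Stand-in for DMPO's learned inner-loop update: maps sampled plans and
    their costs to a new mean plan. Here it is a softmax-weighted average whose
    temperature is the single (untrained) parameter."""
    weights = np.exp(-costs / params["temperature"])
    weights /= weights.sum()
    return np.tensordot(weights, samples, axes=1)

def inner_loop(mean_plan, params, n_updates=5):
    for _ in range(n_updates):
        noise = rng.normal(scale=0.3, size=(N_SAMPLES, HORIZON, ACTION_DIM))
        samples = mean_plan + noise
        costs = np.array([rollout_cost(s) for s in samples])
        mean_plan = learned_update(mean_plan, samples, costs, params)
    return mean_plan

if __name__ == "__main__":
    plan = rng.normal(size=(HORIZON, ACTION_DIM))
    # In DMPO the update parameters would be trained end-to-end from experience;
    # here they are fixed for illustration.
    refined = inner_loop(plan, params={"temperature": 1.0})
    print("cost before:", rollout_cost(plan), "after:", rollout_cost(refined))
```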
- Controllable Dynamic Multi-Task Architectures [92.74372912009127]
We propose a controllable multi-task network that dynamically adjusts its architecture and weights to match the desired task preference as well as the resource constraints.
We propose disentangled training of two hypernetworks, exploiting task affinity and a novel branching regularized loss, to take input preferences and accordingly predict tree-structured models with adapted weights.
arXiv Detail & Related papers (2022-03-28T17:56:40Z)
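The preference-to-weights idea from this entry can be illustrated with a toy hypernetwork: a linear map from a task-preference vector to the parameters of a small task head. The branching structure, task-affinity terms, and regularized loss of the paper are omitted; all dimensions and names below are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
IN_DIM, OUT_DIM, N_TASKS = 8, 3, 2

# Illustrative hypernetwork parameters: a single linear map from the
# task-preference vector to the flattened weights of a small task head.
n_head_params = IN_DIM * OUT_DIM + OUT_DIM
W_hyper = rng.normal(scale=0.1, size=(n_head_params, N_TASKS))
b_hyper = rng.normal(scale=0.1, size=n_head_params)

def hypernetwork(preference):
    """Predict task-head weights from a preference vector (sums to 1)."""
    flat = W_hyper @ preference + b_hyper
    W = flat[: IN_DIM * OUT_DIM].reshape(OUT_DIM, IN_DIM)
    b = flat[IN_DIM * OUT_DIM:]
    return W, b

def task_head(x, preference):
    W, b = hypernetwork(preference)
    return W @ x + b

if __name__ == "__main__":
    x = rng.normal(size=IN_DIM)
    print(task_head(x, np.array([0.8, 0.2])))  # favor task 1
    print(task_head(x, np.array([0.2, 0.8])))  # favor task 2
```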
- Policy Search for Model Predictive Control with Application to Agile Drone Flight [56.24908013905407]
We propose a policy-search framework for model predictive control (MPC).
Specifically, we formulate the MPC as a parameterized controller, where the hard-to-optimize decision variables are represented as high-level policies.
Experiments show that our controller achieves robust and real-time control performance in both simulation and the real world.
arXiv Detail & Related papers (2021-12-07T17:39:24Z)
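A schematic sketch of the parameterized-controller idea described above: a high-level policy outputs a hard-to-optimize decision variable (here an assumed "traversal time"), and a placeholder `solve_mpc` stands in for the low-level MPC that consumes it. Nothing below reproduces the paper's actual formulation.

```python
import numpy as np

rng = np.random.default_rng(3)
OBS_DIM = 6

# Illustrative high-level policy: maps the observation to the MPC's
# hard-to-optimize decision variable (e.g. a desired traversal time).
policy_weights = rng.normal(scale=0.1, size=OBS_DIM)

def high_level_policy(obs):
    return float(np.exp(policy_weights @ obs))  # strictly positive time

def solve_mpc(obs, traversal_time):
    """Placeholder for the parameterized MPC: given the high-level decision
    variable, it would solve the low-level optimal control problem."""
    return -0.1 * obs[:2] / traversal_time  # toy control command

if __name__ == "__main__":
    obs = rng.normal(size=OBS_DIM)
    t_des = high_level_policy(obs)
    u = solve_mpc(obs, t_des)
    print("traversal time:", t_des, "control:", u)
    # Policy search would now perturb policy_weights, evaluate the closed-loop
    # return of the resulting MPC, and keep improvements.
```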
- Optimization of the Model Predictive Control Meta-Parameters Through Reinforcement Learning [1.4069478981641936]
We propose a novel framework in which any parameter of the control algorithm can be jointly tuned using reinforcement learning (RL).
We demonstrate our framework on the inverted pendulum control task, reducing the total time of the control system by 36% while also improving the control performance by 18.4% over the best-performing MPC baseline.
arXiv Detail & Related papers (2021-11-07T18:33:22Z)
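As a hedged illustration of tuning MPC meta-parameters with RL, the sketch below uses an epsilon-greedy bandit over a few assumed (horizon, re-planning period) pairs and a toy reward that trades tracking quality against computation; the paper's joint-tuning framework and inverted-pendulum setup are only mimicked, not reproduced.

```python
import numpy as np

rng = np.random.default_rng(4)

# Candidate meta-parameter settings: (prediction horizon, re-planning period).
META_CHOICES = [(10, 1), (20, 1), (20, 4), (40, 4)]

def episode_return(horizon, period):
    """Toy stand-in for one closed-loop episode with the MPC: longer horizons
    track better but cost more computation per re-planning step."""
    tracking = -1.0 / horizon
    compute = -0.01 * horizon / period
    return tracking + compute + 0.01 * rng.normal()

def tune_meta_parameters(n_episodes=500, eps=0.1):
    """Epsilon-greedy bandit as a minimal stand-in for the RL tuner."""
    values = np.zeros(len(META_CHOICES))
    counts = np.zeros(len(META_CHOICES))
    for _ in range(n_episodes):
        if rng.random() < eps:
            a = rng.integers(len(META_CHOICES))
        else:
            a = int(values.argmax())
        r = episode_return(*META_CHOICES[a])
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]  # incremental mean update
    return META_CHOICES[int(values.argmax())]

if __name__ == "__main__":
    print("selected (horizon, period):", tune_meta_parameters())
```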
- Evaluating model-based planning and planner amortization for continuous control [79.49319308600228]
We take a hybrid approach, combining model predictive control (MPC) with a learned model and model-free policy learning.
We find that well-tuned model-free agents are strong baselines even for high DoF control problems.
We show that it is possible to distil a model-based planner into a policy that amortizes the planning without any loss of performance.
arXiv Detail & Related papers (2021-10-07T12:00:40Z)
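The planner-amortization step mentioned above can be pictured as behavior cloning: regress the planner's actions onto the states it visits. In this sketch the "planner" is just a fixed linear feedback law and the distilled policy is a least-squares fit, which is far simpler than the paper's MPC with a learned model and neural policy.

```python
import numpy as np

rng = np.random.default_rng(5)
STATE_DIM, ACT_DIM = 4, 2

def planner_action(state):
    """Placeholder for the MPC planner with a learned model: here a fixed
    linear feedback law stands in for the planner's output."""
    K = np.array([[0.8, 0.0, 0.2, 0.0],
                  [0.0, 0.8, 0.0, 0.2]])
    return -K @ state

def distil_planner(n_samples=1000):
    """Amortize the planner into a policy by regressing its actions on states
    (plain behavior cloning via least squares)."""
    states = rng.normal(size=(n_samples, STATE_DIM))
    actions = np.array([planner_action(s) for s in states])
    policy, *_ = np.linalg.lstsq(states, actions, rcond=None)
    return policy  # (STATE_DIM, ACT_DIM) matrix: action = state @ policy

if __name__ == "__main__":
    policy = distil_planner()
    s = rng.normal(size=STATE_DIM)
    print("planner:", planner_action(s), "distilled:", s @ policy)
```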
- Robust Value Iteration for Continuous Control Tasks [99.00362538261972]
When transferring a control policy from simulation to a physical system, the policy needs to be robust to variations in the dynamics to perform well.
We present Robust Fitted Value Iteration, which uses dynamic programming to compute the optimal value function on the compact state domain.
We show that robust value iteration is more robust than deep reinforcement learning and than the non-robust version of the algorithm.
arXiv Detail & Related papers (2021-05-25T19:48:35Z)
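A compact illustration of the robust dynamic-programming idea in this entry: value iteration on a discretized 1-D state space where an adversary picks the worst dynamics parameter (here a mass) for every state-action pair. The paper's method handles continuous control tasks with fitted value functions; this grid-based toy only conveys the minimax structure.

```python
import numpy as np

# Discretized 1-D state space and action set (toy regulation task).
STATES = np.linspace(-2.0, 2.0, 41)
ACTIONS = np.array([-1.0, 0.0, 1.0])
MASSES = np.array([0.8, 1.0, 1.2])   # admissible dynamics variations
GAMMA, DT = 0.95, 0.1

def step(x, u, mass):
    return np.clip(x + DT * u / mass, STATES[0], STATES[-1])

def nearest(x):
    return int(np.abs(STATES - x).argmin())

def robust_value_iteration(n_iters=200):
    """Dynamic programming with a worst-case minimization over the dynamics
    parameters, so the value function hedges against model mismatch."""
    V = np.zeros(len(STATES))
    for _ in range(n_iters):
        V_new = np.empty_like(V)
        for i, x in enumerate(STATES):
            best = -np.inf
            for u in ACTIONS:
                # The adversary picks the worst mass for this state-action pair.
                worst = min(
                    -x**2 - 0.1 * u**2 + GAMMA * V[nearest(step(x, u, m))]
                    for m in MASSES)
                best = max(best, worst)
            V_new[i] = best
        V = V_new
    return V

if __name__ == "__main__":
    V = robust_value_iteration()
    print("robust value at x=0:", V[nearest(0.0)])
```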
- Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained proximal policy optimization (CPPO).
We show that guided constrained RL offers faster convergence close to the desired optimum, resulting in optimal yet physically feasible robotic control behavior without the need for precise reward function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z)
- Information Theoretic Model Predictive Q-Learning [64.74041985237105]
We present a novel theoretical connection between information theoretic MPC and entropy regularized RL.
We develop a Q-learning algorithm that can leverage biased models.
arXiv Detail & Related papers (2019-12-31T00:29:22Z)
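The information-theoretic MPC side of the connection described in the last entry can be sketched as an MPPI-style update: sample control perturbations, weight them by exponentiated negative cost (the soft-min that entropy-regularized RL also produces), and re-estimate the mean control sequence. The Q-learning component and the handling of biased models are not shown; the dynamics and cost below are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
HORIZON, N_SAMPLES, LAMBDA = 15, 128, 1.0

def trajectory_cost(controls, x0=1.0):
    """Roll out a toy 1-D system with a (possibly biased) model and sum costs."""
    x, cost = x0, 0.0
    for u in controls:
        x = x + 0.1 * u          # simple surrogate model of the true dynamics
        cost += x**2 + 0.01 * u**2
    return cost

def mppi_update(mean_controls):
    """One information-theoretic MPC update: sample control perturbations,
    weight them by exponentiated negative cost, and re-estimate the mean."""
    noise = rng.normal(scale=0.5, size=(N_SAMPLES, HORIZON))
    costs = np.array([trajectory_cost(mean_controls + n) for n in noise])
    weights = np.exp(-(costs - costs.min()) / LAMBDA)
    weights /= weights.sum()
    return mean_controls + weights @ noise

if __name__ == "__main__":
    u = np.zeros(HORIZON)
    for _ in range(10):
        u = mppi_update(u)
    print("cost after 10 updates:", trajectory_cost(u))
```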
This list is automatically generated from the titles and abstracts of the papers on this site.