Recurrent Model Predictive Control
- URL: http://arxiv.org/abs/2102.11736v1
- Date: Tue, 23 Feb 2021 15:01:36 GMT
- Title: Recurrent Model Predictive Control
- Authors: Zhengyu Liu, Jingliang Duan, Wenxuan Wang, Shengbo Eben Li, Yuming
Yin, Ziyu Lin, Qi Sun, Bo Cheng
- Abstract summary: We propose an off-line algorithm, called Recurrent Model Predictive Control (RMPC), to solve general nonlinear finite-horizon optimal control problems.
Our algorithm employs a recurrent function to approximate the optimal policy, which maps the system states and reference values directly to the control inputs.
- Score: 19.047059454849897
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper proposes an off-line algorithm, called Recurrent Model Predictive
Control (RMPC), to solve general nonlinear finite-horizon optimal control
problems. Unlike traditional Model Predictive Control (MPC) algorithms, it can
make full use of the available computing resources and adaptively select the
longest feasible prediction horizon. Our algorithm employs a recurrent function to
approximate the optimal policy, which maps the system states and reference
values directly to the control inputs. The number of prediction steps is equal
to the number of recurrent cycles of the learned policy function. With an
arbitrary initial policy function, the proposed RMPC algorithm can converge to
the optimal policy by directly minimizing the designed loss function. We
further prove the convergence and optimality of the RMPC algorithm through the
Bellman optimality principle, and demonstrate its generality and efficiency
using two numerical examples.
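To make the recurrent-cycle idea concrete, a minimal sketch follows. It is not the authors' implementation: it assumes a simple known linear system, a quadratic regulation cost standing in for the paper's designed loss function, and a GRU-based recurrent policy whose output after i cycles is treated as the solution of the i-step problem.

```python
import torch
import torch.nn as nn

# Illustrative discrete-time linear system x_{t+1} = A x_t + B u_t (an assumption,
# not one of the systems used in the paper's examples).
A = torch.tensor([[1.0, 0.1], [0.0, 1.0]])
B = torch.tensor([[0.0], [0.1]])

class RecurrentPolicy(nn.Module):
    """Recurrent policy: one recurrent cycle per prediction step."""
    def __init__(self, state_dim=2, ref_dim=2, hidden_dim=32, act_dim=1):
        super().__init__()
        self.cell = nn.GRUCell(state_dim + ref_dim, hidden_dim)
        self.head = nn.Linear(hidden_dim, act_dim)

    def forward(self, x, ref, cycles):
        # At deployment time, `cycles` is chosen according to the available
        # computing resources, i.e. the longest affordable prediction horizon.
        h = torch.zeros(x.shape[0], self.cell.hidden_size)
        inp = torch.cat([x, ref], dim=-1)
        for _ in range(cycles):
            h = self.cell(inp, h)
        return self.head(h)

def training_loss(policy, x0, ref, max_horizon):
    """Sum of i-step quadratic tracking costs, where the policy run with i
    recurrent cycles acts as the policy of the i-step problem (a simplified
    stand-in for the paper's Bellman-based loss)."""
    total = 0.0
    for i in range(1, max_horizon + 1):
        x, cost = x0, 0.0
        for _ in range(i):
            u = policy(x, ref, cycles=i)
            x = x @ A.T + u @ B.T
            cost = cost + ((x - ref) ** 2).sum(-1).mean() + 0.01 * (u ** 2).sum(-1).mean()
        total = total + cost
    return total

policy = RecurrentPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
for step in range(200):
    x0 = torch.randn(64, 2)
    ref = torch.zeros(64, 2)  # regulate the state to the origin
    loss = training_loss(policy, x0, ref, max_horizon=8)
    opt.zero_grad(); loss.backward(); opt.step()
```

Once trained, the same network can be queried with any number of recurrent cycles, which is what lets the policy trade computation time for prediction horizon at run time.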
Related papers
- Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer [52.09480867526656]
We identify the source of misalignment as a form of distributional shift and uncertainty in learning human preferences.
To mitigate overoptimization, we first propose a theoretical algorithm that chooses the best policy for an adversarially chosen reward model.
Using the equivalence between reward models and the corresponding optimal policy, the algorithm features a simple objective that combines a preference optimization loss and a supervised learning loss.
arXiv Detail & Related papers (2024-05-26T05:38:50Z)
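As a rough illustration of the combined objective described in the entry above, here is a minimal sketch that instantiates the preference-optimization term as a DPO-style loss and adds an SFT negative log-likelihood regularizer; the coefficients `beta` and `eta` and the DPO instantiation are illustrative assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def preference_plus_sft_loss(logp_chosen, logp_rejected,
                             ref_logp_chosen, ref_logp_rejected,
                             beta=0.1, eta=0.01):
    """Preference-optimization loss plus an SFT regularizer (illustrative weights).

    logp_* are summed log-probabilities of whole responses under the policy;
    ref_logp_* are the same quantities under a frozen reference model.
    """
    # DPO-style preference term (one possible instantiation of the preference loss).
    margin = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    pref_loss = -F.logsigmoid(margin).mean()
    # Supervised (SFT) term: maximize likelihood of the chosen responses.
    sft_loss = -logp_chosen.mean()
    return pref_loss + eta * sft_loss

# Toy usage with random log-probabilities standing in for model outputs.
lp_c, lp_r = torch.randn(8), torch.randn(8)
ref_c, ref_r = torch.randn(8), torch.randn(8)
print(preference_plus_sft_loss(lp_c, lp_r, ref_c, ref_r))
```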
- Maximum-Likelihood Inverse Reinforcement Learning with Finite-Time Guarantees [56.848265937921354]
Inverse reinforcement learning (IRL) aims to recover the reward function and the associated optimal policy.
Many algorithms for IRL have an inherently nested structure.
We develop a novel single-loop algorithm for IRL that does not compromise reward estimation accuracy.
arXiv Detail & Related papers (2022-10-04T17:13:45Z)
- Neural Predictive Control for the Optimization of Smart Grid Flexibility Schedules [0.0]
Model predictive control (MPC) is a method to formulate the optimal scheduling problem for grid flexibilities in a mathematical manner.
MPC methods promise accurate results for time-constrained grid optimization but they are inherently limited by the calculation time needed for large and complex power system models.
A Neural Predictive Control scheme is proposed to learn optimal control policies for linear and nonlinear power systems through imitation.
arXiv Detail & Related papers (2021-08-19T15:12:35Z)
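As a rough illustration of the imitation scheme described in the entry above, the following sketch fits a small neural policy to state-control pairs generated by an expert controller. An LQR-style feedback gain stands in for the MPC expert, and the system, gain, and training setup are illustrative assumptions rather than the paper's power-system models.

```python
import numpy as np
import torch
import torch.nn as nn

# Illustrative linear system and a stabilizing feedback gain standing in for the MPC expert.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
K = np.array([[3.0, 2.5]])  # assumed expert control law u = -K x

# 1) Collect expert demonstrations (state -> control) offline.
states, controls = [], []
for _ in range(200):
    x = np.random.randn(2)
    for _ in range(20):
        u = -K @ x
        states.append(x.copy()); controls.append(u.copy())
        x = A @ x + (B @ u).ravel()

X = torch.tensor(np.array(states), dtype=torch.float32)
U = torch.tensor(np.array(controls), dtype=torch.float32)

# 2) Fit a neural policy to imitate the expert controller.
policy = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
for epoch in range(500):
    loss = ((policy(X) - U) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# At run time the learned network replaces the online optimization,
# trading solver time for a single forward pass.
```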
- Reinforcement Learning for Adaptive Optimal Stationary Control of Linear Stochastic Systems [15.410124023805249]
This paper studies the adaptive optimal stationary control of continuous-time linear systems with both additive and multiplicative noises.
A novel off-policy reinforcement learning algorithm, named optimistic least-squares-based policy iteration, is proposed.
arXiv Detail & Related papers (2021-07-16T09:27:02Z)
- Average-Reward Off-Policy Policy Evaluation with Function Approximation [66.67075551933438]
We consider off-policy policy evaluation with function approximation in average-reward MDPs.
Bootstrapping is necessary and, together with off-policy learning and function approximation, results in the deadly triad.
We propose two novel algorithms, reproducing the celebrated success of Gradient TD algorithms in the average-reward setting.
arXiv Detail & Related papers (2021-01-08T00:43:04Z)
- Approximate Midpoint Policy Iteration for Linear Quadratic Control [1.0312968200748118]
We present a midpoint policy iteration algorithm to solve linear quadratic optimal control problems in both model-based and model-free settings.
We show that in the model-based setting it achieves cubic convergence, which is superior to standard policy iteration and policy gradient algorithms that achieve quadratic and linear convergence, respectively.
arXiv Detail & Related papers (2020-11-28T20:22:10Z)
- Logistic Q-Learning [87.00813469969167]
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs.
The main feature of our algorithm is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error.
arXiv Detail & Related papers (2020-10-21T17:14:31Z)
- Iterative Amortized Policy Optimization [147.63129234446197]
Policy networks are a central feature of deep reinforcement learning (RL) algorithms for continuous control.
From the variational inference perspective, policy networks are a form of amortized optimization, optimizing network parameters rather than the policy distributions directly.
We demonstrate that iterative amortized policy optimization yields performance improvements over direct amortization on benchmark continuous control tasks.
arXiv Detail & Related papers (2020-10-20T23:25:42Z)
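The iterative-amortization idea in the entry above can be sketched roughly as follows: rather than mapping a state to policy parameters in a single forward pass, a small refinement network repeatedly updates the parameters of a Gaussian policy using gradients of an estimated objective. The toy critic, the refinement network, and the number of inner steps are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

state_dim, act_dim = 3, 1

# Toy differentiable critic standing in for a learned Q-function.
critic = nn.Sequential(nn.Linear(state_dim + act_dim, 64), nn.Tanh(), nn.Linear(64, 1))

# Refinement network: maps (current mu, log_std, their gradients) to an update.
refiner = nn.Sequential(nn.Linear(4 * act_dim, 64), nn.Tanh(), nn.Linear(64, 2 * act_dim))

def estimate_objective(s, mu, log_std):
    # Reparameterized sample so gradients flow back to (mu, log_std).
    a = mu + log_std.exp() * torch.randn_like(mu)
    return critic(torch.cat([s, a], dim=-1)).mean()

def iterative_amortized_policy(s, steps=5):
    mu = torch.zeros(s.shape[0], act_dim, requires_grad=True)
    log_std = torch.zeros(s.shape[0], act_dim, requires_grad=True)
    for _ in range(steps):
        obj = estimate_objective(s, mu, log_std)
        g_mu, g_ls = torch.autograd.grad(obj, (mu, log_std), create_graph=True)
        delta = refiner(torch.cat([mu, log_std, g_mu, g_ls], dim=-1))
        mu = mu + delta[:, :act_dim]
        log_std = log_std + delta[:, act_dim:]
    return mu, log_std

s = torch.randn(16, state_dim)
mu, log_std = iterative_amortized_policy(s)
# Training the refiner would maximize the final estimated objective, e.g.:
loss = -estimate_objective(s, mu, log_std)
loss.backward()  # gradients reach both the refiner and the critic
```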
- Variance-Reduced Off-Policy Memory-Efficient Policy Search [61.23789485979057]
Off-policy policy optimization is a challenging problem in reinforcement learning.
Off-policy algorithms are memory-efficient and capable of learning from off-policy samples.
arXiv Detail & Related papers (2020-09-14T16:22:46Z)
- Model-free optimal control of discrete-time systems with additive and multiplicative noises [1.656520517245166]
This paper investigates the optimal control problem for a class of discrete-time systems subject to additive and multiplicative noises.
A model-free reinforcement learning algorithm is proposed to learn the optimal admissible control policy using the data of the system states and inputs.
It is proven that the learning algorithm converges to the optimal admissible control policy.
arXiv Detail & Related papers (2020-08-20T02:18:00Z)
- Queueing Network Controls via Deep Reinforcement Learning [0.0]
We develop a Proximal Policy Optimization (PPO) algorithm for queueing networks.
The algorithm consistently generates control policies that outperform the state of the art in the literature.
A key to the success of our PPO algorithm is the use of three variance reduction techniques in estimating the relative value function.
arXiv Detail & Related papers (2020-07-31T01:02:57Z)
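For context on the PPO objective referenced in the entry above, here is a minimal sketch of the standard clipped surrogate loss; it does not include the three queueing-specific variance-reduction techniques the paper credits for its results.

```python
import torch

def ppo_clipped_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Standard clipped PPO surrogate (to be minimized).

    logp_new/logp_old: log pi(a|s) under the current and behavior policies;
    advantages: advantage estimates (relative value estimates in the
    average-reward/queueing setting).
    """
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()

# Toy usage with random stand-ins for the real quantities.
logp_new, logp_old = torch.randn(32, requires_grad=True), torch.randn(32)
adv = torch.randn(32)
loss = ppo_clipped_loss(logp_new, logp_old, adv)
loss.backward()
```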
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.