Computationally efficient Gauss-Newton reinforcement learning for model predictive control
- URL: http://arxiv.org/abs/2508.02441v1
- Date: Mon, 04 Aug 2025 14:00:40 GMT
- Title: Computationally efficient Gauss-Newton reinforcement learning for model predictive control
- Authors: Dean Brandner, Sebastien Gros, Sergio Lucia
- Abstract summary: We introduce a Gauss-Newton approximation of the deterministic policy Hessian that eliminates the need for second-order policy derivatives. We demonstrate the effectiveness of the approach on a nonlinear continuously stirred tank reactor.
- Score: 0.8437187555622164
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Model predictive control (MPC) is widely used in process control due to its interpretability and ability to handle constraints. As a parametric policy in reinforcement learning (RL), MPC offers strong initial performance and low data requirements compared to black-box policies like neural networks. However, most RL methods rely on first-order updates, which scale well to large parameter spaces but converge at most linearly, making them inefficient when each policy update requires solving an optimal control problem, as is the case with MPC. While MPC policies are typically sparsely parameterized and thus amenable to second-order approaches, existing second-order methods demand second-order policy derivatives, which can be computationally and memory-wise intractable. This work introduces a Gauss-Newton approximation of the deterministic policy Hessian that eliminates the need for second-order policy derivatives, enabling superlinear convergence with minimal computational overhead. To further improve robustness, we propose a momentum-based Hessian averaging scheme for stable training under noisy estimates. We demonstrate the effectiveness of the approach on a nonlinear continuously stirred tank reactor (CSTR), showing faster convergence and improved data efficiency over state-of-the-art first-order methods.
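For a deterministic policy pi_theta(s), the exact Hessian of the policy objective contains one term built purely from first-order policy sensitivities and the action-value curvature, plus terms that require second-order policy derivatives; a Gauss-Newton-style approximation keeps only the former. The sketch below illustrates one plausible reading of that construction, together with a momentum-based (exponentially averaged) Hessian estimate as described in the abstract. All names (jac_pi, hess_Q_aa, beta, damping) and the exact formulas are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def gauss_newton_hessian(jac_pi, hess_Q_aa):
    """Gauss-Newton term  J_pi^T (d^2 Q / da^2) J_pi  for one state.

    jac_pi    : (n_a, n_theta) Jacobian of the policy w.r.t. its parameters
    hess_Q_aa : (n_a, n_a) Hessian of the action-value function w.r.t. actions

    The exact policy Hessian also contains terms weighted by d^2 pi / d theta^2;
    the Gauss-Newton approximation drops them, so only first-order policy
    sensitivities are required.
    """
    return jac_pi.T @ hess_Q_aa @ jac_pi

def policy_gradient(jac_pi, grad_Q_a):
    """Deterministic policy gradient  J_pi^T (dQ/da)  for one state."""
    return jac_pi.T @ grad_Q_a

def momentum_newton_step(theta, H_bar, H_hat, grad, beta=0.9, damping=1e-3, step=1.0):
    """One damped Newton-type step with an exponentially averaged Hessian."""
    H_bar = beta * H_bar + (1.0 - beta) * H_hat          # smooth noisy Hessian estimates
    H_reg = H_bar + damping * np.eye(H_bar.shape[0])     # Levenberg-style damping
    theta = theta - step * np.linalg.solve(H_reg, grad)  # descent step on a cost J(theta)
    return theta, H_bar
```

In the MPC-as-policy setting described above, jac_pi would typically come from parametric sensitivity analysis of the MPC optimization problem, which is first-order information that NLP solvers can return at modest extra cost.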
Related papers
- Neural Policy Iteration for Stochastic Optimal Control: A Physics-Informed Approach [2.8988658640181826]
We propose a physics-informed neural network policy iteration framework (PINN-PI). At each iteration, a neural network is trained to approximate the value function by minimizing the residual of a linear PDE induced by a fixed policy. We demonstrate the effectiveness of our method on several benchmark problems, including cartpole, pendulum, and high-dimensional linear quadratic regulation (LQR) problems in up to 10D.
arXiv Detail & Related papers (2025-08-03T11:02:25Z) - Intersection of Reinforcement Learning and Bayesian Optimization for Intelligent Control of Industrial Processes: A Safe MPC-based DPG using Multi-Objective BO [0.0]
Model Predictive Control (MPC)-based Reinforcement Learning (RL) offers a structured and interpretable alternative to Deep Neural Network (DNN)-based RL methods. Standard MPC-RL approaches often suffer from slow convergence, suboptimal policy learning due to limited parameterization, and safety issues during online adaptation. We propose a novel framework that integrates MPC-RL with Multi-Objective Bayesian Optimization (MOBO).
arXiv Detail & Related papers (2025-07-14T02:31:52Z) - Predictive Lagrangian Optimization for Constrained Reinforcement Learning [15.082498910832529]
Constrained optimization is commonly used in reinforcement learning for addressing complex control tasks. In this paper, we propose a more generic equivalence framework that builds the connection between constrained optimization and feedback control systems.
arXiv Detail & Related papers (2025-01-25T13:39:45Z) - One-Shot Safety Alignment for Large Language Models via Optimal Dualization [64.52223677468861]
This paper presents a perspective of dualization that reduces constrained alignment to an equivalent unconstrained alignment problem.
We do so by pre-optimizing a smooth and convex dual function that has a closed form.
Our strategy leads to two practical algorithms in model-based and preference-based settings.
arXiv Detail & Related papers (2024-05-29T22:12:52Z) - Reinforced Model Predictive Control via Trust-Region Quasi-Newton Policy Optimization [0.0]
We propose a Quasi-Newton training algorithm for policy optimization with a superlinear convergence rate.
A simulation study illustrates that the proposed training algorithm outperforms other algorithms in terms of data efficiency and accuracy.
arXiv Detail & Related papers (2024-05-28T09:16:08Z) - Efficiently Training Deep-Learning Parametric Policies using Lagrangian Duality [55.06411438416805]
Constrained Markov Decision Processes (CMDPs) are critical in many high-stakes applications. This paper introduces a novel approach, Two-Stage Deep Decision Rules (TS-DDR), to efficiently train parametric actor policies. It is shown to enhance solution quality and to reduce computation times by several orders of magnitude when compared to current state-of-the-art methods.
arXiv Detail & Related papers (2024-05-23T18:19:47Z) - Iteratively Refined Behavior Regularization for Offline Reinforcement Learning [57.10922880400715]
In this paper, we propose a new algorithm that substantially enhances behavior-regularization based on conservative policy iteration.
By iteratively refining the reference policy used for behavior regularization, conservative policy updates guarantee gradual improvement.
Experimental results on the D4RL benchmark indicate that our method outperforms previous state-of-the-art baselines in most tasks.
arXiv Detail & Related papers (2023-06-09T07:46:24Z) - Offline Reinforcement Learning with Closed-Form Policy Improvement Operators [88.54210578912554]
Behavior constrained policy optimization has been demonstrated to be a successful paradigm for tackling Offline Reinforcement Learning.
In this paper, we propose our closed-form policy improvement operators.
We empirically demonstrate their effectiveness over state-of-the-art algorithms on the standard D4RL benchmark.
arXiv Detail & Related papers (2022-11-29T06:29:26Z) - Optimistic Policy Optimization is Provably Efficient in Non-stationary MDPs [113.8752163061151]
We study episodic reinforcement learning (RL) in non-stationary linear kernel Markov decision processes (MDPs). We propose the periodically restarted optimistic policy optimization algorithm (PROPO). PROPO features two mechanisms: sliding-window-based policy evaluation and periodic-restart-based policy improvement.
arXiv Detail & Related papers (2021-10-18T02:33:20Z) - Recurrent Model Predictive Control [19.047059454849897]
We propose an off-line algorithm, called Recurrent Model Predictive Control (RMPC), to solve general nonlinear finite-horizon optimal control problems.
Our algorithm employs a recurrent function to approximate the optimal policy, which maps the system states and reference values directly to the control inputs.
arXiv Detail & Related papers (2021-02-23T15:01:36Z) - MOPO: Model-based Offline Policy Optimization [183.6449600580806]
Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data.
We show that an existing model-based RL algorithm already produces significant gains in the offline setting.
We propose to modify existing model-based RL methods by applying them with rewards artificially penalized by the uncertainty of the dynamics (a brief sketch of this penalty follows the list).
arXiv Detail & Related papers (2020-05-27T08:46:41Z)
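The MOPO entry above applies standard model-based RL to rewards penalized by the uncertainty of the learned dynamics model. A minimal sketch of that penalty, assuming a scalar uncertainty estimate u(s, a) and a penalty weight lam (both names are illustrative, not MOPO's actual interface):

```python
def penalized_reward(reward: float, uncertainty: float, lam: float = 1.0) -> float:
    """MOPO-style pessimistic reward: subtract a scaled estimate of the dynamics
    model's uncertainty so the policy avoids regions where the model is unreliable."""
    return reward - lam * uncertainty
```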
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of this information and is not responsible for any consequences of its use.