Gaussian Process Policy Iteration with Additive Schwarz Acceleration for Forward and Inverse HJB and Mean Field Game Problems
- URL: http://arxiv.org/abs/2505.00909v1
- Date: Thu, 01 May 2025 23:04:52 GMT
- Title: Gaussian Process Policy Iteration with Additive Schwarz Acceleration for Forward and Inverse HJB and Mean Field Game Problems
- Authors: Xianjin Yang, Jingguo Zhang
- Abstract summary: We propose a policy iteration framework for solving Hamilton--Jacobi--Bellman equations and mean field games. Policy iteration is formulated as an alternating procedure between solving the value function under a fixed control policy and updating the policy based on the resulting value function. To improve convergence, we incorporate the additive Schwarz acceleration as a preconditioning step following each policy update.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We propose a Gaussian Process (GP)-based policy iteration framework for addressing both forward and inverse problems in Hamilton--Jacobi--Bellman (HJB) equations and mean field games (MFGs). Policy iteration is formulated as an alternating procedure between solving the value function under a fixed control policy and updating the policy based on the resulting value function. By exploiting the linear structure of GPs for function approximation, each policy evaluation step admits an explicit closed-form solution, eliminating the need for numerical optimization. To improve convergence, we incorporate the additive Schwarz acceleration as a preconditioning step following each policy update. Numerical experiments demonstrate the effectiveness of Schwarz acceleration in improving computational efficiency.
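To make the alternating structure concrete, below is a minimal sketch of kernel-based policy iteration on a toy one-dimensional discounted HJB equation, rho*V(x) = min_a [x^2 + a^2 + a*V'(x)]. It is not the authors' implementation: the kernel, collocation grid, regularization, and the toy problem itself are illustrative choices, and the additive Schwarz preconditioning step is omitted. What it does illustrate is the point made in the abstract: with the value function written as a linear combination of kernel functions, each policy-evaluation step reduces to a single linear solve, and the policy update is a pointwise minimization.

```python
import numpy as np

# Toy 1D discounted HJB:  rho*V(x) = min_a [ x^2 + a^2 + a*V'(x) ].
# With rho = 1 the exact solution is V(x) = c*x^2 with c = (sqrt(5) - 1) / 2.
rho, ell, nugget = 1.0, 0.4, 1e-6
X = np.linspace(-1.0, 1.0, 40)                  # collocation points (illustrative)

def k(x, y):                                    # RBF kernel
    return np.exp(-(x - y) ** 2 / (2 * ell ** 2))

def k_x(x, y):                                  # derivative w.r.t. the first argument
    return -(x - y) / ell ** 2 * k(x, y)

K = k(X[:, None], X[None, :])                   # Gram matrix
K_x = k_x(X[:, None], X[None, :])

policy = np.zeros_like(X)                       # initial control a_0(x) = 0
for _ in range(20):
    # Policy evaluation in closed form: impose rho*V - a*V' = x^2 + a^2 at the
    # collocation points with V(x) = sum_j c_j k(x, x_j), i.e. one linear solve.
    A = rho * K - policy[:, None] * K_x
    rhs = X ** 2 + policy ** 2
    c = np.linalg.solve(A + nugget * np.eye(len(X)), rhs)

    # Policy improvement: argmin_a [ a^2 + a*V'(x) ] = -V'(x) / 2, pointwise.
    new_policy = -0.5 * (K_x @ c)
    if np.max(np.abs(new_policy - policy)) < 1e-8:
        break
    policy = new_policy

print("V(1) estimate:", (K @ c)[-1], "(exact value is (sqrt(5)-1)/2, about 0.618)")
```

In the paper's setting the same alternating structure is applied to forward and inverse HJB and MFG problems, with the additive Schwarz acceleration used as a preconditioner after each policy update.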
Related papers
- Predictive Lagrangian Optimization for Constrained Reinforcement Learning [15.082498910832529]
Constrained optimization is widely used in reinforcement learning for addressing complex control tasks. In this paper, we propose a more general equivalence framework that builds the connection between constrained optimization and feedback control systems.
arXiv Detail & Related papers (2025-01-25T13:39:45Z)
- Last-Iterate Convergent Policy Gradient Primal-Dual Methods for Constrained MDPs [107.28031292946774]
We study the problem of computing an optimal policy of an infinite-horizon discounted constrained Markov decision process (constrained MDP).
We develop two single-time-scale policy-based primal-dual algorithms with non-asymptotic convergence of their policy iterates to an optimal constrained policy.
To the best of our knowledge, this work appears to be the first non-asymptotic policy last-iterate convergence result for single-time-scale algorithms in constrained MDPs.
arXiv Detail & Related papers (2023-06-20T17:27:31Z)
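For readers unfamiliar with the primal-dual approach mentioned in the entry above, here is a generic sketch (not the paper's single-time-scale algorithm): a Lagrangian couples the reward value and a constraint value, the policy parameters take a gradient step, and the multiplier takes a projected dual step. The tiny tabular MDP, the finite-difference gradient, and all step sizes below are made up purely for illustration.

```python
import numpy as np

# A tiny 2-state, 2-action constrained MDP; every number here is illustrative.
gamma, b, s0 = 0.9, 3.0, 0
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.1, 0.9]]])          # P[s, a, s']
r = np.array([[1.0, 0.0], [0.5, 2.0]])            # reward r[s, a]
g = np.array([[0.0, 1.0], [1.0, 0.2]])            # constraint cost g[s, a]; require V_g <= b

def policy(theta):                                # tabular softmax policy
    e = np.exp(theta - theta.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def value(theta, payoff):                         # exact discounted value at s0
    pi = policy(theta)
    P_pi = np.einsum('sa,sap->sp', pi, P)
    v = np.linalg.solve(np.eye(2) - gamma * P_pi, (pi * payoff).sum(axis=1))
    return v[s0]

theta, lam, alpha, eps = np.zeros((2, 2)), 0.0, 0.2, 1e-5
for _ in range(1000):
    # Lagrangian: reward value minus the multiplier times the constraint violation.
    L = lambda th: value(th, r) - lam * (value(th, g) - b)
    grad = np.zeros_like(theta)                   # finite-difference gradient (illustrative)
    for idx in np.ndindex(*theta.shape):
        d = np.zeros_like(theta); d[idx] = eps
        grad[idx] = (L(theta + d) - L(theta - d)) / (2 * eps)
    theta = theta + alpha * grad                                 # primal gradient step
    lam = max(0.0, lam + alpha * (value(theta, g) - b))          # dual step, kept >= 0

print("reward value:", value(theta, r), "cost value:", value(theta, g), "lambda:", lam)
```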
- Policy Gradient Algorithms Implicitly Optimize by Continuation [7.351769270728942]
We argue that exploration in policy-gradient algorithms consists in a continuation of the return of the policy at hand, and that policies should be history-dependent rather than memoryless in order to maximize the return.
arXiv Detail & Related papers (2023-05-11T14:50:20Z)
- Smoothing Policy Iteration for Zero-sum Markov Games [9.158672246275348]
We propose the smoothing policy iteration (SPI) algorithm to approximately solve zero-sum Markov games (MGs).
Specifically, the adversarial policy serves as the weight function to enable efficient sampling over action spaces.
We also propose a model-based algorithm called Smooth adversarial Actor-critic (SaAC) by extending SPI with function approximation.
arXiv Detail & Related papers (2022-12-03T14:39:06Z)
- Maximum-Likelihood Inverse Reinforcement Learning with Finite-Time Guarantees [56.848265937921354]
Inverse reinforcement learning (IRL) aims to recover the reward function and the associated optimal policy.
Many algorithms for IRL have an inherently nested structure.
We develop a novel single-loop algorithm for IRL that does not compromise reward estimation accuracy.
arXiv Detail & Related papers (2022-10-04T17:13:45Z)
- FORESEE: Prediction with Expansion-Compression Unscented Transform for Online Policy Optimization [8.97438370260135]
We introduce a method for state prediction, called the Expansion-Compression Unscented Transform, to solve a class of online policy optimization problems.
Our proposed algorithm propagates a finite number of sigma points through a state-dependent distribution, which dictates an increase in the number of sigma points at each time step.
Its performance is empirically shown to be comparable to Monte Carlo but at a much lower computational cost.
arXiv Detail & Related papers (2022-09-26T12:47:08Z)
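The sigma-point propagation that the entry above builds on can be illustrated with the standard unscented transform below; FORESEE's expansion-compression mechanism is not reproduced, and the dynamics function, initial belief, and parameters are placeholders.

```python
import numpy as np

def unscented_transform(mu, P, f, alpha=1e-3, beta=2.0, kappa=0.0):
    """Propagate a Gaussian belief (mu, P) through a nonlinearity f with the standard UT."""
    n = len(mu)
    lam = alpha ** 2 * (n + kappa) - n
    S = np.linalg.cholesky((n + lam) * P)               # matrix square root
    sigma = np.vstack([mu, mu + S.T, mu - S.T])         # 2n + 1 sigma points
    wm = np.full(2 * n + 1, 1.0 / (2 * (n + lam)))
    wc = wm.copy()
    wm[0] = lam / (n + lam)
    wc[0] = lam / (n + lam) + (1.0 - alpha ** 2 + beta)
    Y = np.array([f(x) for x in sigma])                 # propagated sigma points
    mu_y = wm @ Y                                       # predicted mean
    P_y = (wc[:, None] * (Y - mu_y)).T @ (Y - mu_y)     # predicted covariance
    return mu_y, P_y

# Placeholder nonlinear dynamics (damped pendulum-like map) and initial belief.
f = lambda x: np.array([x[0] + 0.1 * x[1], x[1] - 0.1 * np.sin(x[0])])
mu, P = np.array([0.5, 0.0]), 0.01 * np.eye(2)
print(unscented_transform(mu, P, f))
```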
- Robust and Adaptive Temporal-Difference Learning Using An Ensemble of Gaussian Processes [70.80716221080118]
The paper takes a generative perspective on policy evaluation via temporal-difference (TD) learning.
The OS-GPTD approach is developed to estimate the value function for a given policy by observing a sequence of state-reward pairs.
To alleviate the limited expressiveness associated with a single fixed kernel, a weighted ensemble (E) of GP priors is employed to yield an alternative scheme.
arXiv Detail & Related papers (2021-12-01T23:15:09Z)
- Optimistic Policy Optimization is Provably Efficient in Non-stationary MDPs [113.8752163061151]
We study episodic reinforcement learning (RL) in non-stationary linear kernel Markov decision processes (MDPs). We propose the periodically restarted optimistic policy optimization algorithm (PROPO). PROPO features two mechanisms: sliding-window-based policy evaluation and periodic-restart-based policy improvement.
arXiv Detail & Related papers (2021-10-18T02:33:20Z)
- A general class of surrogate functions for stable and efficient reinforcement learning [45.31904153659212]
We propose a general framework based on functional mirror ascent that gives rise to an entire family of surrogate functions.
We construct surrogate functions that enable policy improvement guarantees, a property not shared by most existing surrogate functions.
The proposed framework also suggests an improved variant of PPO, whose robustness and efficiency we empirically demonstrate.
arXiv Detail & Related papers (2021-08-12T16:19:19Z)
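For context on the surrogate-function family discussed in the entry above, the standard PPO clipped surrogate (the baseline the paper's improved variant starts from; the variant itself is not reproduced here) can be written in a few lines on dummy data:

```python
import numpy as np

def ppo_clip_surrogate(logp_new, logp_old, advantages, eps=0.2):
    """Standard PPO clipped surrogate objective (to be maximized)."""
    ratio = np.exp(logp_new - logp_old)                  # pi_new(a|s) / pi_old(a|s)
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    return np.mean(np.minimum(ratio * advantages, clipped * advantages))

# Dummy batch, purely for illustration.
rng = np.random.default_rng(0)
logp_old = rng.normal(-1.0, 0.1, size=128)
logp_new = logp_old + rng.normal(0.0, 0.05, size=128)
adv = rng.normal(0.0, 1.0, size=128)
print(ppo_clip_surrogate(logp_new, logp_old, adv))
```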
- Near Optimal Policy Optimization via REPS [33.992374484681704]
Relative entropy policy search (REPS) has demonstrated successful policy learning on a number of simulated and real-world robotic domains.
There exist no guarantees on REPS's performance when using gradient-based solvers.
We introduce a technique that uses generative access to the underlying decision process to compute parameter updates that maintain favorable convergence to the optimal regularized policy.
arXiv Detail & Related papers (2021-03-17T16:22:59Z)
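As a rough illustration of the REPS-style update referenced above (not the specific generative-access algorithm analyzed in the paper), samples are reweighted by exponentiated advantages with a temperature eta and the policy is refit to the weighted data; the KL-constrained dual optimization that REPS uses to choose eta is omitted, and the data are dummies.

```python
import numpy as np

def reps_weights(advantages, eta):
    """Exponentiated-advantage weights of REPS-style updates (eta fixed here;
    REPS chooses eta by optimizing a dual derived from a KL constraint)."""
    a = advantages - advantages.max()            # shift for numerical stability
    w = np.exp(a / eta)
    return w / w.sum()

# Toy discrete-action 'policy fit': new action probabilities proportional to
# the summed weights of the samples that took each action.
rng = np.random.default_rng(1)
actions = rng.integers(0, 3, size=200)           # actions sampled from the old policy
advantages = rng.normal(size=200)                # their estimated advantages
w = reps_weights(advantages, eta=0.5)
new_probs = np.array([w[actions == a].sum() for a in range(3)])
new_probs /= new_probs.sum()
print("reweighted action probabilities:", new_probs)
```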
- Average-Reward Off-Policy Policy Evaluation with Function Approximation [66.67075551933438]
We consider off-policy policy evaluation with function approximation in average-reward MDPs.
Bootstrapping is necessary and, along with off-policy learning and function approximation (FA), results in the deadly triad.
We propose two novel algorithms, reproducing the celebrated success of Gradient TD algorithms in the average-reward setting.
arXiv Detail & Related papers (2021-01-08T00:43:04Z)
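The gradient TD family referenced in the entry above can be illustrated with a standard discounted-setting TDC (gradient TD with correction) update; the paper's algorithms work in the average-reward setting and include off-policy corrections, both of which are omitted here, and the feature stream is synthetic.

```python
import numpy as np

def tdc_step(theta, w, phi, phi_next, reward, gamma, alpha, beta):
    """One TDC update for linear value estimation (discounted setting, for reference)."""
    delta = reward + gamma * (phi_next @ theta) - phi @ theta    # TD error
    theta = theta + alpha * (delta * phi - gamma * (phi @ w) * phi_next)
    w = w + beta * (delta - phi @ w) * phi                       # auxiliary weights
    return theta, w

# Synthetic feature stream, purely illustrative.
rng = np.random.default_rng(2)
d = 4
theta, w = np.zeros(d), np.zeros(d)
phi = rng.normal(size=d)
for _ in range(2000):
    phi_next = rng.normal(size=d)
    reward = 0.1 * phi.sum() + rng.normal(scale=0.01)            # reward tied to features
    theta, w = tdc_step(theta, w, phi, phi_next, reward, 0.9, 0.01, 0.05)
    phi = phi_next
print("learned value weights:", theta)
```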
- Kalman meets Bellman: Improving Policy Evaluation through Value Tracking [59.691919635037216]
Policy evaluation is a key process in Reinforcement Learning (RL).
We devise an optimization method, called Kalman Optimization for Value Approximation (KOVA).
KOVA minimizes a regularized objective function that concerns both parameter and noisy return uncertainties.
arXiv Detail & Related papers (2020-02-17T13:30:43Z)
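The value-tracking idea in the entry above can be sketched as a plain linear Kalman filter over the value-function weights, with the feature vector as the observation model and an observed return as the noisy measurement; this is a generic sketch rather than KOVA's regularized objective, and the noise levels and data are placeholders.

```python
import numpy as np

def kalman_value_update(theta, P, phi, target, q=1e-4, r=1.0):
    """One Kalman-filter update of linear value weights theta with covariance P,
    treating the observed return 'target' as a noisy measurement of phi @ theta."""
    P = P + q * np.eye(len(theta))                  # predict: random-walk drift on weights
    innov = target - phi @ theta                    # measurement residual
    s = phi @ P @ phi + r                           # innovation variance
    K = P @ phi / s                                 # Kalman gain
    theta = theta + K * innov
    P = P - np.outer(K, phi @ P)
    return theta, P

# Track a slowly drifting value function from noisy returns (dummy data).
rng = np.random.default_rng(3)
d = 3
true_w = np.array([1.0, -0.5, 0.2])
theta, P = np.zeros(d), np.eye(d)
for _ in range(500):
    phi = rng.normal(size=d)
    ret = phi @ true_w + rng.normal(scale=0.5)      # noisy observed return
    theta, P = kalman_value_update(theta, P, phi, ret)
    true_w += 0.001 * rng.normal(size=d)            # slow drift to be tracked
print("tracked weights:", theta, "true weights:", true_w)
```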
This list is automatically generated from the titles and abstracts of the papers on this site.