On Convergence Analysis of Policy Iteration Algorithms for Entropy-Regularized Stochastic Control Problems
- URL: http://arxiv.org/abs/2406.10959v3
- Date: Fri, 26 Jul 2024 00:15:47 GMT
- Title: On Convergence Analysis of Policy Iteration Algorithms for Entropy-Regularized Stochastic Control Problems
- Authors: Jin Ma, Gaozhan Wang, Jianfeng Zhang
- Abstract summary: We investigate the convergence of the Policy Iteration Algorithm (PIA) for a class of general continuous-time entropy-regularized stochastic control problems.
We show that our approach can also be extended to the case when the diffusion contains the control, in the one-dimensional setting and without many extra constraints on the coefficients.
- Score: 19.742628365680353
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we investigate the convergence of the Policy Iteration Algorithm (PIA) for a class of general continuous-time entropy-regularized stochastic control problems. In particular, instead of employing sophisticated PDE estimates for the iterative PDEs involved in the PIA (see, e.g., Huang-Wang-Zhou (2023)), we shall provide a simple proof from scratch for the convergence of the PIA. Our approach builds on probabilistic representation formulae for solutions of PDEs and their derivatives. Moreover, in the infinite-horizon model with a large discount factor and in the finite-horizon model, similar arguments yield an exponential rate of convergence of the PIA without tears. Finally, with some extra effort we show that our approach can also be extended to the case when the diffusion contains the control, in the one-dimensional setting but without many extra constraints on the coefficients. We believe that these results are new in the literature.
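For orientation, the sketch below records a generic entropy-regularized (exploratory) control problem and the Gibbs-form policy improvement step that a PIA of this type iterates; the notation (state X, relaxed policy π, temperature λ, Hamiltonian H, value v) is illustrative and not taken from the paper.

```latex
% A minimal sketch of a generic entropy-regularized control problem and one
% PIA step; notation is illustrative, not the paper's.
\begin{align*}
  % Value of a relaxed feedback policy \pi with an entropy bonus at temperature \lambda:
  J(x;\pi) &= \mathbb{E}\Big[\int_0^\infty e^{-\beta t}\Big(
      \int_A f(X_t,a)\,\pi(a\mid X_t)\,da
      - \lambda\int_A \pi(a\mid X_t)\,\ln\pi(a\mid X_t)\,da\Big)\,dt\Big],\\
  % Policy improvement: given the current value v^n, the entropy-regularized
  % Hamiltonian is maximized by a Gibbs (softmax) density:
  \pi^{n+1}(a\mid x) &\propto
      \exp\Big(\tfrac{1}{\lambda}\,H\big(x,a,\partial_x v^n(x),\partial_{xx} v^n(x)\big)\Big).
\end{align*}
```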
Related papers
- Beyond Derivative Pathology of PINNs: Variable Splitting Strategy with Convergence Analysis [6.468495781611434]
Physics-informed neural networks (PINNs) have emerged as effective methods for solving partial differential equations (PDEs) in various problems.
In this study, we prove that PINNs encounter a fundamental issue: their underlying premise is invalid.
We propose a variable splitting strategy that addresses this issue by parameterizing the gradient of the solution as an auxiliary variable.
arXiv Detail & Related papers (2024-09-30T15:20:10Z) - Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning [62.81324245896717]
- Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning [62.81324245896717]
We introduce an exploration-agnostic algorithm, called C-PG, which exhibits global last-iterate convergence guarantees under (weak) gradient domination assumptions.
We numerically validate our algorithms on constrained control problems and compare them with state-of-the-art baselines.
arXiv Detail & Related papers (2024-07-15T14:54:57Z) - A Unified Theory of Stochastic Proximal Point Methods without Smoothness [52.30944052987393]
Proximal point methods have attracted considerable interest owing to their numerical stability and robustness against imperfect tuning.
This paper presents a comprehensive analysis of a broad range of variations of the stochastic proximal point method (SPPM).
arXiv Detail & Related papers (2024-05-24T21:09:19Z) - Generalization Bounds for Heavy-Tailed SDEs through the Fractional Fokker-Planck Equation [1.8416014644193066]
- Generalization Bounds for Heavy-Tailed SDEs through the Fractional Fokker-Planck Equation [1.8416014644193066]
We prove high-probability generalization bounds for heavy-tailed SDEs with no nontrivial information-theoretic terms.
Our results suggest that heavy tails can be either beneficial or harmful depending on the problem structure.
arXiv Detail & Related papers (2024-02-12T15:35:32Z) - Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC), that can be applied to either risk-seeking or risk-averse policy optimization.
arXiv Detail & Related papers (2023-12-07T15:55:58Z) - A PDE approach for regret bounds under partial monitoring [8.277466108000203]
- A PDE approach for regret bounds under partial monitoring [8.277466108000203]
We study a learning problem in which a forecaster observes partial information.
We show that the problem of obtaining regret bounds and efficient algorithms can be tackled by finding appropriate smooth sub/supersolutions.
arXiv Detail & Related papers (2022-09-02T20:04:30Z) - DiffNet: Neural Field Solutions of Parametric Partial Differential
Equations [30.80582606420882]
We consider a mesh-based approach for training a neural network to produce field predictions of solutions to PDEs.
We use a weighted Galerkin loss function based on the Finite Element Method (FEM) on a parametric elliptic PDE.
We prove theoretically, and illustrate with experiments, convergence results analogous to mesh convergence analysis deployed in finite element solutions to PDEs.
arXiv Detail & Related papers (2021-10-04T17:59:18Z) - A general sample complexity analysis of vanilla policy gradient [101.16957584135767]
- A general sample complexity analysis of vanilla policy gradient [101.16957584135767]
Policy gradient (PG) is one of the most popular methods for solving reinforcement learning (RL) problems.
We provide a general sample complexity analysis of the "vanilla" PG method.
arXiv Detail & Related papers (2021-07-23T19:38:17Z) - Nonparametric estimation of continuous DPPs with kernel methods [0.0]
- Nonparametric estimation of continuous DPPs with kernel methods [0.0]
Parametric and nonparametric inference methods have been proposed in the finite case, i.e. when the point patterns live in a finite ground set.
We show that a restricted version of this maximum likelihood estimation (MLE) problem falls within the scope of a recent representer theorem for nonnegative functions in an RKHS.
We propose, analyze, and demonstrate a fixed point algorithm to solve this finite-dimensional problem.
arXiv Detail & Related papers (2021-06-27T11:57:14Z) - Stein Variational Model Predictive Control [130.60527864489168]
Decision making under uncertainty is critical to real-world, autonomous systems.
Model Predictive Control (MPC) methods have demonstrated favorable performance in practice, but remain limited when dealing with complex distributions.
We show that this framework leads to successful planning in challenging, nonconvex optimal control problems.
arXiv Detail & Related papers (2020-11-15T22:36:59Z) - Implicit Distributional Reinforcement Learning [61.166030238490634]
- Implicit Distributional Reinforcement Learning [61.166030238490634]
The implicit distributional actor-critic (IDAC) is built on two deep generator networks (DGNs) and a semi-implicit actor (SIA) powered by a flexible policy distribution.
We observe that IDAC outperforms state-of-the-art algorithms on representative OpenAI Gym environments.
arXiv Detail & Related papers (2020-07-13T02:52:18Z)