Linear convergence of a policy gradient method for finite horizon
continuous time stochastic control problems
- URL: http://arxiv.org/abs/2203.11758v1
- Date: Tue, 22 Mar 2022 14:17:53 GMT
- Title: Linear convergence of a policy gradient method for finite horizon
continuous time stochastic control problems
- Authors: Christoph Reisinger, Wolfgang Stockinger, Yufei Zhang
- Abstract summary: This paper proposes a provably convergent policy gradient method for general continuous space-time stochastic control problems.
We show that the algorithm converges linearly to a stationary point of the control problem, and is stable with respect to policy updates by approximate gradient steps.
- Score: 3.7971225066055765
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite its popularity in the reinforcement learning community, a provably
convergent policy gradient method for general continuous space-time stochastic
control problems has been elusive. This paper closes the gap by proposing a
proximal gradient algorithm for feedback controls of finite-time horizon
stochastic control problems. The state dynamics are continuous time nonlinear
diffusions with controlled drift and possibly degenerate noise, and the
objectives are nonconvex in the state and nonsmooth in the control. We prove
under suitable conditions that the algorithm converges linearly to a stationary
point of the control problem, and is stable with respect to policy updates by
approximate gradient steps. The convergence result justifies the recent
reinforcement learning heuristics that adding entropy regularization to the
optimization objective accelerates the convergence of policy gradient methods.
The proof exploits careful regularity estimates of backward stochastic
differential equations.
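As a rough illustration only (a schematic consistent with the abstract, not the paper's exact formulation), the setting can be sketched as follows: a feedback control $\phi$ steers a controlled diffusion, and the control is improved by proximal gradient steps on a possibly entropy-regularized cost,
\[ dX_t = b(t, X_t, \phi(t, X_t))\,dt + \sigma(t, X_t)\,dW_t, \qquad X_0 = x_0, \]
\[ J(\phi) = \mathbb{E}\Big[\int_0^T \big( f(t, X_t, \phi(t, X_t)) + \lambda\,\mathrm{reg}(\phi(t, X_t)) \big)\,dt + g(X_T)\Big], \]
\[ \phi_{m+1} = \operatorname{prox}_{\tau\,\mathrm{reg}}\big( \phi_m - \tau\,\nabla_\phi J_{\mathrm{smooth}}(\phi_m) \big), \]
where $\tau > 0$ is a step size, $\mathrm{reg}$ collects the nonsmooth (and, where used, entropy) terms, and the gradient of the smooth part is obtained through the associated backward stochastic differential equation. The symbols $b$, $\sigma$, $f$, $g$, $\lambda$, and $\tau$ are generic placeholders introduced here for illustration.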
Related papers
- Deterministic Policy Gradient Primal-Dual Methods for Continuous-Space Constrained MDPs [82.34567890576423]
We develop a deterministic policy gradient primal-dual method to find an optimal deterministic policy with non-asymptotic convergence.
We prove that the primal-dual iterates of D-PGPD converge at a sub-linear rate to an optimal regularized primal-dual pair.
To the best of our knowledge, this appears to be the first work that proposes a deterministic policy search method for continuous-space constrained MDPs.
arXiv Detail & Related papers (2024-08-19T14:11:04Z) - Full error analysis of policy gradient learning algorithms for exploratory linear quadratic mean-field control problem in continuous time with common noise [0.0]
We study policy gradient (PG) learning and first demonstrate convergence in a model-based setting.
We prove the global linear convergence and sample complexity of the PG algorithm with two-point gradient estimates in a model-free setting.
In this setting, the parameterized optimal policies are learned from samples of the states and population distribution.
arXiv Detail & Related papers (2024-08-05T14:11:51Z) - Entropy annealing for policy mirror descent in continuous time and space [2.8255028200738455]
We analyze a continuous-time policy mirror descent dynamics, which updates the policy based on the gradient of an entropy-regularized value function.
We prove that with a fixed entropy level, the dynamics converges exponentially to the optimal solution of the regularized problem.
arXiv Detail & Related papers (2024-05-30T17:02:18Z) - Last-Iterate Convergent Policy Gradient Primal-Dual Methods for
Constrained MDPs [107.28031292946774]
We study the problem of computing an optimal policy of an infinite-horizon discounted constrained Markov decision process (constrained MDP).
We develop two single-time-scale policy-based primal-dual algorithms with non-asymptotic convergence of their policy iterates to an optimal constrained policy.
To the best of our knowledge, this work appears to be the first non-asymptotic policy last-iterate convergence result for single-time-scale algorithms in constrained MDPs.
arXiv Detail & Related papers (2023-06-20T17:27:31Z) - A Policy Gradient Framework for Stochastic Optimal Control Problems with
Global Convergence Guarantee [12.884132885360907]
We consider policy gradient methods for optimal control problems in continuous time.
We prove the global convergence of the gradient flow and establish a convergence rate under some regularity assumptions.
arXiv Detail & Related papers (2023-02-11T23:30:50Z) - Convergence of policy gradient methods for finite-horizon exploratory
linear-quadratic control problems [3.8661825615213012]
We study the global linear convergence of policy gradient (PG) methods for finite-horizon continuous-time exploratory linear-quadratic control (LQC) problems.
We propose a novel PG method with discrete-time policies. The algorithm leverages the continuous-time analysis and achieves robust linear convergence across different action frequencies.
arXiv Detail & Related papers (2022-11-01T17:31:41Z) - Point Cloud Denoising via Momentum Ascent in Gradient Fields [72.93429911044903]
A gradient-based method was proposed to estimate the gradient fields from noisy point clouds using neural networks.
We develop a momentum gradient ascent method that leverages the information of previous iterations in determining the trajectories of the points.
Experiments demonstrate that the proposed method outperforms state-of-the-art approaches with a variety of point clouds, noise types, and noise levels.
arXiv Detail & Related papers (2022-02-21T10:21:40Z) - On the Convergence Rates of Policy Gradient Methods [9.74841674275568]
We consider geometrically discounted Markov decision problems with finite state and action spaces.
We show that, with direct gradient parametrization, the sample complexity of the policy gradient method can be analyzed.
arXiv Detail & Related papers (2022-01-19T07:03:37Z) - On the Convergence of Stochastic Extragradient for Bilinear Games with
Restarted Iteration Averaging [96.13485146617322]
We present an analysis of the stochastic ExtraGradient (SEG) method with constant step size, and propose variations of the method that yield favorable convergence.
We prove that when augmented with averaging, SEG provably converges to the Nash equilibrium, and such a rate is provably accelerated by incorporating a scheduled restarting procedure.
arXiv Detail & Related papers (2021-06-30T17:51:36Z) - On the Sample Complexity and Metastability of Heavy-tailed Policy Search
in Continuous Control [47.71156648737803]
Reinforcement learning is a framework for interactive decision-making with incentives sequentially revealed across time without a system dynamics model.
We characterize the induced Markov chain, identifying that policies associated with Lévy processes of a heavier tail index yield wider peaks.
arXiv Detail & Related papers (2021-06-15T20:12:44Z) - Improper Learning with Gradient-based Policy Optimization [62.50997487685586]
We consider an improper reinforcement learning setting where the learner is given M base controllers for an unknown Markov Decision Process.
We propose a gradient-based approach that operates over a class of improper mixtures of the controllers.
arXiv Detail & Related papers (2021-02-16T14:53:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.