A Policy Gradient Framework for Stochastic Optimal Control Problems with
Global Convergence Guarantee
- URL: http://arxiv.org/abs/2302.05816v2
- Date: Sat, 22 Apr 2023 17:22:47 GMT
- Title: A Policy Gradient Framework for Stochastic Optimal Control Problems with
Global Convergence Guarantee
- Authors: Mo Zhou, Jianfeng Lu
- Abstract summary: We consider policy gradient methods for optimal control problems in continuous time.
We prove the global convergence of the gradient flow and establish a convergence rate under some regularity assumptions.
- Score: 12.884132885360907
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider policy gradient methods for stochastic optimal control problems in
continuous time. In particular, we analyze the gradient flow for the control,
viewed as a continuous time limit of the policy gradient method. We prove the
global convergence of the gradient flow and establish a convergence rate under
some regularity assumptions. The main novelty in the analysis is the notion of
local optimal control function, which is introduced to characterize the local
optimality of the iterate.
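The idea of a gradient flow as the small-step limit of policy gradient can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: a scalar linear-quadratic problem, a constant feedback gain `k` as the only policy parameter, an Euler-Maruyama simulation with fixed noise samples, and a finite-difference gradient in place of an analytic policy gradient. Each update `k -= step * grad` is one explicit Euler step of the flow dk/dtau = -J'(k).

```python
import random


def simulate_cost(k, noise, x0=1.0, a=0.5, b=1.0, sigma=0.2, T=1.0):
    """Monte Carlo estimate of J(k) = E[ int_0^T (x^2 + u^2) dt + x_T^2 ]
    for dx = (a x + b u) dt + sigma dW with feedback u = -k x.
    All coefficients are hypothetical toy values."""
    n_steps = len(noise[0])
    dt = T / n_steps
    total = 0.0
    for path in noise:
        x, cost = x0, 0.0
        for z in path:
            u = -k * x
            cost += (x * x + u * u) * dt                          # running cost
            x += (a * x + b * u) * dt + sigma * dt ** 0.5 * z    # Euler-Maruyama step
        total += cost + x * x                                     # terminal cost
    return total / len(noise)


def policy_gradient_descent(k0=0.0, step=0.05, n_iters=100,
                            n_paths=100, n_steps=40, seed=0):
    """Gradient descent on the gain k; the same noise is reused in every
    iteration so the estimated cost is a deterministic function of k."""
    rng = random.Random(seed)
    noise = [[rng.gauss(0.0, 1.0) for _ in range(n_steps)]
             for _ in range(n_paths)]
    k, eps = k0, 1e-4
    history = [simulate_cost(k, noise)]
    for _ in range(n_iters):
        # Central finite difference stands in for the policy gradient.
        grad = (simulate_cost(k + eps, noise)
                - simulate_cost(k - eps, noise)) / (2 * eps)
        k -= step * grad          # explicit Euler step of dk/dtau = -J'(k)
        history.append(simulate_cost(k, noise))
    return k, history


if __name__ == "__main__":
    k_final, history = policy_gradient_descent()
    print(f"final gain: {k_final:.3f}")
    print(f"cost: {history[0]:.3f} -> {history[-1]:.3f}")
```

Shrinking `step` while increasing `n_iters` proportionally makes the iterates trace the gradient flow more closely. The sketch says nothing about the convergence rate or the local optimal control function analyzed in the paper.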
Related papers
- Deterministic Policy Gradient Primal-Dual Methods for Continuous-Space Constrained MDPs [82.34567890576423]
We develop a deterministic policy gradient primal-dual method to find an optimal deterministic policy with non-asymptotic convergence.
We prove that the primal-dual iterates of D-PGPD converge at a sub-linear rate to an optimal regularized primal-dual pair.
To the best of our knowledge, this appears to be the first work that proposes a deterministic policy search method for continuous-space constrained MDPs.
arXiv Detail & Related papers (2024-08-19T14:11:04Z)
- Full error analysis of policy gradient learning algorithms for exploratory linear quadratic mean-field control problem in continuous time with common noise [0.0]
We study policy gradient (PG) learning and first demonstrate convergence in a model-based setting.
We prove the global linear convergence and sample complexity of the PG algorithm with two-point gradient estimates in a model-free setting.
In this setting, the parameterized optimal policies are learned from samples of the states and population distribution.
arXiv Detail & Related papers (2024-08-05T14:11:51Z)
- Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning [62.81324245896717]
We introduce an exploration-agnostic algorithm, called C-PG, which exhibits global last-iterate convergence guarantees under (weak) gradient domination assumptions.
We numerically validate our algorithms on constrained control problems, and compare them with state-of-the-art baselines.
arXiv Detail & Related papers (2024-07-15T14:54:57Z)
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, stochastic (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
arXiv Detail & Related papers (2024-05-03T16:45:15Z)
- Local Optimization Achieves Global Optimality in Multi-Agent
Reinforcement Learning [139.53668999720605]
We present a multi-agent PPO algorithm in which the local policy of each agent is updated similarly to vanilla PPO.
We prove that with standard regularity conditions on the Markov game and problem-dependent quantities, our algorithm converges to the globally optimal policy at a sublinear rate.
arXiv Detail & Related papers (2023-05-08T16:20:03Z)
- Linear convergence of a policy gradient method for finite horizon
continuous time stochastic control problems [3.7971225066055765]
This paper proposes a provably convergent gradient method for general continuous space-time control problems.
We show that the algorithm converges linearly to a stationary point of the control problem, and is stable with respect to policy updates by approximate gradient steps.
arXiv Detail & Related papers (2022-03-22T14:17:53Z)
- Convergence and Optimality of Policy Gradient Methods in Weakly Smooth
Settings [17.437408088239142]
We establish explicit convergence rates of policy gradient methods without relying on opaque conditions.
We also characterize the sufficiency conditions for the ergodicity of near-linear MDPs.
We provide conditions and analysis for optimality of the converged policies.
arXiv Detail & Related papers (2021-10-30T06:31:01Z)
- On the Convergence of Stochastic Extragradient for Bilinear Games with
Restarted Iteration Averaging [96.13485146617322]
We present an analysis of the Stochastic ExtraGradient (SEG) method with constant step size, and present variations of the method that yield favorable convergence.
We prove that when augmented with averaging, SEG provably converges to the Nash equilibrium, and such a rate is provably accelerated by incorporating a scheduled restarting procedure.
arXiv Detail & Related papers (2021-06-30T17:51:36Z)
- Improper Learning with Gradient-based Policy Optimization [62.50997487685586]
We consider an improper reinforcement learning setting where the learner is given M base controllers for an unknown Markov Decision Process.
We propose a gradient-based approach that operates over a class of improper mixtures of the controllers.
arXiv Detail & Related papers (2021-02-16T14:53:55Z)
- Policy Optimization for Markovian Jump Linear Quadratic Control:
Gradient-Based Methods and Global Convergence [3.3343656101775365]
We show that three types of policy optimization methods converge to the optimal state feedback controller for MJLS at a linear rate if initialized at a controller that is mean-square stabilizing.
arXiv Detail & Related papers (2020-11-24T02:39:38Z)
- Convergence Guarantees of Policy Optimization Methods for Markovian Jump
Linear Systems [3.3343656101775365]
We show that the Gauss-Newton method converges to the optimal state feedback controller for MJLS at a linear rate if initialized at a controller which stabilizes the closed-loop dynamics in the mean-square sense.
We present an example to support our theory.
arXiv Detail & Related papers (2020-02-10T21:13:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.