Convergence of policy gradient methods for finite-horizon exploratory
linear-quadratic control problems
- URL: http://arxiv.org/abs/2211.00617v3
- Date: Fri, 1 Mar 2024 20:42:36 GMT
- Title: Convergence of policy gradient methods for finite-horizon exploratory
linear-quadratic control problems
- Authors: Michael Giegrich, Christoph Reisinger, Yufei Zhang
- Abstract summary: We study the global linear convergence of policy gradient (PG) methods for finite-horizon continuous-time exploratory linear-quadratic control (LQC) problems.
We propose a novel PG method with discrete-time policies. The algorithm leverages the continuous-time analysis and achieves robust linear convergence across different action frequencies.
- Score: 3.8661825615213012
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the global linear convergence of policy gradient (PG) methods for
finite-horizon continuous-time exploratory linear-quadratic control (LQC)
problems. The setting includes stochastic LQC problems with indefinite costs
and allows additional entropy regularisers in the objective. We consider a
continuous-time Gaussian policy whose mean is linear in the state variable and
whose covariance is state-independent. Contrary to discrete-time problems, the
cost is noncoercive in the policy and not all descent directions lead to
bounded iterates. We propose geometry-aware gradient descents for the mean and
covariance of the policy using the Fisher geometry and the Bures-Wasserstein
geometry, respectively. The policy iterates are shown to satisfy an a-priori
bound, and converge globally to the optimal policy with a linear rate. We
further propose a novel PG method with discrete-time policies. The algorithm
leverages the continuous-time analysis, and achieves a robust linear
convergence across different action frequencies. A numerical experiment
confirms the convergence and robustness of the proposed algorithm.
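The abstract describes the two geometry-aware updates only at a high level. As a rough illustration (not the authors' finite-horizon algorithm), the sketch below applies a Fisher natural-gradient step to the mean and a Bures-Wasserstein congruence step to the covariance of a Gaussian policy, on a toy one-period entropy-regularised quadratic cost; the problem data Q, c, lam and the step sizes are invented for the example.

```python
# A minimal sketch, assuming a one-period entropy-regularised quadratic cost;
# NOT the paper's finite-horizon exploratory LQC algorithm.  Q, c, lam and the
# step sizes below are invented toy data.
import numpy as np

d = 2
rng = np.random.default_rng(0)
A = rng.standard_normal((d, d))
Q = A @ A.T + d * np.eye(d)        # symmetric positive definite cost matrix
c = rng.standard_normal(d)
lam = 0.5                          # entropy-regularisation weight

# Cost of the Gaussian policy N(mu, Sigma):
#   J(mu, Sigma) = E[a' Q a + c' a] - lam * entropy
#                = mu' Q mu + c' mu + tr(Q Sigma) - (lam/2) log det Sigma + const.
def grad_mu(mu):                   # Euclidean gradient in mu
    return 2 * Q @ mu + c

def grad_sigma(Sigma):             # Euclidean gradient in Sigma (symmetric)
    return Q - 0.5 * lam * np.linalg.inv(Sigma)

mu, Sigma = np.zeros(d), np.eye(d)
eta_mu, eta_sigma = 0.5, 0.05

for _ in range(200):
    # Fisher step for the mean: the Fisher information of mu under N(mu, Sigma)
    # is Sigma^{-1}, so the natural-gradient direction is Sigma @ grad_mu.
    mu = mu - eta_mu * Sigma @ grad_mu(mu)

    # Bures-Wasserstein step for the covariance: the congruence update
    # (I - 2*eta*G) Sigma (I - 2*eta*G) follows the BW geodesic retraction and
    # keeps Sigma symmetric positive definite for small enough step sizes.
    G = grad_sigma(Sigma)
    M = np.eye(d) - 2 * eta_sigma * G
    Sigma = M @ Sigma @ M

# Closed-form minimiser of this toy regularised cost, for comparison.
mu_star = -0.5 * np.linalg.solve(Q, c)
Sigma_star = 0.5 * lam * np.linalg.inv(Q)
print("mu error:   ", np.linalg.norm(mu - mu_star))
print("Sigma error:", np.linalg.norm(Sigma - Sigma_star))
```

The congruence form of the covariance step is the practical payoff of the Bures-Wasserstein geometry in this sketch: the iterates stay symmetric positive definite without any projection, whereas a plain Euclidean step on the covariance need not. The paper's method acts on time-dependent policy parameters for the full exploratory LQC cost, so the above is only a one-period caricature of the two geometries involved.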
Related papers
- Deterministic Policy Gradient Primal-Dual Methods for Continuous-Space Constrained MDPs [82.34567890576423]
We develop a deterministic policy gradient primal-dual method to find an optimal deterministic policy with non-asymptotic convergence.
We prove that the primal-dual iterates of D-PGPD converge at a sub-linear rate to an optimal regularized primal-dual pair.
To the best of our knowledge, this appears to be the first work that proposes a deterministic policy search method for continuous-space constrained MDPs.
arXiv Detail & Related papers (2024-08-19T14:11:04Z) - Full error analysis of policy gradient learning algorithms for exploratory linear quadratic mean-field control problem in continuous time with common noise [0.0]
We study policy gradient (PG) learning and first demonstrate convergence in a model-based setting.
We prove the global linear convergence and sample complexity of the PG algorithm with two-point gradient estimates in a model-free setting.
In this setting, the parameterized optimal policies are learned from samples of the states and population distribution.
arXiv Detail & Related papers (2024-08-05T14:11:51Z) - Last-Iterate Convergent Policy Gradient Primal-Dual Methods for
Constrained MDPs [107.28031292946774]
We study the problem of computing an optimal policy of an infinite-horizon discounted constrained Markov decision process (constrained MDP).
We develop two single-time-scale policy-based primal-dual algorithms with non-asymptotic convergence of their policy iterates to an optimal constrained policy.
To the best of our knowledge, this work appears to be the first non-asymptotic policy last-iterate convergence result for single-time-scale algorithms in constrained MDPs.
arXiv Detail & Related papers (2023-06-20T17:27:31Z) - High-probability sample complexities for policy evaluation with linear function approximation [88.87036653258977]
We investigate the sample complexities required to guarantee a predefined estimation error of the best linear coefficients for two widely-used policy evaluation algorithms.
We establish the first sample complexity bound with high-probability convergence guarantee that attains the optimal dependence on the tolerance level.
arXiv Detail & Related papers (2023-05-30T12:58:39Z) - Geometry and convergence of natural policy gradient methods [0.0]
We study the convergence of several natural policy gradient (NPG) methods in infinite-horizon discounted Markov decision processes with regular policy parametrizations.
For a variety of NPGs and reward functions we show that the trajectories in state-action space are solutions of gradient flows with respect to Hessian geometries.
arXiv Detail & Related papers (2022-11-03T19:16:15Z) - Linear Convergence of Natural Policy Gradient Methods with Log-Linear
Policies [115.86431674214282]
We consider infinite-horizon discounted Markov decision processes and study the convergence rates of the natural policy gradient (NPG) and the Q-NPG methods with the log-linear policy class.
We show that both methods attain linear convergence rates and $\mathcal{O}(1/\epsilon^2)$ sample complexities using a simple, non-adaptive geometrically increasing step size.
arXiv Detail & Related papers (2022-10-04T06:17:52Z) - Linear convergence of a policy gradient method for finite horizon
continuous time stochastic control problems [3.7971225066055765]
This paper proposes a provably convergent gradient method for general continuous space-time control problems.
We show that the algorithm converges linearly to the optimal control and is stable with respect to the policy updates.
arXiv Detail & Related papers (2022-03-22T14:17:53Z) - On the Convergence Rates of Policy Gradient Methods [9.74841674275568]
We consider geometrically discounted Markov decision problems with finite state and action spaces.
We show that, with a direct policy parametrization, the iteration complexity of policy gradient methods can be analysed.
arXiv Detail & Related papers (2022-01-19T07:03:37Z) - Softmax Policy Gradient Methods Can Take Exponential Time to Converge [60.98700344526674]
The softmax policy gradient (PG) method is arguably one of the de facto implementations of policy optimization in modern reinforcement learning.
We demonstrate that softmax PG methods can take exponential time -- in terms of $|\mathcal{S}|$ and $\frac{1}{1-\gamma}$ -- to converge.
arXiv Detail & Related papers (2021-02-22T18:56:26Z) - Global Convergence of Policy Gradient for Linear-Quadratic Mean-Field
Control/Game in Continuous Time [109.06623773924737]
We study the policy gradient method for the linear-quadratic mean-field control and game.
We show that it converges to the optimal solution at a linear rate, which is verified by a synthetic simulation.
arXiv Detail & Related papers (2020-08-16T06:34:11Z)