A Fisher-Rao gradient flow for entropy-regularised Markov decision
processes in Polish spaces
- URL: http://arxiv.org/abs/2310.02951v1
- Date: Wed, 4 Oct 2023 16:41:36 GMT
- Title: A Fisher-Rao gradient flow for entropy-regularised Markov decision
processes in Polish spaces
- Authors: Bekzhan Kerimkulov, James-Michael Leahy, David Siska, Lukasz Szpruch,
Yufei Zhang
- Abstract summary: We study the global convergence of a Fisher-Rao policy gradient flow for infinite-horizon entropy-regularised Markov decision processes with Polish state and action space.
We establish the global well-posedness of the gradient flow and demonstrate its exponential convergence to the optimal policy.
- Score: 10.777806006475297
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the global convergence of a Fisher-Rao policy gradient flow for
infinite-horizon entropy-regularised Markov decision processes with Polish
state and action space. The flow is a continuous-time analogue of a policy
mirror descent method. We establish the global well-posedness of the gradient
flow and demonstrate its exponential convergence to the optimal policy.
Moreover, we prove the flow is stable with respect to gradient evaluation,
offering insights into the performance of a natural policy gradient flow with
log-linear policy parameterisation. To overcome challenges stemming from the
lack of convexity of the objective function and the discontinuity arising
from the entropy regulariser, we leverage the performance difference lemma and
the duality relationship between the gradient and mirror descent flows.
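As a heuristic illustration only (not taken from the paper), the discrete-time analogue of such a Fisher-Rao flow is an entropy-regularised policy mirror descent update of the form pi_{k+1}(a|s) proportional to pi_k(a|s)^(1 - eta*tau) * exp(eta * Q_tau^{pi_k}(s,a)). The minimal Python sketch below runs this update on a small, randomly generated tabular MDP; the MDP data, the regularisation strength tau, the step size eta, and the reward-maximisation convention are all illustrative assumptions, whereas the paper itself works in continuous time with general Polish state and action spaces.

import numpy as np

# Illustrative problem size and hyperparameters (assumptions, not from the paper).
rng = np.random.default_rng(0)
S, A, gamma, tau, eta = 4, 3, 0.9, 0.1, 0.5

# Random transition kernel P[s, a, s'] and reward r[s, a], assumed for the sketch.
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)
r = rng.random((S, A))

def soft_q(pi, iters=500):
    # Entropy-regularised policy evaluation by fixed-point iteration:
    # Q(s,a) = r(s,a) + gamma * sum_s' P(s'|s,a) * V(s'), with
    # V(s) = sum_a pi(a|s) * (Q(s,a) - tau * log pi(a|s)).
    Q = np.zeros((S, A))
    for _ in range(iters):
        V = (pi * (Q - tau * np.log(pi + 1e-12))).sum(axis=1)
        Q = r + gamma * (P @ V)
    return Q

# Entropy-regularised policy mirror descent, started from the uniform policy:
# pi_{k+1}(a|s) proportional to pi_k(a|s)^(1 - eta*tau) * exp(eta * Q(s,a)).
pi = np.full((S, A), 1.0 / A)
for k in range(60):
    Q = soft_q(pi)
    logits = (1.0 - eta * tau) * np.log(pi + 1e-12) + eta * Q
    pi = np.exp(logits - logits.max(axis=1, keepdims=True))
    pi /= pi.sum(axis=1, keepdims=True)
    if k % 20 == 0:
        v0 = (pi[0] * (Q[0] - tau * np.log(pi[0] + 1e-12))).sum()
        print(f"iteration {k:3d}: soft value of state 0 = {v0:.4f}")

In this tabular setting the iterates typically converge at a geometric rate to the optimal entropy-regularised policy, mirroring the exponential convergence established for the continuous-time flow.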
Related papers
- Deterministic Policy Gradient Primal-Dual Methods for Continuous-Space Constrained MDPs [82.34567890576423]
We develop a deterministic policy gradient primal-dual method to find an optimal deterministic policy with non-asymptotic convergence.
We prove that the primal-dual iterates of D-PGPD converge at a sub-linear rate to an optimal regularized primal-dual pair.
To the best of our knowledge, this appears to be the first work that proposes a deterministic policy search method for continuous-space constrained MDPs.
arXiv Detail & Related papers (2024-08-19T14:11:04Z) - Essentially Sharp Estimates on the Entropy Regularization Error in Discrete Discounted Markov Decision Processes [4.714840786221651]
We show that for entropy-regularized natural policy gradient methods the overall error decays exponentially in the square root of the number of iterations, improving existing sublinear guarantees.
arXiv Detail & Related papers (2024-06-06T15:20:37Z) - On the Global Convergence of Policy Gradient in Average Reward Markov
Decision Processes [50.68789924454235]
We present the first finite time global convergence analysis of policy gradient in the context of average reward Markov decision processes (MDPs).
Our analysis shows that the policy gradient iterates converge to the optimal policy at a sublinear rate of $O\left(\frac{1}{T}\right)$, which translates to $O\left(\log(T)\right)$ regret, where $T$ represents the number of iterations.
arXiv Detail & Related papers (2024-03-11T15:25:03Z) - High-probability sample complexities for policy evaluation with linear function approximation [88.87036653258977]
We investigate the sample complexities required to guarantee a predefined estimation error of the best linear coefficients for two widely-used policy evaluation algorithms.
We establish the first sample complexity bound with high-probability convergence guarantee that attains the optimal dependence on the tolerance level.
arXiv Detail & Related papers (2023-05-30T12:58:39Z) - Linear Convergence of Natural Policy Gradient Methods with Log-Linear
Policies [115.86431674214282]
We consider infinite-horizon discounted Markov decision processes and study the convergence rates of the natural policy gradient (NPG) and the Q-NPG methods with the log-linear policy class.
We show that both methods attain linear convergence rates and $\mathcal{O}(1/\epsilon^2)$ sample complexities using a simple, non-adaptive geometrically increasing step size.
arXiv Detail & Related papers (2022-10-04T06:17:52Z) - Linear convergence of a policy gradient method for finite horizon
continuous time stochastic control problems [3.7971225066055765]
This paper proposes a provably convergent gradient method for general continuous space-time control problems.
We show that the algorithm converges linearly to the optimal policy and is stable with respect to policy updates.
arXiv Detail & Related papers (2022-03-22T14:17:53Z) - On the Convergence Rates of Policy Gradient Methods [9.74841674275568]
We consider geometrically discounted Markov decision problems with finite state and action spaces.
We show that with direct parametrization in the policy space, the convergence rates of policy gradient methods can be analyzed.
arXiv Detail & Related papers (2022-01-19T07:03:37Z) - Convergence of policy gradient for entropy regularized MDPs with neural
network approximation in the mean-field regime [0.0]
We study the global convergence of policy gradient for infinite-horizon, continuous state and action space, entropy-regularized Markov decision processes (MDPs).
Our results rely on a careful analysis of the non-linear Fokker--Planck--Kolmogorov equation.
arXiv Detail & Related papers (2022-01-18T20:17:16Z) - On the Sample Complexity and Metastability of Heavy-tailed Policy Search
in Continuous Control [47.71156648737803]
Reinforcement learning is a framework for interactive decision-making with incentives sequentially revealed across time without a system dynamics model.
We characterize the induced Markov chain, identifying that policies associated with Lévy processes with a heavier tail index converge to wider peaks.
arXiv Detail & Related papers (2021-06-15T20:12:44Z) - Statistically Efficient Off-Policy Policy Gradients [80.42316902296832]
We consider the statistically efficient estimation of policy gradients from off-policy data.
We propose a meta-algorithm that achieves the lower bound without any parametric assumptions.
We establish guarantees on the rate at which we approach a stationary point when we take steps in the direction of our new estimated policy gradient.
arXiv Detail & Related papers (2020-02-10T18:41:25Z)