Convergence of policy gradient for entropy regularized MDPs with neural
network approximation in the mean-field regime
- URL: http://arxiv.org/abs/2201.07296v1
- Date: Tue, 18 Jan 2022 20:17:16 GMT
- Title: Convergence of policy gradient for entropy regularized MDPs with neural
network approximation in the mean-field regime
- Authors: Bekzhan Kerimkulov, James-Michael Leahy, David Šiška, and Lukasz
Szpruch
- Abstract summary: We study the global convergence of policy gradient for infinite-horizon, continuous state and action space, entropy-regularized Markov decision processes (MDPs).
Our results rely on a careful analysis of a non-linear Fokker--Planck--Kolmogorov equation.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the global convergence of policy gradient for infinite-horizon,
continuous state and action space, entropy-regularized Markov decision
processes (MDPs). We consider a softmax policy with (one-hidden layer) neural
network approximation in a mean-field regime. Additional entropic
regularization in the associated mean-field probability measure is added, and
the corresponding gradient flow is studied in the 2-Wasserstein metric. We show
that the objective function is increasing along the gradient flow. Further, we
prove that if the regularization in terms of the mean-field measure is
sufficient, the gradient flow converges exponentially fast to the unique
stationary solution, which is the unique maximizer of the regularized MDP
objective. Lastly, we study the sensitivity of the value function along the
gradient flow with respect to regularization parameters and the initial
condition. Our results rely on a careful analysis of a non-linear
Fokker--Planck--Kolmogorov equation and extend the pioneering work of Mei et
al. 2020 and Agarwal et al. 2020, which quantify the global convergence rate of
policy gradient for entropy-regularized MDPs in the tabular setting.
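As a rough illustration of this setting, the sketch below (an assumption of this summary, not the authors' code) trains a softmax policy whose logits come from a one-hidden-layer network represented by a finite cloud of parameter particles, the usual finite-particle proxy for the mean-field measure. The update is noisy gradient ascent, i.e. the standard discrete-time, finite-particle counterpart of an entropy-regularized 2-Wasserstein gradient flow (mean-field Langevin dynamics). A one-step bandit surrogate with a toy reward replaces the infinite-horizon MDP, and all sizes, constants, and the reward are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

N, d_s, n_act = 256, 3, 9                  # particles (hidden units), state dim, action grid size
actions = np.linspace(-1.0, 1.0, n_act)    # crude grid standing in for a continuous action space
theta = rng.normal(size=(N, d_s + 2))      # per particle: input weights w, action weight b, output weight c

tau = 0.1      # entropy regularization of the policy (softmax temperature)
sigma = 0.05   # entropic regularization of the mean-field measure -> Langevin noise
lr = 5e-3      # step size of the discretized gradient flow

def reward(s, a):
    # toy reward: prefer actions close to the mean of the state coordinates
    return -(a - s.mean()) ** 2

def logits(theta, s):
    # one-hidden-layer mean-field network: f(s, a) = (1/N) sum_i c_i * tanh(w_i . s + b_i * a)
    w, b, c = theta[:, :d_s], theta[:, d_s], theta[:, d_s + 1]
    pre = (w @ s)[:, None] + b[:, None] * actions[None, :]          # shape (N, n_act)
    return (c[:, None] * np.tanh(pre)).mean(axis=0), pre

for step in range(2000):
    grad = np.zeros_like(theta)
    batch = 8
    for _ in range(batch):                          # small batch of random states
        s = rng.normal(size=d_s)
        f, pre = logits(theta, s)
        pi = np.exp((f - f.max()) / tau)
        pi /= pi.sum()                              # softmax policy pi(. | s)
        adv = reward(s, actions) - tau * np.log(pi)
        adv -= pi @ adv                             # entropy-regularized "advantage"
        g_f = pi * adv / tau                        # dJ/df for the regularized softmax objective
        c = theta[:, d_s + 1]
        th = np.tanh(pre)
        dpre = (1.0 - th ** 2) * c[:, None] * g_f[None, :]            # (N, n_act)
        grad[:, :d_s] += dpre.sum(axis=1)[:, None] * s[None, :] / N   # d/dw
        grad[:, d_s] += (dpre * actions[None, :]).sum(axis=1) / N     # d/db
        grad[:, d_s + 1] += (th * g_f[None, :]).sum(axis=1) / N       # d/dc
    grad /= batch
    # particle update: mean-field scaling (N * grad) plus Gaussian noise from the entropic term
    theta += lr * N * grad + np.sqrt(2.0 * lr * sigma) * rng.normal(size=theta.shape)
```

The Gaussian noise term is how the additional entropic regularization of the mean-field measure shows up at the particle level; without it the update reduces to plain (finite-particle) gradient ascent on the regularized objective.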
Related papers
- Deterministic Policy Gradient Primal-Dual Methods for Continuous-Space Constrained MDPs
We develop a deterministic policy gradient primal-dual (D-PGPD) method to find an optimal deterministic policy with non-asymptotic convergence.
We prove that the primal-dual iterates of D-PGPD converge at a sub-linear rate to an optimal regularized primal-dual pair.
To the best of our knowledge, this appears to be the first work that proposes a deterministic policy search method for continuous-space constrained MDPs.
arXiv Detail & Related papers (2024-08-19T14:11:04Z) - A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparametrized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by a magnitude of $O(\alpha^{-1})$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z) - On the Global Convergence of Policy Gradient in Average Reward Markov
Decision Processes [50.68789924454235]
We present the first finite-time global convergence analysis of policy gradient in the context of average reward Markov decision processes (MDPs).
Our analysis shows that the policy gradient iterates converge to the optimal policy at a sublinear rate of $O\left(\frac{1}{T}\right)$, which translates to $O\left(\log(T)\right)$ regret, where $T$ represents the number of iterations.
arXiv Detail & Related papers (2024-03-11T15:25:03Z) - A Fisher-Rao gradient flow for entropy-regularised Markov decision
processes in Polish spaces [10.777806006475297]
We study the global convergence of a Fisher-Rao policy gradient flow for infinite-horizon entropy-regularised Markov decision processes with Polish state and action space.
We establish the global well-posedness of the gradient flow and demonstrate its exponential convergence to the optimal policy.
arXiv Detail & Related papers (2023-10-04T16:41:36Z) - Last-Iterate Convergent Policy Gradient Primal-Dual Methods for
Constrained MDPs [107.28031292946774]
We study the problem of computing an optimal policy of an infinite-horizon discounted constrained Markov decision process (constrained MDP).
We develop two single-time-scale policy-based primal-dual algorithms with non-asymptotic convergence of their policy iterates to an optimal constrained policy.
To the best of our knowledge, this work appears to be the first non-asymptotic policy last-iterate convergence result for single-time-scale algorithms in constrained MDPs.
arXiv Detail & Related papers (2023-06-20T17:27:31Z) - Convergence of mean-field Langevin dynamics: Time and space
discretization, stochastic gradient, and variance reduction [49.66486092259376]
The mean-field Langevin dynamics (MFLD) is a nonlinear generalization of the Langevin dynamics that incorporates a distribution-dependent drift (a finite-particle sketch of this dynamics appears after this list).
Recent works have shown that MFLD globally minimizes an entropy-regularized convex functional in the space of measures.
We provide a framework to prove a uniform-in-time propagation of chaos for MFLD that takes into account the errors due to finite-particle approximation, time-discretization, and gradient approximation.
arXiv Detail & Related papers (2023-06-12T16:28:11Z) - On the Convergence Rates of Policy Gradient Methods [9.74841674275568]
We consider geometrically discounted Markov decision problems with finite state and action spaces.
We show that, with direct parametrization in the policy space, the convergence rates of policy gradient methods can be analyzed.
arXiv Detail & Related papers (2022-01-19T07:03:37Z) - Beyond Exact Gradients: Convergence of Stochastic Soft-Max Policy Gradient Methods with Entropy Regularization [20.651913793555163]
We revisit the classical entropy regularized policy gradient methods with the soft-max policy parametrization; a tabular sketch of this parametrization appears after this list.
We establish a global optimality convergence result and a sample complexity of $\widetilde{\mathcal{O}}(\frac{1}{\epsilon^2})$ for the proposed algorithm.
arXiv Detail & Related papers (2021-10-19T17:21:09Z) - A Near-Optimal Gradient Flow for Learning Neural Energy-Based Models [93.24030378630175]
We propose a novel numerical scheme to optimize the gradient flows for learning energy-based models (EBMs).
We derive a second-order Wasserstein gradient flow of the global relative entropy from the Fokker-Planck equation.
Compared with existing schemes, the Wasserstein gradient flow is a smoother and near-optimal numerical scheme to approximate real data densities.
arXiv Detail & Related papers (2019-10-31T02:26:20Z)
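The mean-field Langevin dynamics referenced above can be illustrated with a small finite-particle discretization. The sketch below is an assumption of this summary (toy confinement-plus-interaction energy, hypothetical sizes and constants), not code from the cited paper; it only shows the characteristic distribution-dependent drift plus Gaussian noise coming from the entropy term.

```python
import numpy as np

rng = np.random.default_rng(2)

N, d = 500, 2                 # number of particles, dimension
lam = 0.1                     # weight of the entropy regularization
h = 1e-2                      # time step
X = rng.normal(scale=3.0, size=(N, d))   # initial particle cloud

def drift(X):
    # Toy energy F0(mu) = int |x|^2/2 dmu + 1/2 int int |x-y|^2/2 dmu(x) dmu(y).
    # Gradient of its flat derivative at the empirical measure:
    # x + (x - mean(mu)); the second term is the distribution-dependent part.
    return X + (X - X.mean(axis=0, keepdims=True))

for k in range(2000):
    # finite-particle Euler-Maruyama step of MFLD
    X = X - h * drift(X) + np.sqrt(2.0 * h * lam) * rng.normal(size=X.shape)

# At stationarity the cloud approximates the minimizer of the
# entropy-regularized functional (a centered Gaussian for this toy energy).
print("empirical mean:", X.mean(axis=0), "empirical variance:", X.var(axis=0))
```

By contrast, in the tabular setting referenced in the main abstract (Mei et al. 2020, Agarwal et al. 2020) and in the soft-max entry above, the entropy-regularized policy gradient can be computed exactly. The sketch below is an assumed illustration on a small random MDP (the sizes, step size, and random instance are hypothetical), using the standard form of the exact gradient for the soft-max parametrization.

```python
import numpy as np

rng = np.random.default_rng(1)

nS, nA, gamma, tau, lr = 4, 3, 0.9, 0.1, 0.1
P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # transition kernel P[s, a, s']
r = rng.uniform(size=(nS, nA))                  # reward table r(s, a)
rho = np.full(nS, 1.0 / nS)                     # initial state distribution
theta = np.zeros((nS, nA))                      # one softmax logit per state-action pair

def softmax(x):
    z = np.exp(x - x.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

for it in range(500):
    pi = softmax(theta)                                              # pi[s, a]
    P_pi = np.einsum("sa,sat->st", pi, P)                            # state kernel under pi
    r_soft = (pi * (r - tau * np.log(pi))).sum(axis=1)               # entropy-regularized one-step reward
    V = np.linalg.solve(np.eye(nS) - gamma * P_pi, r_soft)           # soft value function
    Q = r + gamma * np.einsum("sat,t->sa", P, V)                     # soft Q-function
    d = (1.0 - gamma) * np.linalg.solve(np.eye(nS) - gamma * P_pi.T, rho)  # discounted visitation
    # exact entropy-regularized softmax policy gradient in the tabular case
    grad = d[:, None] * pi * (Q - tau * np.log(pi) - V[:, None]) / (1.0 - gamma)
    theta += lr * grad

print("regularized value of the learned policy:", rho @ V)
```

With entropy regularization the soft advantage term drives the logits toward the unique regularized optimum, which is the tabular counterpart of the exponential convergence of the mean-field gradient flow discussed in the main abstract.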
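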