On the convergence of policy gradient methods to Nash equilibria in
general stochastic games
- URL: http://arxiv.org/abs/2210.08857v1
- Date: Mon, 17 Oct 2022 08:51:59 GMT
- Title: On the convergence of policy gradient methods to Nash equilibria in
general stochastic games
- Authors: Angeliki Giannou and Kyriakos Lotidis and Panayotis Mertikopoulos and
Emmanouil-Vasileios Vlatakis-Gkaragkounis
- Abstract summary: We study the long-run behavior of policy gradient methods with respect to Nash equilibrium policies.
We show that policy gradient trajectories with gradient estimates provided by the REINFORCE algorithm achieve an $\mathcal{O}(1/\sqrt{n})$ distance-squared convergence rate.
- Score: 33.786186304912
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning in stochastic games is a notoriously difficult problem because, in
addition to each other's strategic decisions, the players must also contend
with the fact that the game itself evolves over time, possibly in a very
complicated manner. Because of this, the convergence properties of popular
learning algorithms - like policy gradient and its variants - are poorly
understood, except in specific classes of games (such as potential or
two-player, zero-sum games). In view of this, we examine the long-run behavior
of policy gradient methods with respect to Nash equilibrium policies that are
second-order stationary (SOS) in a sense similar to the type of sufficiency
conditions used in optimization. Our first result is that SOS policies are
locally attracting with high probability, and we show that policy gradient
trajectories with gradient estimates provided by the REINFORCE algorithm
achieve an $\mathcal{O}(1/\sqrt{n})$ distance-squared convergence rate if the
method's step-size is chosen appropriately. Subsequently, specializing to the
class of deterministic Nash policies, we show that this rate can be improved
dramatically and, in fact, policy gradient methods converge within a finite
number of iterations in that case.
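To make the setup concrete, below is a minimal, hedged sketch of policy gradient with REINFORCE gradient estimates and a decaying step size in a small two-player stochastic game. The tabular softmax parameterization, the randomly generated game primitives, and the step-size schedule gamma_n ~ 1/sqrt(n) are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

# Minimal sketch, not the paper's exact method: two players with tabular
# softmax policies run REINFORCE-style policy gradient in a small
# stochastic game. Game primitives and step-size schedule are assumptions.

rng = np.random.default_rng(0)
n_states, n_actions, horizon = 3, 2, 20

# Hypothetical game: P[s, a1, a2] is a distribution over next states,
# r[i, s, a1, a2] is player i's reward.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions, n_actions))
r = rng.uniform(size=(2, n_states, n_actions, n_actions))

theta = np.zeros((2, n_states, n_actions))  # softmax logits, one table per player

def policy(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

def rollout(theta):
    """One episode; returns each player's return and summed score function."""
    s = rng.integers(n_states)
    returns = np.zeros(2)
    scores = np.zeros_like(theta)  # sum_t grad_theta log pi_i(a_t^i | s_t)
    for _ in range(horizon):
        probs = [policy(theta[i, s]) for i in range(2)]
        a = [rng.choice(n_actions, p=probs[i]) for i in range(2)]
        for i in range(2):
            grad_log = -probs[i]
            grad_log[a[i]] += 1.0          # e_{a_i} - pi_i(. | s)
            scores[i, s] += grad_log
            returns[i] += r[i, s, a[0], a[1]]
        s = rng.choice(n_states, p=P[s, a[0], a[1]])
    return returns, scores

for n in range(1, 5001):
    gamma = 0.1 / np.sqrt(n)               # assumed decaying step size
    returns, scores = rollout(theta)
    for i in range(2):
        # REINFORCE estimate: episode return times the summed score function.
        theta[i] += gamma * returns[i] * scores[i]
```

Under the paper's conditions, trajectories of this kind are what the stated results concern: with an appropriately chosen step size they converge to second-order stationary Nash policies at an $\mathcal{O}(1/\sqrt{n})$ distance-squared rate, and in finitely many iterations when the equilibrium policy is deterministic.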
Related papers
- A Policy-Gradient Approach to Solving Imperfect-Information Games with Iterate Convergence [21.195897792629548]
Policy gradient methods have become a staple of any single-agent reinforcement learning toolbox.
We show for the first time that a policy gradient method leads to provable best-iterate convergence to a regularized Nash equilibrium in self-play.
arXiv Detail & Related papers (2024-08-01T17:54:01Z) - Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, convergent (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
arXiv Detail & Related papers (2024-05-03T16:45:15Z) - The equivalence of dynamic and strategic stability under regularized
learning in games [33.74394172275373]
We examine the long-run behavior of regularized, no-regret learning in finite games.
We obtain an equivalence between strategic and dynamic stability.
We show that methods based on entropic regularization converge at a geometric rate.
arXiv Detail & Related papers (2023-11-04T14:07:33Z) - Last-Iterate Convergent Policy Gradient Primal-Dual Methods for
Constrained MDPs [107.28031292946774]
We study the problem of computing an optimal policy of an infinite-horizon discounted constrained Markov decision process (constrained MDP).
We develop two single-time-scale policy-based primal-dual algorithms with non-asymptotic convergence of their policy iterates to an optimal constrained policy.
To the best of our knowledge, this work appears to be the first non-asymptotic policy last-iterate convergence result for single-time-scale algorithms in constrained MDPs.
arXiv Detail & Related papers (2023-06-20T17:27:31Z) - Policy Gradient for Rectangular Robust Markov Decision Processes [62.397882389472564]
We introduce robust policy gradient (RPG), a policy-based method that efficiently solves rectangular robust Markov decision processes (MDPs).
Our resulting RPG can be estimated from data with the same time complexity as its non-robust equivalent.
arXiv Detail & Related papers (2023-01-31T12:40:50Z) - On the Convergence Rates of Policy Gradient Methods [9.74841674275568]
We consider infinite-horizon, geometrically discounted Markov decision problems with finite state and action spaces.
We show that, with direct parametrization in the policy space, the convergence rates and sample complexity of policy gradient methods can be analyzed in a general framework.
arXiv Detail & Related papers (2022-01-19T07:03:37Z) - Convergence and Optimality of Policy Gradient Methods in Weakly Smooth
Settings [17.437408088239142]
We establish explicit convergence rates of policy gradient methods without relying on opaque conditions.
We also characterize the sufficiency conditions for the ergodicity of near-linear MDPs.
We provide conditions and analysis for optimality of the converged policies.
arXiv Detail & Related papers (2021-10-30T06:31:01Z) - Independent Policy Gradient Methods for Competitive Reinforcement
Learning [62.91197073795261]
We obtain global, non-asymptotic convergence guarantees for independent learning algorithms in competitive reinforcement learning settings with two agents.
We show that if both players run policy gradient methods in tandem, their policies will converge to a min-max equilibrium of the game, as long as their learning rates follow a two-timescale rule.
arXiv Detail & Related papers (2021-01-11T23:20:42Z) - Policy Gradient for Continuing Tasks in Non-stationary Markov Decision
Processes [112.38662246621969]
Reinforcement learning considers the problem of finding policies that maximize an expected cumulative reward in a Markov decision process with unknown transition probabilities.
We compute unbiased stochastic gradients of the value function which we use as ascent directions to update the policy.
A major drawback of policy gradient-type algorithms is that they are limited to episodic tasks unless stationarity assumptions are imposed.
arXiv Detail & Related papers (2020-10-16T15:15:42Z) - Provable Fictitious Play for General Mean-Field Games [111.44976345867005]
We propose a reinforcement learning algorithm for stationary mean-field games.
The goal is to learn a pair of mean-field state and stationary policy that constitutes the Nash equilibrium.
arXiv Detail & Related papers (2020-10-08T18:46:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.