Independent Policy Gradient Methods for Competitive Reinforcement
Learning
- URL: http://arxiv.org/abs/2101.04233v1
- Date: Mon, 11 Jan 2021 23:20:42 GMT
- Title: Independent Policy Gradient Methods for Competitive Reinforcement
Learning
- Authors: Constantinos Daskalakis, Dylan J. Foster, Noah Golowich
- Abstract summary: We obtain global, non-asymptotic convergence guarantees for independent learning algorithms in competitive reinforcement learning settings with two agents.
We show that if both players run policy gradient methods in tandem, their policies will converge to a min-max equilibrium of the game, as long as their learning rates follow a two-timescale rule.
- Score: 62.91197073795261
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We obtain global, non-asymptotic convergence guarantees for independent
learning algorithms in competitive reinforcement learning settings with two
agents (i.e., zero-sum stochastic games). We consider an episodic setting where
in each episode, each player independently selects a policy and observes only
their own actions and rewards, along with the state. We show that if both
players run policy gradient methods in tandem, their policies will converge to
a min-max equilibrium of the game, as long as their learning rates follow a
two-timescale rule (which is necessary). To the best of our knowledge, this
constitutes the first finite-sample convergence result for independent policy
gradient methods in competitive RL; prior work has largely focused on
centralized, coordinated procedures for equilibrium computation.
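A minimal sketch of the kind of independent, two-timescale update described above: both players of a one-state zero-sum game (matching pennies, chosen purely for illustration) run score-function policy gradient on softmax policies, each observing only its own sampled action and reward, with player x on a faster learning-rate scale than player y. The game, parameterization, constant step sizes, and variable names are assumptions for this sketch, not the paper's exact algorithm or analysis.

```python
# Illustrative sketch (not the paper's algorithm): independent policy gradient
# with a two-timescale learning-rate gap on a one-state zero-sum game.
import numpy as np

rng = np.random.default_rng(0)

# Payoff to player x (player y receives the negative): matching pennies.
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])

theta_x = np.zeros(2)  # logits of player x's softmax policy
theta_y = np.zeros(2)  # logits of player y's softmax policy


def softmax(z):
    z = z - z.max()
    p = np.exp(z)
    return p / p.sum()


# Two-timescale rule (simplified here to constant rates on different scales).
eta_x, eta_y = 0.05, 0.005

for t in range(20000):
    px, py = softmax(theta_x), softmax(theta_y)

    # Each player samples only its own action; neither sees the other's policy.
    ax = rng.choice(2, p=px)
    ay = rng.choice(2, p=py)

    r = A[ax, ay]  # reward observed by player x; player y observes -r

    # REINFORCE-style score-function gradient estimates from each player's
    # own (action, reward) pair: grad log pi(a) = one_hot(a) - pi for softmax.
    grad_x = r * (np.eye(2)[ax] - px)
    grad_y = -r * (np.eye(2)[ay] - py)

    theta_x += eta_x * grad_x  # ascent on player x's expected reward
    theta_y += eta_y * grad_y  # ascent on player y's expected reward

# The min-max equilibrium of matching pennies is uniform play for both sides.
print("player x policy:", softmax(theta_x))
print("player y policy:", softmax(theta_y))
```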
Related papers
- A Policy-Gradient Approach to Solving Imperfect-Information Games with Iterate Convergence [21.195897792629548]
Policy gradient methods have become a staple of any single-agent reinforcement learning toolbox.
We show for the first time that a policy gradient method leads to provable best-iterate convergence to a regularized Nash equilibrium in self-play.
arXiv Detail & Related papers (2024-08-01T17:54:01Z)
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, convergence (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
arXiv Detail & Related papers (2024-05-03T16:45:15Z)
- Finding mixed-strategy equilibria of continuous-action games without gradients using randomized policy networks [83.28949556413717]
We study the problem of computing an approximate Nash equilibrium of a continuous-action game without access to gradients.
We model players' strategies using artificial neural networks.
This paper is the first to solve general continuous-action games with unrestricted mixed strategies and without any gradient information.
arXiv Detail & Related papers (2022-11-29T05:16:41Z)
- On the convergence of policy gradient methods to Nash equilibria in general stochastic games [33.786186304912]
We study the long-run behavior of policy gradient methods with respect to Nash equilibrium policies.
We show that policy gradient trajectories with gradient estimates provided by the REINFORCE algorithm achieve an $\mathcal{O}(1/\sqrt{n})$ distance-squared convergence rate (a minimal sketch of the REINFORCE estimator appears after this list).
arXiv Detail & Related papers (2022-10-17T08:51:59Z)
- Independent Natural Policy Gradient Always Converges in Markov Potential Games [18.43622733760659]
We study mixed cooperative/competitive games called Markov Potential Games (MPGs).
We prove that Independent Natural Policy Gradient always converges in MPGs using constant learning rates.
We complement our theoretical results with experiments that indicate that Natural Policy Gradient outperforms Policy Gradient in routing games and congestion games.
arXiv Detail & Related papers (2021-10-20T15:15:10Z)
- On Information Asymmetry in Competitive Multi-Agent Reinforcement Learning: Convergence and Optimality [78.76529463321374]
We study a system of two interacting, non-cooperative Q-learning agents.
We show that information asymmetry between the agents can lead to a stable outcome of population learning.
arXiv Detail & Related papers (2020-10-21T11:19:53Z)
- Provable Fictitious Play for General Mean-Field Games [111.44976345867005]
We propose a reinforcement learning algorithm for stationary mean-field games.
The goal is to learn a pair of mean-field state and stationary policy that constitutes the Nash equilibrium.
arXiv Detail & Related papers (2020-10-08T18:46:48Z)
- Competitive Policy Optimization [137.17299766844596]
We propose a novel policy gradient approach that exploits the game-theoretic nature of competitive games to derive policy updates.
Motivated by the competitive gradient optimization method, we derive a bilinear approximation of the game objective.
We empirically investigate their behavior on a set of comprehensive, yet challenging, competitive games.
arXiv Detail & Related papers (2020-06-18T15:31:09Z)
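For the entry "On the convergence of policy gradient methods to Nash equilibria in general stochastic games" above, whose trajectories are driven by REINFORCE gradient estimates, the following is a minimal sketch of that estimator for a single player with an assumed tabular softmax policy. The episode format, helper names, and toy numbers are hypothetical illustrations, not that paper's setup.

```python
# Sketch of a REINFORCE (score-function) gradient estimate for a tabular
# softmax policy; episode format and names are illustrative assumptions.
import numpy as np


def softmax(z):
    z = z - z.max()
    p = np.exp(z)
    return p / p.sum()


def reinforce_gradient(theta, episode, gamma=0.99):
    """Score-function gradient estimate from one sampled episode.

    theta:   (n_states, n_actions) logits of a tabular softmax policy
    episode: list of (state, action, reward) tuples collected by sampling
             actions from the current policy
    """
    # Discounted return-to-go at each step of the episode.
    G, returns = 0.0, []
    for (_, _, r) in reversed(episode):
        G = r + gamma * G
        returns.append(G)
    returns.reverse()

    grad = np.zeros_like(theta)
    for (s, a, _), G_t in zip(episode, returns):
        pi_s = softmax(theta[s])
        # grad log pi(a|s) for a softmax policy: one_hot(a) - pi(.|s)
        grad[s] += G_t * (np.eye(theta.shape[1])[a] - pi_s)
    return grad


# Hypothetical usage: 3 states, 2 actions, one short episode, one ascent step.
theta = np.zeros((3, 2))
episode = [(0, 1, 0.0), (2, 0, 1.0)]
theta += 0.1 * reinforce_gradient(theta, episode)
```

In the multi-agent setting of that entry, each player would form such an estimate from its own trajectory and take an independent gradient step, as in the sketch following the abstract above.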
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.