Independent Policy Gradient Methods for Competitive Reinforcement Learning
- URL: http://arxiv.org/abs/2101.04233v1
- Date: Mon, 11 Jan 2021 23:20:42 GMT
- Title: Independent Policy Gradient Methods for Competitive Reinforcement Learning
- Authors: Constantinos Daskalakis, Dylan J. Foster, Noah Golowich
- Abstract summary: We obtain global, non-asymptotic convergence guarantees for independent learning algorithms in competitive reinforcement learning settings with two agents.
We show that if both players run policy gradient methods in tandem, their policies will converge to a min-max equilibrium of the game, as long as their learning rates follow a two-timescale rule.
- Score: 62.91197073795261
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We obtain global, non-asymptotic convergence guarantees for independent
learning algorithms in competitive reinforcement learning settings with two
agents (i.e., zero-sum stochastic games). We consider an episodic setting where
in each episode, each player independently selects a policy and observes only
their own actions and rewards, along with the state. We show that if both
players run policy gradient methods in tandem, their policies will converge to
a min-max equilibrium of the game, as long as their learning rates follow a
two-timescale rule (which is necessary). To the best of our knowledge, this
constitutes the first finite-sample convergence result for independent policy
gradient methods in competitive RL; prior work has largely focused on
centralized, coordinated procedures for equilibrium computation.
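The paper's setting can be illustrated on its simplest special case, a one-shot zero-sum matrix game: each player takes independent projected gradient steps on its own payoff, with the min-player's learning rate much smaller than the max-player's (the two-timescale rule). The sketch below is illustrative only, with assumed step sizes and a rock-paper-scissors payoff matrix; it is not the paper's episodic stochastic-game algorithm, and it tracks the averaged policies, which approach equilibrium because constant-step projected gradient descent is no-regret.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of a vector onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, len(v) + 1)
    rho = idx[u + (1.0 - css) / idx > 0][-1]
    theta = (1.0 - css[rho - 1]) / rho
    return np.maximum(v + theta, 0.0)

# Zero-sum matrix game (rock-paper-scissors): the max-player receives x^T A y.
A = np.array([[ 0., -1.,  1.],
              [ 1.,  0., -1.],
              [-1.,  1.,  0.]])

x = np.array([0.6, 0.3, 0.1])   # max-player policy
y = np.array([0.2, 0.5, 0.3])   # min-player policy
eta_x, eta_y = 0.05, 0.005      # two-timescale rule: eta_y << eta_x (assumed values)

T = 20000
x_avg = np.zeros(3)
y_avg = np.zeros(3)
for t in range(T):
    gx = A @ y        # each player only needs its own payoff gradient
    gy = A.T @ x
    x = project_simplex(x + eta_x * gx)   # independent ascent step
    y = project_simplex(y - eta_y * gy)   # independent descent step
    x_avg += x
    y_avg += y
x_avg /= T
y_avg /= T
# The averaged policies drift toward the unique equilibrium (1/3, 1/3, 1/3).
```

Plain simultaneous gradient descent/ascent with equal rates cycles on games like this; the two-timescale separation is what the abstract identifies as necessary for convergence.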
Related papers
- Federated Natural Policy Gradient Methods for Multi-task Reinforcement Learning [49.65958529941962]
Federated reinforcement learning (RL) enables collaborative decision making of multiple distributed agents without sharing local data trajectories.
In this work, we consider a multi-task setting, in which each agent has its own private reward function corresponding to different tasks.
We learn a globally optimal policy that maximizes the sum of the discounted total rewards of all the agents in a decentralized manner.
arXiv Detail & Related papers (2023-11-01T00:15:18Z)
- Finding mixed-strategy equilibria of continuous-action games without gradients using randomized policy networks [83.28949556413717]
We study the problem of computing an approximate Nash equilibrium of continuous-action games without access to gradients.
We model players' strategies using artificial neural networks.
This paper is the first to solve general continuous-action games with unrestricted mixed strategies and without any gradient information.
arXiv Detail & Related papers (2022-11-29T05:16:41Z)
- On the convergence of policy gradient methods to Nash equilibria in general stochastic games [33.786186304912]
We study the long-run behavior of policy gradient methods with respect to Nash equilibrium policies.
We show that policy gradient trajectories with gradient estimates provided by the REINFORCE algorithm achieve an $\mathcal{O}(1/\sqrt{n})$ distance-squared convergence rate.
arXiv Detail & Related papers (2022-10-17T08:51:59Z)
- Independent Natural Policy Gradient Always Converges in Markov Potential Games [18.43622733760659]
We study mixed cooperative/competitive games called Markov Potential Games (MPGs).
We prove that Independent Natural Policy Gradient always converges in MPGs using constant learning rates.
We complement our theoretical results with experiments that indicate that Natural Policy Gradient outperforms Policy Gradient in routing games and congestion games.
arXiv Detail & Related papers (2021-10-20T15:15:10Z)
- Last-iterate Convergence of Decentralized Optimistic Gradient Descent/Ascent in Infinite-horizon Competitive Markov Games [37.70703888365849]
We study infinite-horizon discounted two-player zero-sum Markov games.
We develop a decentralized algorithm that converges to the set of Nash equilibria under self-play.
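The optimistic gradient descent/ascent idea behind that paper can be sketched in its simplest form, on a one-shot zero-sum matrix game rather than the infinite-horizon Markov games the paper actually treats. Everything below (payoff matrix, step size, iteration count) is an assumed illustration: each player extrapolates with twice the current gradient minus the previous one, which is what lets the last iterate itself converge instead of cycling.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of a vector onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, len(v) + 1)
    rho = idx[u + (1.0 - css) / idx > 0][-1]
    theta = (1.0 - css[rho - 1]) / rho
    return np.maximum(v + theta, 0.0)

A = np.array([[ 0., -1.,  1.],
              [ 1.,  0., -1.],
              [-1.,  1.,  0.]])   # rock-paper-scissors payoffs

x = np.array([0.6, 0.3, 0.1])    # max-player policy
y = np.array([0.2, 0.5, 0.3])    # min-player policy
eta = 0.05                        # assumed step size
gx_prev, gy_prev = A @ y, A.T @ x
for t in range(50000):
    gx, gy = A @ y, A.T @ x
    # optimistic update: extrapolate with 2 * current - previous gradient
    x = project_simplex(x + eta * (2 * gx - gx_prev))
    y = project_simplex(y - eta * (2 * gy - gy_prev))
    gx_prev, gy_prev = gx, gy
# Unlike plain gradient descent/ascent, whose iterates orbit the equilibrium,
# the last iterates x and y approach (1/3, 1/3, 1/3) here.
```

Each player's update uses only its own payoff gradient, matching the decentralized, self-play flavor of the paper's result.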
arXiv Detail & Related papers (2021-02-08T21:45:56Z)
- On Information Asymmetry in Competitive Multi-Agent Reinforcement Learning: Convergence and Optimality [78.76529463321374]
We study a system of two interacting non-cooperative Q-learning agents.
We show that this information asymmetry can lead to a stable outcome of population learning.
arXiv Detail & Related papers (2020-10-21T11:19:53Z)
- Provable Fictitious Play for General Mean-Field Games [111.44976345867005]
We propose a reinforcement learning algorithm for stationary mean-field games.
The goal is to learn a pair of mean-field state and stationary policy that constitutes the Nash equilibrium.
arXiv Detail & Related papers (2020-10-08T18:46:48Z)
- Competitive Policy Optimization [137.17299766844596]
We propose a novel policy gradient approach that exploits the game-theoretic nature of competitive games to derive policy updates.
Motivated by the competitive gradient optimization method, we derive a bilinear approximation of the game objective.
We empirically investigate their behavior on a set of comprehensive, yet challenging, competitive games.
arXiv Detail & Related papers (2020-06-18T15:31:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.