Provable Fictitious Play for General Mean-Field Games
- URL: http://arxiv.org/abs/2010.04211v1
- Date: Thu, 8 Oct 2020 18:46:48 GMT
- Title: Provable Fictitious Play for General Mean-Field Games
- Authors: Qiaomin Xie, Zhuoran Yang, Zhaoran Wang, Andreea Minca
- Abstract summary: We propose a reinforcement learning algorithm for stationary mean-field games.
The goal is to learn a pair of mean-field state and stationary policy that constitutes the Nash equilibrium.
- Score: 111.44976345867005
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a reinforcement learning algorithm for stationary mean-field
games, where the goal is to learn a pair of mean-field state and stationary
policy that constitutes the Nash equilibrium. When viewing the mean-field state
and the policy as two players, we propose a fictitious play algorithm which
alternatively updates the mean-field state and the policy via gradient-descent
and proximal policy optimization, respectively. Our algorithm is in stark
contrast with previous literature which solves each single-agent reinforcement
learning problem induced by the iterates mean-field states to the optimum.
Furthermore, we prove that our fictitious play algorithm converges to the Nash
equilibrium at a sublinear rate. To the best of our knowledge, this appears to be the
first provably convergent single-loop reinforcement learning algorithm for
mean-field games based on iterative updates of both mean-field state and
policy.
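Below is a minimal sketch of the single-loop alternation described in the abstract, assuming a finite state-action space with a known transition kernel. The function and parameter names (`fictitious_play_mfg`, `eta_mu`, `eta_pi`, `clip_eps`), the damped push-forward update for the mean-field state, and the use of one-step rewards in place of estimated action values are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def fictitious_play_mfg(P, r, num_iters=500, eta_mu=0.05, eta_pi=0.5, clip_eps=0.2):
    """Single-loop fictitious-play sketch for a stationary mean-field game.

    P : array of shape (S, A, S), transition kernel P[s, a, s'] (assumed known here).
    r : callable r(s, a, mu) -> float, reward depending on the mean-field state mu.
    """
    S, A, _ = P.shape
    mu = np.ones(S) / S               # mean-field state: a distribution over states
    pi = np.ones((S, A)) / A          # stationary policy: rows are action distributions

    for _ in range(num_iters):
        # Mean-field update: move mu toward the state distribution induced by pi
        # (a damped, gradient-descent-like step rather than a full fixed-point solve).
        mu_next = np.einsum("s,sa,sat->t", mu, pi, P)
        mu = (1.0 - eta_mu) * mu + eta_mu * mu_next

        # Policy update: a proximal (PPO-style, clipped) step against the frozen mean field.
        # One-step rewards stand in for estimated action values to keep the sketch short.
        q = np.array([[r(s, a, mu) for a in range(A)] for s in range(S)])
        adv = q - (pi * q).sum(axis=1, keepdims=True)            # advantage estimate
        pi = pi * np.exp(eta_pi * np.clip(adv, -clip_eps, clip_eps))
        pi = pi / pi.sum(axis=1, keepdims=True)                  # renormalize each row

    return mu, pi
```

The damping of the mean-field step and the clipping of the policy step are what make the loop "single-loop": neither subproblem is solved to optimality before the other is updated.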
Related papers
- COMAL: A Convergent Meta-Algorithm for Aligning LLMs with General Preferences [31.988100672680154]
We propose a meta-algorithm, Convergent Meta Alignment Algorithm (COMAL), for language model alignment with general preferences.
Our meta-algorithm is simple and can be integrated with many existing methods designed for RLHF and preference optimization.
arXiv Detail & Related papers (2024-10-30T17:13:02Z)
- Regularization of the policy updates for stabilizing Mean Field Games [0.2348805691644085]
This work studies non-cooperative Multi-Agent Reinforcement Learning (MARL), where multiple agents interact in the same environment and each aims to maximize its individual return.
We name our algorithm Mean Field Proximal Policy Optimization (MF-PPO), and we empirically show the effectiveness of our method in the OpenSpiel framework.
arXiv Detail & Related papers (2023-04-04T05:45:42Z)
- Finding mixed-strategy equilibria of continuous-action games without gradients using randomized policy networks [83.28949556413717]
We study the problem of computing an approximate Nash equilibrium of a continuous-action game without access to gradients.
We model players' strategies using artificial neural networks.
This paper is the first to solve general continuous-action games with unrestricted mixed strategies and without any gradient information.
arXiv Detail & Related papers (2022-11-29T05:16:41Z)
- On the convergence of policy gradient methods to Nash equilibria in general stochastic games [33.786186304912]
We study the long-run behavior of policy gradient methods with respect to Nash equilibrium policies.
We show that policy gradient trajectories with gradient estimates provided by the REINFORCE algorithm achieve an $\mathcal{O}(1/\sqrt{n})$ distance-squared convergence rate (a generic REINFORCE estimator sketch appears after this list).
arXiv Detail & Related papers (2022-10-17T08:51:59Z)
- Learning Correlated Equilibria in Mean-Field Games [62.14589406821103]
We develop the concepts of Mean-Field correlated and coarse-correlated equilibria.
We show that they can be efficiently learnt in all games, without requiring any additional assumption on the structure of the game.
arXiv Detail & Related papers (2022-08-22T08:31:46Z)
- Policy Optimization for Markov Games: Unified Framework and Faster Convergence [81.3266426402464]
We show that the state-wise average policy of this algorithm converges to an approximate Nash equilibrium (NE) of the game.
We extend this algorithm to multi-player general Markov Games and show a $\widetilde{\mathcal{O}}(T^{-1/2})$ convergence rate to Coarse Correlated Equilibria (CCE).
arXiv Detail & Related papers (2022-06-06T14:23:13Z)
- Provably Efficient Policy Gradient Methods for Two-Player Zero-Sum Markov Games [95.70078702838654]
This paper studies natural extensions of the Natural Policy Gradient algorithm for solving two-player zero-sum games.
We thoroughly characterize the algorithms' performance in terms of the number of samples, number of iterations, concentrability coefficients, and approximation error.
arXiv Detail & Related papers (2021-02-17T17:49:57Z)
- Last-iterate Convergence of Decentralized Optimistic Gradient Descent/Ascent in Infinite-horizon Competitive Markov Games [37.70703888365849]
We study infinite-horizon discounted two-player zero-sum Markov games.
We develop a decentralized algorithm that converges to the set of Nash equilibria under self-play.
arXiv Detail & Related papers (2021-02-08T21:45:56Z)
- Independent Policy Gradient Methods for Competitive Reinforcement Learning [62.91197073795261]
We obtain global, non-asymptotic convergence guarantees for independent learning algorithms in competitive reinforcement learning settings with two agents.
We show that if both players run policy gradient methods in tandem, their policies will converge to a min-max equilibrium of the game, as long as their learning rates follow a two-timescale rule (a minimal step-size sketch follows this list).
arXiv Detail & Related papers (2021-01-11T23:20:42Z)
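The entry above on policy gradient methods in general stochastic games states an $\mathcal{O}(1/\sqrt{n})$ distance-squared rate for trajectories driven by REINFORCE gradient estimates. For context, here is a minimal, generic REINFORCE estimator for a tabular softmax policy; the function name, the baseline-free return-to-go form, and the discounting convention are illustrative choices, not that paper's exact estimator.

```python
import numpy as np

def reinforce_gradient(theta, episodes, gamma=0.99):
    """REINFORCE estimate of the policy gradient for a tabular softmax policy.

    theta    : array of shape (S, A), softmax logits of the policy.
    episodes : list of trajectories, each a list of (state, action, reward) tuples.
    Returns an estimate of grad_theta J(theta) averaged over the episodes.
    """
    grad = np.zeros_like(theta)
    for trajectory in episodes:
        rewards = [r for (_, _, r) in trajectory]
        for t, (s, a, _) in enumerate(trajectory):
            # Discounted return-to-go from step t: only downstream reward weights the score.
            g_t = sum(gamma ** k * rewards[t + k] for k in range(len(rewards) - t))
            # Score function of the softmax policy at (s, a): e_a - pi(.|s).
            probs = np.exp(theta[s] - theta[s].max())
            probs /= probs.sum()
            score = -probs
            score[a] += 1.0
            grad[s] += (gamma ** t) * g_t * score
    return grad / len(episodes)
```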
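The entry on independent policy gradient methods requires the two players' learning rates to follow a two-timescale rule. Below is a minimal illustration of such a schedule; the exponents and constants are assumptions for the sketch, not the cited paper's prescription. The key property is that the slow player's step size vanishes relative to the fast player's.

```python
def two_timescale_rates(t, fast_exp=0.6, slow_exp=0.9):
    """Illustrative two-timescale step sizes at iteration t.

    The 'fast' player takes larger steps (slower decay); the 'slow' player's
    steps decay strictly faster, so eta_slow / eta_fast -> 0 as t grows.
    """
    eta_fast = 1.0 / (1 + t) ** fast_exp
    eta_slow = 1.0 / (1 + t) ** slow_exp
    return eta_fast, eta_slow
```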