Population-size-Aware Policy Optimization for Mean-Field Games
- URL: http://arxiv.org/abs/2302.03364v1
- Date: Tue, 7 Feb 2023 10:16:00 GMT
- Title: Population-size-Aware Policy Optimization for Mean-Field Games
- Authors: Pengdeng Li, Xinrun Wang, Shuxin Li, Hau Chan, Bo An
- Abstract summary: We study how the optimal policies of agents evolve with the number of agents (population size) in mean-field games.
We propose Population-size-Aware Policy Optimization (PAPO), which unifies two natural options (augmentation and hypernetwork) and achieves significantly better performance.
PAPO consists of three components: i) the population-size encoding, which transforms the original value of the population size to an equivalent encoding to avoid training collapse, ii) a hypernetwork to generate a distinct policy for each game conditioned on the population size, and iii) the population size as an additional input to the generated policy.
- Score: 34.80183622480149
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we attempt to bridge the two fields of finite-agent and
infinite-agent games, by studying how the optimal policies of agents evolve
with the number of agents (population size) in mean-field games, an
agent-centric perspective in contrast to the existing works focusing typically
on the convergence of the empirical distribution of the population. To this
end, the premise is to obtain the optimal policies of a set of finite-agent
games with different population sizes. However, deriving the closed-form
solution for each game is theoretically intractable, training a distinct policy
for each game is computationally intensive, and directly applying a policy
trained in one game to other games is sub-optimal. We address these challenges
through the Population-size-Aware Policy Optimization (PAPO). Our contributions
are three-fold. First, to efficiently generate effective policies for games
with different population sizes, we propose PAPO, which unifies two natural
options (augmentation and hypernetwork) and achieves significantly better
performance. PAPO consists of three components: i) the population-size encoding
which transforms the original value of population size to an equivalent
encoding to avoid training collapse, ii) a hypernetwork to generate a distinct
policy for each game conditioned on the population size, and iii) the
population size as an additional input to the generated policy. Next, we
construct a multi-task-based training procedure to efficiently train the neural
networks of PAPO by sampling data from multiple games with different population
sizes. Finally, extensive experiments on multiple environments show the
significant superiority of PAPO over baselines, and the analysis of the
evolution of the generated policies further deepens our understanding of the
two fields of finite-agent and infinite-agent games.
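To make the three components concrete, here is a minimal PyTorch sketch under assumptions of my own: a sinusoidal encoding of log(n) as the population-size encoding, a two-layer generated policy, and a REINFORCE-style multi-task update with random placeholder data in place of real rollouts. Class names such as `PopulationSizeEncoder` and `HyperPolicy` are illustrative and not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PopulationSizeEncoder(nn.Module):
    """Map a raw population size n to a bounded feature vector.

    The abstract notes that using the raw value of n can cause training
    collapse; a sinusoidal encoding of log(n) is used here purely as a
    stand-in, not necessarily the paper's exact encoding.
    """

    def __init__(self, dim: int = 8):
        super().__init__()
        self.dim = dim

    def forward(self, n: torch.Tensor) -> torch.Tensor:
        # n: (batch,) integer tensor of population sizes
        freqs = 2.0 ** torch.arange(self.dim // 2, dtype=torch.float32, device=n.device)
        x = torch.log(n.float().unsqueeze(-1) + 1.0) / freqs
        return torch.cat([torch.sin(x), torch.cos(x)], dim=-1)  # (batch, dim)


class HyperPolicy(nn.Module):
    """Hypernetwork that generates a distinct policy per population size.

    The generated policy also receives the encoding as an extra input,
    mirroring the third component listed in the abstract.
    """

    def __init__(self, obs_dim: int, act_dim: int, enc_dim: int = 8, hidden: int = 64):
        super().__init__()
        self.encoder = PopulationSizeEncoder(enc_dim)
        in_dim = obs_dim + enc_dim  # generated policy sees observation + encoding
        # Weight/bias shapes of the generated two-layer policy network.
        self.shapes = [(hidden, in_dim), (hidden,), (act_dim, hidden), (act_dim,)]
        n_params = sum(torch.Size(s).numel() for s in self.shapes)
        self.hyper = nn.Sequential(nn.Linear(enc_dim, 128), nn.ReLU(),
                                   nn.Linear(128, n_params))

    def _unpack(self, flat: torch.Tensor):
        out, i = [], 0
        for shape in self.shapes:
            numel = torch.Size(shape).numel()
            out.append(flat[i:i + numel].view(shape))
            i += numel
        return out

    def forward(self, obs: torch.Tensor, n: torch.Tensor) -> torch.Tensor:
        enc = self.encoder(n)        # (batch, enc_dim)
        params = self.hyper(enc)     # (batch, n_params): one policy per sample
        logits = []
        for p, o, e in zip(params, obs, enc):
            w1, b1, w2, b2 = self._unpack(p)
            h = F.relu(F.linear(torch.cat([o, e]), w1, b1))
            logits.append(F.linear(h, w2, b2))
        return torch.stack(logits)   # (batch, act_dim)


# Multi-task-style update: each batch mixes games with different population
# sizes. Real environment rollouts are replaced by random placeholder data.
policy = HyperPolicy(obs_dim=6, act_dim=3)
opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
for step in range(10):
    n = torch.randint(2, 101, (32,))              # sampled population sizes
    obs = torch.randn(32, 6)                      # placeholder observations
    actions = torch.randint(0, 3, (32,))          # placeholder actions
    advantages = torch.randn(32)                  # placeholder advantages
    logp = torch.distributions.Categorical(logits=policy(obs, n)).log_prob(actions)
    loss = -(logp * advantages).mean()            # REINFORCE-style surrogate
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The point of the sketch is that a single set of hypernetwork weights produces a different policy for every population size, so one training run can cover a whole family of finite-agent games rather than training one network per game.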
Related papers
- Leading the Pack: N-player Opponent Shaping [52.682734939786464]
We extend Opponent Shaping (OS) methods to environments involving multiple co-players and multiple shaping agents.
We find that when playing with a large number of co-players, the relative performance of OS methods declines, suggesting that in the limit OS methods may not perform well.
arXiv Detail & Related papers (2023-12-19T20:01:42Z) - Fictitious Cross-Play: Learning Global Nash Equilibrium in Mixed
Cooperative-Competitive Games [14.979239870856535]
Self-play (SP) is a popular reinforcement learning framework for solving competitive games.
In this work, we develop a novel algorithm, Fictitious Cross-Play (FXP), which inherits the benefits from both frameworks.
arXiv Detail & Related papers (2023-10-05T07:19:33Z) - Learning Diverse Risk Preferences in Population-based Self-play [23.07952140353786]
Current self-play algorithms optimize the agent to maximize expected win-rates against its current or historical copies.
We introduce diversity from the perspective that agents could have diverse risk preferences in the face of uncertainty.
We show that our method achieves comparable or superior performance in competitive games.
arXiv Detail & Related papers (2023-05-19T06:56:02Z) - Local Optimization Achieves Global Optimality in Multi-Agent
Reinforcement Learning [139.53668999720605]
We present a multi-agent PPO algorithm in which the local policy of each agent is updated similarly to vanilla PPO.
We prove that with standard regularity conditions on the Markov game and problem-dependent quantities, our algorithm converges to the globally optimal policy at a sublinear rate.
arXiv Detail & Related papers (2023-05-08T16:20:03Z) - Regularization of the policy updates for stabilizing Mean Field Games [0.2348805691644085]
This work studies non-cooperative Multi-Agent Reinforcement Learning (MARL),
where multiple agents interact in the same environment and each aims to maximize its individual return.
We name our algorithm Mean Field Proximal Policy Optimization (MF-PPO), and we empirically show the effectiveness of our method in the OpenSpiel framework.
arXiv Detail & Related papers (2023-04-04T05:45:42Z) - Provably Efficient Fictitious Play Policy Optimization for Zero-Sum
Markov Games with Structured Transitions [145.54544979467872]
We propose and analyze new fictitious play policy optimization algorithms for zero-sum Markov games with structured but unknown transitions.
We prove tight $\widetilde{\mathcal{O}}(\sqrt{K})$ regret bounds after $K$ episodes in a two-agent competitive game scenario.
Our algorithms feature a combination of Upper Confidence Bound (UCB)-type optimism and fictitious play under the scope of simultaneous policy optimization.
arXiv Detail & Related papers (2022-07-25T18:29:16Z) - Self-Play PSRO: Toward Optimal Populations in Two-Player Zero-Sum Games [69.5064797859053]
We introduce Self-Play PSRO (SP-PSRO), a method that adds an approximately optimal policy to the population in each iteration.
SP-PSRO empirically tends to converge much faster than APSRO and in many games converges in just a few iterations.
arXiv Detail & Related papers (2022-07-13T22:55:51Z) - Discovering Diverse Multi-Agent Strategic Behavior via Reward
Randomization [42.33734089361143]
We propose a technique for discovering diverse strategic policies in complex multi-agent games.
We derive a new algorithm, Reward-Randomized Policy Gradient (RPG).
RPG is able to discover multiple distinctive human-interpretable strategies in challenging temporal trust dilemmas.
arXiv Detail & Related papers (2021-03-08T06:26:55Z) - Provable Fictitious Play for General Mean-Field Games [111.44976345867005]
We propose a reinforcement learning algorithm for stationary mean-field games.
The goal is to learn a pair of mean-field state and stationary policy that constitutes the Nash equilibrium.
arXiv Detail & Related papers (2020-10-08T18:46:48Z)