A General Framework for Learning Mean-Field Games
- URL: http://arxiv.org/abs/2003.06069v2
- Date: Sun, 10 Oct 2021 07:42:38 GMT
- Title: A General Framework for Learning Mean-Field Games
- Authors: Xin Guo, Anran Hu, Renyuan Xu and Junzi Zhang
- Abstract summary: This paper presents a general mean-field game (GMFG) framework for simultaneous learning and decision-making in games with a large population.
It then proposes value-based and policy-based reinforcement learning algorithms with smoothed policies.
Experiments on an equilibrium product pricing problem demonstrate that GMF-V-Q and GMF-P-TRPO, two specific instantiations of GMF-V and GMF-P, respectively, with Q-learning and TRPO, are both efficient and robust in the GMFG setting.
- Score: 10.483303456655058
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a general mean-field game (GMFG) framework for
simultaneous learning and decision-making in stochastic games with a large
population. It first establishes the existence of a unique Nash Equilibrium to
this GMFG, and demonstrates that naively combining reinforcement learning with
the fixed-point approach in classical MFGs yields unstable algorithms. It then
proposes value-based and policy-based reinforcement learning algorithms (GMF-V
and GMF-P, respectively) with smoothed policies, with analysis of their
convergence properties and computational complexities. Experiments on an
equilibrium product pricing problem demonstrate that GMF-V-Q and GMF-P-TRPO,
two specific instantiations of GMF-V and GMF-P, respectively, with Q-learning
and TRPO, are both efficient and robust in the GMFG setting. Moreover, their
performance is superior in convergence speed, accuracy, and stability when
compared with existing algorithms for multi-agent reinforcement learning in the
$N$-player setting.
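To make the fixed-point structure concrete, below is a minimal sketch of a GMF-V-style iteration with a softmax-smoothed policy on a synthetic tabular mean-field game. The toy environment, the value-iteration inner loop standing in for Q-learning, and all constants (state/action sizes, temperature, damping) are illustrative assumptions, not the paper's exact GMF-V-Q algorithm.

```python
# A minimal sketch (not the paper's exact GMF-V-Q) of a mean-field fixed-point
# loop with a softmax-smoothed policy.  The toy environment, the tabular value
# iteration standing in for the Q-learning inner loop, and all constants below
# are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
S, A = 5, 3            # small state/action spaces (assumed)
gamma, tau = 0.9, 0.1  # discount factor and softmax temperature (assumed)

base_r = rng.uniform(size=(S, A))            # reward before crowd effects
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a] = distribution over next states

def reward(mu):
    # Illustrative mean-field coupling: reward drops with congestion at a state.
    return base_r - 0.5 * mu[:, None]

def soft_q(mu, n_iter=300):
    """Soft (entropy-regularized) value iteration for the MDP induced by a frozen mu.
    In a GMF-V-Q-style learner this inner step would instead be sampled Q-learning."""
    r, Q = reward(mu), np.zeros((S, A))
    for _ in range(n_iter):
        m = Q.max(axis=1)
        V = m + tau * np.log(np.exp((Q - m[:, None]) / tau).sum(axis=1))
        Q = r + gamma * P @ V
    return Q

def softmax_policy(Q):
    z = np.exp((Q - Q.max(axis=1, keepdims=True)) / tau)
    return z / z.sum(axis=1, keepdims=True)  # the "smoothed" policy pi[s, a]

def propagate(mu, pi):
    # One-step flow of the population distribution under policy pi.
    return np.einsum("s,sa,sat->t", mu, pi, P)

mu = np.full(S, 1.0 / S)                     # initial mean field
for _ in range(200):
    pi = softmax_policy(soft_q(mu))          # smoothed best response to mu
    mu_next = propagate(mu, pi)              # mean field induced by that response
    if np.abs(mu_next - mu).max() < 1e-8:
        break
    mu = 0.5 * mu + 0.5 * mu_next            # damped update for stability (assumed)

print("approximate stationary mean field:", np.round(mu, 3))
```

Swapping the inner value iteration for sampled Q-learning updates gives a GMF-V-Q-flavoured learner, and using a TRPO policy update in its place corresponds to the GMF-P flavour; the softmax temperature plays the role of the policy smoothing that the abstract credits with stabilizing the naive fixed-point iteration.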
Related papers
- uniINF: Best-of-Both-Worlds Algorithm for Parameter-Free Heavy-Tailed MABs [33.262918224598614]
We present a novel algorithm for the Heavy-Tailed Multi-Armed Bandits (HTMAB) problem, demonstrating robustness and adaptability.
Our algorithm, uniINF, enjoys the so-called Best-of-Both-Worlds (BoBW) property, performing optimally in both stochastic and adversarial environments.
To our knowledge, uniINF is the first parameter-free algorithm to achieve the BoBW property for the heavy-tailed MAB problem.
arXiv Detail & Related papers (2024-10-04T09:55:44Z) - Provably Efficient Information-Directed Sampling Algorithms for Multi-Agent Reinforcement Learning [50.92957910121088]
This work designs and analyzes a novel set of algorithms for multi-agent reinforcement learning (MARL) based on the principle of information-directed sampling (IDS).
For episodic two-player zero-sum MGs, we present three sample-efficient algorithms for learning Nash equilibrium.
We extend Reg-MAIDS to multi-player general-sum MGs and prove that it can learn either the Nash equilibrium or coarse correlated equilibrium in a sample efficient manner.
arXiv Detail & Related papers (2024-04-30T06:48:56Z) - Model-Based RL for Mean-Field Games is not Statistically Harder than Single-Agent RL [57.745700271150454]
We study the sample complexity of reinforcement learning in Mean-Field Games (MFGs) with model-based function approximation.
We introduce the Partial Model-Based Eluder Dimension (P-MBED), a more effective notion to characterize the model class complexity.
arXiv Detail & Related papers (2024-02-08T14:54:47Z) - Reinforcement Learning for SBM Graphon Games with Re-Sampling [4.6648272529750985]
We develop a novel learning framework based on a Graphon Game with Re-Sampling (GGR-S) model.
We analyze the GGR-S dynamics and establish their convergence to the dynamics of the multi-population MFG (MP-MFG).
arXiv Detail & Related papers (2023-10-25T03:14:48Z) - Improving Sample Efficiency of Model-Free Algorithms for Zero-Sum Markov Games [66.2085181793014]
We show that a model-free stage-based Q-learning algorithm can achieve the same optimal dependence on the horizon $H$ as model-based algorithms.
Our algorithm features a novel design that updates the reference value functions as a pair of optimistic and pessimistic value functions.
arXiv Detail & Related papers (2023-08-17T08:34:58Z) - On the Statistical Efficiency of Mean-Field Reinforcement Learning with General Function Approximation [20.66437196305357]
We study the fundamental statistical efficiency of Reinforcement Learning in Mean-Field Control (MFC) and Mean-Field Game (MFG) with general model-based function approximation.
We introduce a new concept called Mean-Field Model-Based Eluder Dimension (MF-MBED), which characterizes the inherent complexity of mean-field model classes.
arXiv Detail & Related papers (2023-05-18T20:00:04Z) - Beyond ADMM: A Unified Client-variance-reduced Adaptive Federated
Learning Framework [82.36466358313025]
We propose a primal-dual FL algorithm, termed FedVRA, that allows one to adaptively control the variance-reduction level and the bias of the global model.
Experiments based on (semi-supervised) image classification tasks demonstrate superiority of FedVRA over the existing schemes.
arXiv Detail & Related papers (2022-12-03T03:27:51Z) - Individual-Level Inverse Reinforcement Learning for Mean Field Games [16.79251229846642]
Mean Field IRL (MFIRL) is the first dedicated IRL framework for MFGs that can handle both cooperative and non-cooperative environments.
We develop a practical algorithm effective for MFGs with unknown dynamics.
arXiv Detail & Related papers (2022-02-13T20:35:01Z) - Kernelized Multiplicative Weights for 0/1-Polyhedral Games: Bridging the
Gap Between Learning in Extensive-Form and Normal-Form Games [76.21916750766277]
We show that the Optimistic Multiplicative Weights Update (OMWU) algorithm can be simulated on the normal-form equivalent of an EFG in linear time per iteration in the game tree size using a kernel trick.
Specifically, the resulting Kernelized OMWU (KOMWU) algorithm is the first to simultaneously guarantee last-iterate convergence, among other desirable properties.
arXiv Detail & Related papers (2022-02-01T06:28:51Z) - Near-Optimal No-Regret Learning for Correlated Equilibria in
Multi-Player General-Sum Games [104.74734408204749]
We show that if all agents in a multi-player general-sum normal-form game employ Optimistic Multiplicative Weights Update (OMWU), the external regret of every player is $O(\textrm{polylog}(T))$ after $T$ repetitions of the game.
We extend this result from external regret to internal regret and swap regret, thereby establishing uncoupled learning dynamics that converge to an approximate correlated equilibrium.
arXiv Detail & Related papers (2021-11-11T01:19:53Z) - Reinforcement Learning for Mean Field Games with Strategic
Complementarities [10.281006908092932]
We introduce a natural refinement to the equilibrium concept that we call Trembling-Hand-Perfect MFE (T-MFE).
We propose a simple algorithm for computing T-MFE under a known model.
We also introduce a model-free and a model-based approach to learning T-MFE and provide sample complexities of both algorithms.
arXiv Detail & Related papers (2020-06-21T00:31:48Z)
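As a rough, assumption-laden illustration of the trembling-hand idea in the last entry above: a greedy best response is perturbed so that every action retains probability at least delta/|A|. The Q-values and the value of delta below are made up, and the snippet is not the cited paper's construction of T-MFE.

```python
# Rough illustration of a "trembling-hand" policy: the greedy best response is
# forced to explore, playing uniformly with a small probability delta.
# Q-values and delta are made-up numbers, not the cited paper's construction.
import numpy as np

def trembling_policy(Q, delta=0.05):
    S, A = Q.shape
    greedy = np.zeros((S, A))
    greedy[np.arange(S), Q.argmax(axis=1)] = 1.0
    return (1.0 - delta) * greedy + delta / A  # every action keeps mass >= delta/A

Q = np.array([[1.0, 0.2],
              [0.1, 0.9],
              [0.5, 0.6]])
print(trembling_policy(Q))
```

In the T-MFE concept, consistency of the mean field is then required with respect to this perturbed policy rather than the exact best response.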