A Single Online Agent Can Efficiently Learn Mean Field Games
- URL: http://arxiv.org/abs/2405.03718v2
- Date: Tue, 16 Jul 2024 06:03:23 GMT
- Title: A Single Online Agent Can Efficiently Learn Mean Field Games
- Authors: Chenyu Zhang, Xu Chen, Xuan Di
- Abstract summary: Mean field games (MFGs) are a promising framework for modeling the behavior of large-population systems.
This paper introduces a novel online single-agent model-free learning scheme that enables a single agent to learn mean field Nash equilibria (MFNE) from online samples.
- Score: 16.00164239349632
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Mean field games (MFGs) are a promising framework for modeling the behavior of large-population systems. However, solving MFGs can be challenging due to the coupling of forward population evolution and backward agent dynamics. Typically, obtaining mean field Nash equilibria (MFNE) involves an iterative approach where the forward and backward processes are solved alternately, known as fixed-point iteration (FPI). This method requires fully observed population propagation and agent dynamics over the entire spatial domain, which could be impractical in some real-world scenarios. To overcome this limitation, this paper introduces a novel online single-agent model-free learning scheme, which enables a single agent to learn MFNE using online samples, without prior knowledge of the state-action space, reward function, or transition dynamics. Specifically, the agent updates its policy through the value function (Q), while simultaneously evaluating the mean field state (M), using the same batch of observations. We develop two variants of this learning scheme: off-policy and on-policy QM iteration. We prove that they efficiently approximate FPI, and a sample complexity guarantee is provided. The efficacy of our methods is confirmed by numerical experiments.
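The core idea described in the abstract (simultaneously updating the value function Q and the mean field estimate M from the same batch of online observations) can be sketched roughly as follows. This is a minimal, hypothetical illustration rather than the authors' algorithm: the tabular setting, the softmax behavior policy, the step sizes, and the `env.reset()` / `env.step(action, M)` interface are all assumptions made for the example.

```python
# Hypothetical sketch of an online "QM iteration" loop: a single agent updates a
# tabular Q-function and a mean field (state distribution) estimate M from the
# same stream of online samples. All names and hyperparameters are illustrative.
import numpy as np

def qm_iteration(env, n_states, n_actions, gamma=0.99, alpha_q=0.1, alpha_m=0.05,
                 temperature=1.0, n_steps=10_000, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))      # value function estimate
    M = np.full(n_states, 1.0 / n_states)    # mean field (population) estimate

    s = env.reset()                          # assumed interface: returns an integer state
    for _ in range(n_steps):
        # Softmax (Boltzmann) behavior policy derived from the current Q values.
        logits = Q[s] / temperature
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        a = rng.choice(n_actions, p=probs)

        # Assumed interface: the reward depends on the current mean field estimate M.
        s_next, r = env.step(a, M)

        # Q update: one-step temporal-difference target (off-policy, Q-learning style).
        td_target = r + gamma * Q[s_next].max()
        Q[s, a] += alpha_q * (td_target - Q[s, a])

        # M update: nudge the population estimate toward the state just observed,
        # using the same sample that updated Q.
        indicator = np.zeros(n_states)
        indicator[s_next] = 1.0
        M = (1 - alpha_m) * M + alpha_m * indicator

        s = s_next
    return Q, M
```

In this sketch, replacing the max in the TD target with the Q value of the action the current policy actually samples next would give an on-policy (SARSA-style) variant; the paper's off-policy and on-policy QM iterations may differ from this illustration in detail.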
Related papers
- MFC-EQ: Mean-Field Control with Envelope Q-Learning for Moving Decentralized Agents in Formation [1.770056709115081]
Moving Agents in Formation (MAiF) is a variant of Multi-Agent Path Finding.
MFC-EQ is a scalable and adaptable learning framework for this bi-objective multi-agent problem.
arXiv Detail & Related papers (2024-10-15T20:59:47Z) - Stochastic Semi-Gradient Descent for Learning Mean Field Games with Population-Aware Function Approximation [16.00164239349632]
Mean field games (MFGs) model the interactions within a large-population multi-agent system using the population distribution.
This paper considers an online learning method for MFGs, where an agent updates its policy and population estimates simultaneously and fully asynchronously.
We develop a linear function approximation (LFA) for both the value function and the population distribution, resulting in the first population-aware LFA for MFGs on continuous state-action space.
arXiv Detail & Related papers (2024-08-15T14:51:50Z) - The Power of Resets in Online Reinforcement Learning [73.64852266145387]
We explore the power of simulators through online reinforcement learning with local simulator access (or local planning).
We show that MDPs with low coverability can be learned in a sample-efficient fashion with only $Q^\star$-realizability.
We show that the notorious Exogenous Block MDP problem is tractable under local simulator access.
arXiv Detail & Related papers (2024-04-23T18:09:53Z) - Model-Based RL for Mean-Field Games is not Statistically Harder than Single-Agent RL [57.745700271150454]
We study the sample complexity of reinforcement learning in Mean-Field Games (MFGs) with model-based function approximation.
We introduce the Partial Model-Based Eluder Dimension (P-MBED), a more effective notion to characterize the model class complexity.
arXiv Detail & Related papers (2024-02-08T14:54:47Z) - Regularization of the policy updates for stabilizing Mean Field Games [0.2348805691644085]
This work studies non-cooperative Multi-Agent Reinforcement Learning (MARL), where multiple agents interact in the same environment and each aims to maximize its individual return.
We name our algorithm Mean Field Proximal Policy Optimization (MF-PPO), and we empirically show the effectiveness of our method in the OpenSpiel framework.
arXiv Detail & Related papers (2023-04-04T05:45:42Z) - Beyond ADMM: A Unified Client-variance-reduced Adaptive Federated Learning Framework [82.36466358313025]
We propose a primal-dual FL algorithm, termed FedVRA, that allows one to adaptively control the variance-reduction level and bias of the global model.
Experiments based on (semi-supervised) image classification tasks demonstrate superiority of FedVRA over the existing schemes.
arXiv Detail & Related papers (2022-12-03T03:27:51Z) - Efficient Model-based Multi-agent Reinforcement Learning via Optimistic Equilibrium Computation [93.52573037053449]
H-MARL (Hallucinated Multi-Agent Reinforcement Learning) learns successful equilibrium policies after a few interactions with the environment.
We demonstrate our approach experimentally on an autonomous driving simulation benchmark.
arXiv Detail & Related papers (2022-03-14T17:24:03Z) - Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for MFC.
We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z) - Adversarial Inverse Reinforcement Learning for Mean Field Games [17.392418397388823]
Mean field games (MFGs) provide a mathematically tractable framework for modelling large-scale multi-agent systems.
This paper proposes a novel framework, Mean-Field Adversarial IRL (MF-AIRL), which is capable of tackling uncertainties in demonstrations.
arXiv Detail & Related papers (2021-04-29T21:03:49Z) - Global Convergence of Policy Gradient for Linear-Quadratic Mean-Field Control/Game in Continuous Time [109.06623773924737]
We study the policy gradient method for the linear-quadratic mean-field control and game.
We show that it converges to the optimal solution at a linear rate, which is verified by a synthetic simulation.
arXiv Detail & Related papers (2020-08-16T06:34:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.