Mean-Field Reinforcement Learning without Synchrony
- URL: http://arxiv.org/abs/2602.18026v1
- Date: Fri, 20 Feb 2026 06:42:08 GMT
- Title: Mean-Field Reinforcement Learning without Synchrony
- Authors: Shan Yang
- Abstract summary: Mean-field reinforcement learning scales to large populations by reducing each agent's dependence on others to a single summary statistic -- the mean action. Existing MF-RL theory is built on the mean action and does not extend to $\mu$. We construct the Temporal Mean Field framework around the population distribution $\mu$ from scratch, covering the full spectrum from fully synchronous to purely sequential decision-making.
- Score: 11.907264672363718
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mean-field reinforcement learning (MF-RL) scales multi-agent RL to large populations by reducing each agent's dependence on others to a single summary statistic -- the mean action. However, this reduction requires every agent to act at every time step; when some agents are idle, the mean action is simply undefined. Addressing asynchrony therefore requires a different summary statistic -- one that remains defined regardless of which agents act. The population distribution $\mu \in \Delta(\mathcal{O})$ -- the fraction of agents at each observation -- satisfies this requirement: its dimension is independent of $N$, and under exchangeability it fully determines each agent's reward and transition. Existing MF-RL theory, however, is built on the mean action and does not extend to $\mu$. We therefore construct the Temporal Mean Field (TMF) framework around the population distribution $\mu$ from scratch, covering the full spectrum from fully synchronous to purely sequential decision-making within a single theory. We prove existence and uniqueness of TMF equilibria, establish an $O(1/\sqrt{N})$ finite-population approximation bound that holds regardless of how many agents act per step, and prove convergence of a policy gradient algorithm (TMF-PG) to the unique equilibrium. Experiments on a resource selection game and a dynamic queueing game confirm that TMF-PG achieves near-identical performance whether one agent or all $N$ act per step, with approximation error decaying at the predicted $O(1/\sqrt{N})$ rate.
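To make the summary statistic concrete, here is a minimal Python sketch (our own illustration, not the paper's code; all names are assumptions) of the empirical population distribution $\mu \in \Delta(\mathcal{O})$. Unlike the mean action, $\mu$ stays well defined even when only a subset of agents acts in a step:

```python
import numpy as np

def population_distribution(observations, num_obs):
    """Empirical distribution mu in Delta(O): the fraction of agents at
    each observation. Defined for any population snapshot, regardless of
    how many agents act this step."""
    counts = np.bincount(observations, minlength=num_obs)
    return counts / len(observations)

rng = np.random.default_rng(0)
N, num_obs = 1000, 5
obs = rng.integers(0, num_obs, size=N)   # each agent's current observation

mu = population_distribution(obs, num_obs)

# Asynchrony: only a random subset of agents acts this step. A mean action
# over these 10 actors would ignore the other 990 agents, but mu is still
# computed over the whole population and remains well defined.
actors = rng.choice(N, size=10, replace=False)
print(len(actors), "agents act; mu =", mu, "sums to", mu.sum())
```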
Related papers
- Descent-Guided Policy Gradient for Scalable Cooperative Multi-Agent Learning [14.185814237633958]
Descent-Guided Policy Gradient (DG-PG) is a framework that constructs noise-free per-agent guidance gradients. We prove that DG-PG reduces gradient variance from $\mathcal{O}(N)$ to $\mathcal{O}(1)$, preserves the equilibria of the cooperative game, and achieves agent-independent sample complexity.
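The claimed variance scaling can be illustrated with a toy Monte Carlo experiment (our own sketch; DG-PG's actual guidance-gradient construction is not reproduced here): summing $N$ independent noisy per-agent gradient terms yields variance growing linearly in $N$, whereas a noise-free per-agent guidance term is deterministic given the state and stays $\mathcal{O}(1)$.

```python
import numpy as np

rng = np.random.default_rng(1)

def joint_gradient_variance(N, trials=2000, noise=1.0):
    """Variance of a naive joint score-function estimate: the sum of N
    independent noisy per-agent terms has variance growing like O(N)."""
    estimates = rng.normal(0.0, noise, size=(trials, N)).sum(axis=1)
    return estimates.var()

# A noise-free per-agent guidance gradient (as DG-PG constructs) would be
# deterministic given the state, so its variance stays O(1) in N.
for N in (10, 100, 1000):
    print(N, joint_gradient_variance(N))   # grows roughly linearly in N
```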
arXiv Detail & Related papers (2026-02-23T17:45:08Z) - Graphon Mean-Field Subsampling for Cooperative Heterogeneous Multi-Agent Reinforcement Learning [19.98996237281175]
We introduce $\texttt{GMFS}$, a $\textbf{G}$raphon $\textbf{M}$ean-$\textbf{F}$ield $\textbf{S}$ubsampling framework for scalable cooperative MARL with heterogeneous agent interactions. By subsampling agents according to interaction strength, we approximate the graphon-weighted mean field and learn a policy with provable sample complexity guarantees. We verify our theory with numerical simulations in robotic coordination, showing that $\texttt{GMFS}$ achieves near-optimal performance.
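A hedged sketch of the subsampling idea (our own illustration; the function names and the weight matrix `W` are assumptions, not the paper's API): drawing `k` agents with probability proportional to interaction strength gives an unbiased estimate of the graphon-weighted mean field at $O(k)$ rather than $O(N)$ cost per agent.

```python
import numpy as np

rng = np.random.default_rng(2)

def graphon_mean_field(i, states, W):
    """Exact graphon-weighted mean field for agent i: O(N) per agent."""
    w = W[i] / W[i].sum()
    return w @ states

def subsampled_mean_field(i, states, W, k):
    """Approximate the weighted mean field by drawing k agents with
    probability proportional to interaction strength, then averaging
    (an illustrative stand-in for the GMFS subsampling step)."""
    p = W[i] / W[i].sum()
    idx = rng.choice(len(states), size=k, p=p)
    return states[idx].mean(axis=0)

N, k = 500, 32
states = rng.normal(size=(N, 3))
U = rng.random((N, N)); W = (U + U.T) / 2   # symmetric interaction weights
print(graphon_mean_field(0, states, W))
print(subsampled_mean_field(0, states, W, k))   # close for moderate k
```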
arXiv Detail & Related papers (2026-02-18T05:34:07Z) - Socially-Weighted Alignment: A Game-Theoretic Framework for Multi-Agent LLM Systems [17.658093330392052]
We propose a game-theoretic framework that modifies inference-time decision making by interpolating between an agent's private objective and an estimate of group welfare. We show that SWA induces a critical threshold on the social weight above which agents no longer have a marginal incentive to increase demand under overload.
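The interpolation itself is simple to state in code. A minimal sketch, assuming a scalar social weight `w` in $[0,1]$ and using the population average as the group-welfare estimate (both our assumptions, not the paper's exact estimator):

```python
import numpy as np

def swa_objective(private_utilities, i, w):
    """Socially-weighted objective for agent i: interpolate between the
    agent's private utility and the group average with weight w."""
    welfare = np.mean(private_utilities)
    return (1.0 - w) * private_utilities[i] + w * welfare

u = np.array([3.0, 1.0, 0.5, 2.0])   # private utilities of n = 4 agents
print(swa_objective(u, i=0, w=0.0))  # purely selfish
print(swa_objective(u, i=0, w=1.0))  # purely welfare-maximising
```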
arXiv Detail & Related papers (2026-02-16T05:17:58Z) - Information Fidelity in Tool-Using LLM Agents: A Martingale Analysis of the Model Context Protocol [69.11739400975445]
We introduce the first theoretical framework for analyzing error accumulation in Model Context Protocol (MCP) agents. We show that cumulative distortion exhibits linear growth, with high-probability deviations bounded by $O(\sqrt{T})$. Key findings include: semantic weighting reduces distortion by 80%, and periodic re-grounding approximately every 9 steps suffices for error control.
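A toy simulation of the claimed behavior (our own sketch, not the paper's model): per-step distortion with a drift term grows linearly, its fluctuations scale like $O(\sqrt{T})$, and periodic re-grounding keeps the accumulated error bounded.

```python
import numpy as np

rng = np.random.default_rng(3)

def cumulative_distortion(T, reground_every=None, step_noise=0.1, drift=0.05):
    """Toy model of per-step distortion: a drift term (linear growth) plus
    martingale noise (O(sqrt(T)) deviations). Re-grounding resets the
    accumulated error; the interval 9 mirrors the paper's finding."""
    d, path = 0.0, []
    for t in range(1, T + 1):
        d += drift + rng.normal(0.0, step_noise)
        if reground_every and t % reground_every == 0:
            d = 0.0                      # periodic re-grounding
        path.append(d)
    return np.max(np.abs(path))

print(cumulative_distortion(1000))                   # grows with T
print(cumulative_distortion(1000, reground_every=9)) # stays bounded
```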
arXiv Detail & Related papers (2026-02-10T21:08:53Z) - Phase Transition for Budgeted Multi-Agent Synergy [41.486076708302456]
Multi-agent systems can improve reliability, yet under a fixed inference budget they often help, saturate, or even collapse. We develop a minimal and calibratable theory that predicts these regimes from three binding constraints of modern agent stacks.
arXiv Detail & Related papers (2026-01-24T05:32:50Z) - Independent RL for Cooperative-Competitive Agents: A Mean-Field Perspective [11.603515105957461]
We address Reinforcement Learning (RL) among agents that are grouped into teams such that there is cooperation within each team but general-sum competition across different teams.
arXiv Detail & Related papers (2024-03-17T21:11:55Z) - Refined Sample Complexity for Markov Games with Independent Linear Function Approximation [49.5660193419984]
Markov Games (MGs) are an important model for Multi-Agent Reinforcement Learning (MARL).
This paper first refines the AVLPR framework of Wang et al. (2023), with the insight of designing pessimistic estimates of the sub-optimality gap.
We give the first algorithm that simultaneously tackles the curse of multi-agency, attains the optimal $O(T^{-1/2})$ convergence rate, and avoids $\text{poly}(A_{\max})$ dependency.
arXiv Detail & Related papers (2024-02-11T01:51:15Z) - Model-Based RL for Mean-Field Games is not Statistically Harder than Single-Agent RL [57.745700271150454]
We study the sample complexity of reinforcement learning in Mean-Field Games (MFGs) with model-based function approximation.
We introduce the Partial Model-Based Eluder Dimension (P-MBED), a more effective notion to characterize the model class complexity.
arXiv Detail & Related papers (2024-02-08T14:54:47Z) - Adaptive, Doubly Optimal No-Regret Learning in Strongly Monotone and Exp-Concave Games with Gradient Feedback [75.29048190099523]
Online gradient descent (OGD) is well known to be doubly optimal under strong convexity or monotonicity assumptions.
In this paper, we design a fully adaptive OGD algorithm, $\textsf{AdaOGD}$, that does not require a priori knowledge of these parameters.
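For intuition, here is a minimal online-gradient-descent sketch with an adaptive, AdaGrad-style step size as a stand-in for AdaOGD's parameter-free tuning (the paper's actual update rule is not reproduced here):

```python
import numpy as np

def ogd(grad, x0, T, radius=1.0, eps=1e-8):
    """Online gradient descent with an adaptive step size proportional to
    1/sqrt(sum of squared gradient norms); no a-priori knowledge of the
    strong-convexity or exp-concavity constants is needed."""
    x, g2 = np.array(x0, float), eps
    for t in range(T):
        g = grad(x, t)
        g2 += float(g @ g)
        x -= g / np.sqrt(g2)
        n = np.linalg.norm(x)
        if n > radius:                   # project back onto the feasible ball
            x *= radius / n
    return x

# Strongly convex losses f_t(x) = ||x - c_t||^2 with shifting centres c_t:
rng = np.random.default_rng(4)
centres = rng.normal(0.3, 0.05, size=(500, 2))
print(ogd(lambda x, t: 2 * (x - centres[t]), [0.0, 0.0], 500))
```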
arXiv Detail & Related papers (2023-10-21T18:38:13Z) - Towards Faster Non-Asymptotic Convergence for Diffusion-Based Generative Models [49.81937966106691]
We develop a suite of non-asymptotic theory towards understanding the data generation process of diffusion models.
In contrast to prior works, our theory is developed based on an elementary yet versatile non-asymptotic approach.
arXiv Detail & Related papers (2023-06-15T16:30:08Z) - Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for mean-field control (MFC).
We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z) - Model Free Reinforcement Learning Algorithm for Stationary Mean field Equilibrium for Multiple Types of Agents [43.21120427632336]
We consider a multi-agent strategic interaction over an infinite horizon where agents can be of multiple types.
Each agent has a private state; the state evolves depending on the distribution of the state of the agents of different types and the action of the agent.
We show how this kind of interaction can model cyber attacks between defenders and adversaries.
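A minimal sketch of the multi-type mean-field ingredients (our own illustration; the transition kernel below is made up for concreteness): each agent's transition depends on its own action and the per-type empirical state distributions.

```python
import numpy as np

rng = np.random.default_rng(5)
num_states, num_types = 4, 2

def type_distributions(states, types):
    """Empirical state distribution per type: the statistic each agent's
    transition depends on in the multi-type mean-field model."""
    return np.stack([np.bincount(states[types == k], minlength=num_states)
                     / max((types == k).sum(), 1) for k in range(num_types)])

def step(state, action, mu):
    """Illustrative transition kernel: the next-state law mixes the agent's
    action with the type distributions mu (details are invented here)."""
    logits = np.ones(num_states) + action + mu.sum(axis=0)
    p = logits / logits.sum()
    return rng.choice(num_states, p=p)

N = 200
states = rng.integers(0, num_states, size=N)
types = rng.integers(0, num_types, size=N)
mu = type_distributions(states, types)
print(step(states[0], action=1, mu=mu))
```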
arXiv Detail & Related papers (2020-12-31T00:12:46Z) - VCG Mechanism Design with Unknown Agent Values under Stochastic Bandit Feedback [104.06766271716774]
We study a multi-round welfare-maximising mechanism design problem in instances where agents do not know their values.
We first define three notions of regret for the welfare, the individual utilities of each agent and that of the mechanism.
Our framework also provides the flexibility to control the pricing scheme so as to trade off between the agent and seller regrets.
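For context, the classic VCG idea the paper builds on can be sketched in a few lines (single-item case; this is the standard textbook mechanism, not the paper's bandit algorithm): the winner pays the externality it imposes on the other agents.

```python
def vcg_single_item(bids):
    """Classic VCG for a single item: the highest bidder wins and pays the
    externality it imposes, i.e. the second-highest bid. In the bandit
    setting above, the unknown values would be replaced by estimates."""
    order = sorted(range(len(bids)), key=lambda i: -bids[i])
    winner, price = order[0], bids[order[1]]
    return winner, price

print(vcg_single_item([0.3, 0.9, 0.7]))  # agent 1 wins, pays 0.7
```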
arXiv Detail & Related papers (2020-04-19T18:00:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.