Learning Approximate Nash Equilibria in Cooperative Multi-Agent Reinforcement Learning via Mean-Field Subsampling
- URL: http://arxiv.org/abs/2603.03759v1
- Date: Wed, 04 Mar 2026 06:14:24 GMT
- Title: Learning Approximate Nash Equilibria in Cooperative Multi-Agent Reinforcement Learning via Mean-Field Subsampling
- Authors: Emile Anand, Ishani Karmarkar
- Abstract summary: We study a cooperative Markov game with a global agent and $n$ homogeneous local agents in a communication-constrained regime. We prove that these approximate best-response dynamics converge to an $\widetilde{O}(1/\sqrt{k})$-approximate Nash equilibrium.
- Score: 3.396870608435494
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many large-scale platforms and networked control systems have a centralized decision maker interacting with a massive population of agents under strict observability constraints. Motivated by such applications, we study a cooperative Markov game with a global agent and $n$ homogeneous local agents in a communication-constrained regime, where the global agent only observes a subset of $k$ local agent states per time step. We propose an alternating learning framework $(\texttt{ALTERNATING-MARL})$, where the global agent performs subsampled mean-field $Q$-learning against a fixed local policy, and local agents update by optimizing in an induced MDP. We prove that these approximate best-response dynamics converge to an $\widetilde{O}(1/\sqrt{k})$-approximate Nash Equilibrium, while yielding a separation in the sample complexities between the joint state space and action space. Finally, we validate our results in numerical simulations for multi-robot control and federated optimization.
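To make the alternating structure concrete, the following is a minimal tabular sketch of an $\texttt{ALTERNATING-MARL}$-style loop under an assumed toy environment: the global agent runs $Q$-learning on a subsampled mean-field statistic while the local policy is frozen, and in alternating phases the homogeneous local agents update a shared $Q$-table against the frozen global policy. All dynamics, rewards, and constants below are invented for illustration; this is not the authors' implementation.

```python
# Hypothetical sketch of an ALTERNATING-MARL-style loop (toy, tabular).
# The dynamics, reward, and constants below are assumptions for illustration.
import random
from collections import defaultdict

N_LOCAL, K, N_STATES = 20, 5, 3           # n local agents; subsample size k
G_STATES, G_ACTIONS, L_ACTIONS = 4, 2, 2
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1

def mean_field(sample):
    """Empirical state distribution of a subsample of local agents."""
    return tuple(sample.count(s) / len(sample) for s in range(N_STATES))

def step(sg, loc, ag, als):
    """Toy dynamics and shared reward; stands in for the true Markov game."""
    reward = ag * sum(loc) / N_LOCAL
    return (sg + ag) % G_STATES, [(s + a) % N_STATES for s, a in zip(loc, als)], reward

def greedy(Q, key, n_actions):
    """Epsilon-greedy action from a defaultdict Q-table."""
    if random.random() < EPS:
        return random.randrange(n_actions)
    return max(range(n_actions), key=lambda a: Q[key + (a,)])

Qg = defaultdict(float)  # global Q over (global state, subsampled mean-field)
Ql = defaultdict(float)  # shared local Q (agents are homogeneous)

for phase in range(10):                       # alternating best responses
    for episode in range(100):
        sg, loc = 0, [0] * N_LOCAL
        for t in range(25):
            mf = mean_field(random.sample(loc, K))   # observe only k agents
            ag = greedy(Qg, (sg, mf), G_ACTIONS)
            als = [greedy(Ql, (s,), L_ACTIONS) for s in loc]
            sg2, loc2, r = step(sg, loc, ag, als)
            mf2 = mean_field(random.sample(loc2, K))
            if phase % 2 == 0:    # global agent learns vs. fixed local policy
                best = max(Qg[(sg2, mf2, a)] for a in range(G_ACTIONS))
                Qg[(sg, mf, ag)] += ALPHA * (r + GAMMA * best - Qg[(sg, mf, ag)])
            else:                 # local agents learn in the induced MDP
                s, a, s2 = loc[0], als[0], loc2[0]
                best = max(Ql[(s2, b)] for b in range(L_ACTIONS))
                Ql[(s, a)] += ALPHA * (r + GAMMA * best - Ql[(s, a)])
            sg, loc = sg2, loc2
```

The $\widetilde{O}(1/\sqrt{k})$ guarantee corresponds to how closely the subsampled statistic `mf` tracks the full empirical distribution of local states as $k$ grows.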
Related papers
- Graphon Mean-Field Subsampling for Cooperative Heterogeneous Multi-Agent Reinforcement Learning [19.98996237281175]
We introduce $\texttt{GMFS}$, a $\textbf{G}$raphon $\textbf{M}$ean-$\textbf{F}$ield $\textbf{S}$ubsampling framework for scalable cooperative MARL with heterogeneous agent interactions. By subsampling agents according to interaction strength, we approximate the graphon-weighted mean-field and learn a policy with a sample-complexity guarantee. We verify our theory with numerical simulations in robotic coordination, showing that $\texttt{GMFS}$ achieves near-optimal performance.
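As a rough illustration of the subsampling idea (not the paper's algorithm), the sketch below draws $k$ agents with probability proportional to an assumed interaction-weight matrix and forms a weighted empirical mean-field; the matrix `W` and all sizes are hypothetical.

```python
# Hypothetical graphon-weighted subsampling sketch; W and all sizes are invented.
import numpy as np

rng = np.random.default_rng(0)
n, k, n_states = 50, 8, 4
W = rng.uniform(size=(n, n))                 # stand-in for graphon interaction weights
states = rng.integers(n_states, size=n)      # current local agent states

def subsampled_mean_field(i):
    """Sample k agents by interaction strength with agent i, then weight them."""
    p = W[i] / W[i].sum()
    idx = rng.choice(n, size=k, replace=False, p=p)
    mf = np.zeros(n_states)
    for j in idx:
        mf[states[j]] += W[i, j]             # graphon-weighted empirical measure
    return mf / mf.sum()

print(subsampled_mean_field(0))
```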
arXiv Detail & Related papers (2026-02-18T05:34:07Z) - Achieving Equilibrium under Utility Heterogeneity: An Agent-Attention Framework for Multi-Agent Multi-Objective Reinforcement Learning [30.138706163658597]
We propose an Agent-Attention Multi-Agent Multi-Objective Reinforcement Learning (AA-MAMORL) framework. Our approach implicitly learns a joint belief over other agents' utility functions and their associated policies during training. In execution, each agent independently selects actions based on local observations and its private utility function to approximate a Bayesian Nash equilibrium (BNE).
arXiv Detail & Related papers (2025-11-12T03:06:21Z) - Multi-Agent Sampling: Scaling Inference Compute for Data Synthesis with Tree Search-Based Agentic Collaboration [81.45763823762682]
This work aims to bridge the gap by investigating the problem of data synthesis through multi-agent sampling. We introduce Tree Search-based Orchestrated Agents (TOA), where the workflow evolves iteratively during the sequential sampling process. Our experiments on alignment, machine translation, and mathematical reasoning demonstrate that multi-agent sampling significantly outperforms single-agent sampling as inference compute scales.
arXiv Detail & Related papers (2024-12-22T15:16:44Z) - Mean-Field Sampling for Cooperative Multi-Agent Reinforcement Learning [8.400105595501158]
We propose $\texttt{SUBSAMPLE-MFQ}$ ($\textbf{Subsample}$-$\textbf{M}$ean-$\textbf{F}$ield-$\textbf{Q}$-learning), a new algorithm with a decentralized randomized policy for a system with $n$ agents. We prove that the learned policy converges to the optimal policy at a rate of $\widetilde{O}(1/\sqrt{k})$ as the number of subsampled agents $k$ increases.
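At execution time, a decentralized randomized policy of this flavor could look like the sketch below, where each agent samples $k-1$ peers uniformly at random and acts greedily on its own state plus the subsample's empirical distribution; the trained `Q` table and all sizes are assumptions, not the paper's code.

```python
# Hypothetical execution-time sketch of a decentralized randomized policy.
import random
from collections import Counter, defaultdict

n, k, n_states, n_actions = 30, 6, 3, 2
Q = defaultdict(float)   # assume already trained by mean-field Q-learning
states = [random.randrange(n_states) for _ in range(n)]

def act(i):
    """Agent i samples k-1 peers uniformly and acts on their empirical distribution."""
    peers = random.sample([j for j in range(n) if j != i], k - 1)
    counts = Counter(states[j] for j in peers)
    mf = tuple(counts[s] / (k - 1) for s in range(n_states))
    return max(range(n_actions), key=lambda a: Q[(states[i], mf, a)])

actions = [act(i) for i in range(n)]
```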
arXiv Detail & Related papers (2024-12-01T03:45:17Z) - Self-Localized Collaborative Perception [49.86110931859302]
We propose $\mathtt{CoBEVGlue}$, a novel self-localized collaborative perception system.
At the core of $\mathtt{CoBEVGlue}$ is a novel spatial alignment module, which provides the relative poses between agents.
$\mathtt{CoBEVGlue}$ achieves state-of-the-art detection performance under arbitrary localization noises and attacks.
arXiv Detail & Related papers (2024-06-18T15:26:54Z) - Federated Stochastic Approximation under Markov Noise and Heterogeneity: Applications in Reinforcement Learning [24.567125948995834]
Federated reinforcement learning is a framework in which $N$ agents collaboratively learn a global model.
We show that, through careful collaboration among the agents in solving this joint fixed-point problem, the global model can be found $N$ times faster.
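The collaboration pattern can be sketched as federated stochastic approximation with periodic averaging: each agent runs local TD(0)-style updates on its own Markovian samples, and a server averages the iterates. The chain, rewards, and constants below are invented for illustration.

```python
# Toy federated stochastic-approximation sketch; the environment is invented.
import numpy as np

rng = np.random.default_rng(1)
N, d, local_steps, rounds, lr, gamma = 8, 5, 20, 50, 0.05, 0.9
P = rng.dirichlet(np.ones(d), size=d)   # shared row-stochastic transition matrix
r = rng.uniform(size=d)                 # per-state rewards
theta = np.zeros(d)                     # global model (tabular value estimate)

for rnd in range(rounds):
    local_models = []
    for agent in range(N):              # each agent runs TD(0) on its own chain
        th, s = theta.copy(), rng.integers(d)
        for _ in range(local_steps):    # Markovian (not i.i.d.) local sampling
            s2 = rng.choice(d, p=P[s])
            th[s] += lr * (r[s] + gamma * th[s2] - th[s])   # tabular TD(0)
            s = s2
        local_models.append(th)
    theta = np.mean(local_models, axis=0)   # server averages agent iterates
```

Averaging $N$ independent local runs reduces the variance of the stochastic noise term, which is the mechanism behind the claimed $N$-fold speedup.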
arXiv Detail & Related papers (2022-06-21T08:39:12Z) - Convergence Rates of Average-Reward Multi-agent Reinforcement Learning via Randomized Linear Programming [41.30044824711509]
We focus on the setting where the global reward is a sum of local rewards, the joint policy factorizes into agents' marginals, and the state is fully observable.
We develop multi-agent extensions, whereby agents solve their local saddle point problems and then perform local weighted averaging.
We establish sample-complexity bounds for obtaining near-globally-optimal solutions, with tight dependence on the cardinality of the state and action spaces.
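The "local saddle-point step plus local weighted averaging" pattern, abstracted away from the paper's LP formulation, can be sketched as follows; the local objectives, the toy constraint, and the mixing matrix are all assumptions made for illustration.

```python
# Generic sketch of "local step + local weighted averaging"; all data invented.
import numpy as np

rng = np.random.default_rng(2)
n_agents, d, T, lr = 4, 3, 100, 0.1
ring = np.array([[0.5, 0.25, 0.0, 0.25],
                 [0.25, 0.5, 0.25, 0.0],
                 [0.0, 0.25, 0.5, 0.25],
                 [0.25, 0.0, 0.25, 0.5]])   # doubly stochastic neighbor weights
A = []
for _ in range(n_agents):
    M = rng.uniform(size=(d, d))
    A.append(M @ M.T + np.eye(d))           # symmetric PD local curvature

x = np.zeros((n_agents, d))                 # primal variables, one row per agent
y = np.zeros((n_agents, d))                 # duals for a toy constraint x = 1

for t in range(T):
    for i in range(n_agents):               # local saddle-point (primal-dual) step
        x[i] -= lr * (A[i] @ x[i] + y[i])   # primal descent on the local Lagrangian
        y[i] += lr * (x[i] - 1.0)           # dual ascent on the constraint residual
    x, y = ring @ x, ring @ y               # local weighted averaging with neighbors
```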
arXiv Detail & Related papers (2021-10-22T03:48:41Z) - Multi-Agent MDP Homomorphic Networks [100.74260120972863]
In cooperative multi-agent systems, complex symmetries arise between different configurations of the agents and their local observations.
Existing work on symmetries in single agent reinforcement learning can only be generalized to the fully centralized setting.
This paper introduces Multi-Agent MDP Homomorphic Networks, a class of networks that allows distributed execution using only local information.
arXiv Detail & Related papers (2021-10-09T07:46:25Z) - Locality Matters: A Scalable Value Decomposition Approach for Cooperative Multi-Agent Reinforcement Learning [52.7873574425376]
Cooperative multi-agent reinforcement learning (MARL) faces significant scalability issues due to state and action spaces that are exponentially large in the number of agents.
We propose a novel, value-based multi-agent algorithm called LOMAQ, which incorporates local rewards in the Centralized Training Decentralized Execution paradigm.
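A stripped-down sketch of the local-reward decomposition idea (not LOMAQ itself): each agent maintains a local Q-table updated from its own reward, and a hypothetical team value is read off as the sum of the per-agent values.

```python
# Hedged sketch of value decomposition with local rewards; not LOMAQ itself.
import random
from collections import defaultdict

n_agents, n_states, n_actions = 4, 3, 2
alpha, gamma = 0.1, 0.95
Q = [defaultdict(float) for _ in range(n_agents)]

def td_update(i, s, a, r_local, s2):
    """Per-agent update driven by the agent's local reward only."""
    best = max(Q[i][(s2, b)] for b in range(n_actions))
    Q[i][(s, a)] += alpha * (r_local + gamma * best - Q[i][(s, a)])

def team_value(states, actions):
    """Decomposed team Q: sum of per-agent local Q-values."""
    return sum(Q[i][(s, a)] for i, (s, a) in enumerate(zip(states, actions)))

# toy transition for agent 0: state 0, action 1, local reward 1.0, next state 2
td_update(0, 0, 1, 1.0, 2)
```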
arXiv Detail & Related papers (2021-09-22T10:08:15Z) - Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for MFC.
We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z) - Distributed Q-Learning with State Tracking for Multi-agent Networked Control [61.63442612938345]
This paper studies distributed Q-learning for the Linear Quadratic Regulator (LQR) problem in a multi-agent network.
We devise a state tracking (ST) based Q-learning algorithm to design optimal controllers for agents.
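As a simpler, hypothetical point of reference for Q-learning on LQR (omitting the paper's distributed state-tracking component), the sketch below runs least-squares Q-function policy iteration on an invented discounted LQR instance; all matrices and constants are assumptions.

```python
# Single-agent least-squares Q-learning for a toy discounted LQR instance.
import numpy as np

rng = np.random.default_rng(3)
nx, nu, gamma = 2, 1, 0.95
A = np.array([[1.0, 0.1], [0.0, 1.0]])    # toy double-integrator dynamics
B = np.array([[0.0], [0.1]])
Qc, Rc = np.eye(nx), 0.1 * np.eye(nu)     # quadratic stage cost

def svec(M):
    """Vectorize a symmetric matrix (upper triangle, off-diagonals doubled)."""
    i, j = np.triu_indices(M.shape[0])
    return np.where(i == j, 1.0, 2.0) * M[i, j]

nz = nx + nu
K = np.zeros((nu, nx))                    # initial policy u = -K x
for it in range(10):                      # Q-function policy iteration
    feats, targets = [], []
    x = rng.normal(size=nx)
    for t in range(500):
        u = -K @ x + 0.5 * rng.normal(size=nu)   # exploratory control
        c = x @ Qc @ x + u @ Rc @ u
        x2 = A @ x + B @ u
        z, z2 = np.concatenate([x, u]), np.concatenate([x2, -K @ x2])
        # Bellman identity: z' H z - gamma z2' H z2 = c, linear in svec(H)
        feats.append(svec(np.outer(z, z)) - gamma * svec(np.outer(z2, z2)))
        targets.append(c)
        x = x2 if np.linalg.norm(x2) < 10 else rng.normal(size=nx)
    h, *_ = np.linalg.lstsq(np.array(feats), np.array(targets), rcond=None)
    H = np.zeros((nz, nz))
    H[np.triu_indices(nz)] = h
    H = H + H.T - np.diag(np.diag(H))              # unpack symmetric Q matrix
    K = np.linalg.solve(H[nx:, nx:], H[nx:, :nx])  # greedy policy improvement

print("learned gain K:", K)
```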
arXiv Detail & Related papers (2020-12-22T22:03:49Z) - F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning [110.35516334788687]
Decentralized multi-agent reinforcement learning algorithms are sometimes impractical in complicated applications.
We propose a flexible, fully decentralized actor-critic MARL framework that can handle large-scale general cooperative multi-agent settings.
Our framework achieves scalability and stability in large-scale environments and reduces information transmission.
arXiv Detail & Related papers (2020-04-17T14:56:29Z)