Can Mean Field Control (MFC) Approximate Cooperative Multi Agent
Reinforcement Learning (MARL) with Non-Uniform Interaction?
- URL: http://arxiv.org/abs/2203.00035v1
- Date: Mon, 28 Feb 2022 19:03:09 GMT
- Title: Can Mean Field Control (MFC) Approximate Cooperative Multi Agent
Reinforcement Learning (MARL) with Non-Uniform Interaction?
- Authors: Washim Uddin Mondal, Vaneet Aggarwal, and Satish V. Ukkusuri
- Abstract summary: Mean-Field Control (MFC) is a powerful tool to solve Multi-Agent Reinforcement Learning (MARL) problems.
In this article, we relax the assumption of exchangeability and model the interaction between agents via an arbitrary doubly stochastic matrix.
We prove that, if the reward of each agent is an affine function of the mean-field seen by that agent, then one can approximate such a non-uniform MARL problem via its associated MFC problem.
- Score: 33.484960394599455
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mean-Field Control (MFC) is a powerful tool to solve Multi-Agent
Reinforcement Learning (MARL) problems. Recent studies have shown that MFC can
well-approximate MARL when the population size is large and the agents are
exchangeable. Unfortunately, the presumption of exchangeability implies that
all agents uniformly interact with one another which is not true in many
practical scenarios. In this article, we relax the assumption of
exchangeability and model the interaction between agents via an arbitrary
doubly stochastic matrix. As a result, in our framework, the mean-fields `seen'
by different agents are different. We prove that, if the reward of each agent
is an affine function of the mean-field seen by that agent, then one can
approximate such a non-uniform MARL problem via its associated MFC problem
within an error of $e=\mathcal{O}(\frac{1}{\sqrt{N}}[\sqrt{|\mathcal{X}|} +
\sqrt{|\mathcal{U}|}])$ where $N$ is the population size and $|\mathcal{X}|$,
$|\mathcal{U}|$ are the sizes of state and action spaces respectively. Finally,
we develop a Natural Policy Gradient (NPG) algorithm that can provide a
solution to the non-uniform MARL with an error
$\mathcal{O}(\max\{e,\epsilon\})$ and a sample complexity of
$\mathcal{O}(\epsilon^{-3})$ for any $\epsilon >0$.
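To make the non-uniform interaction model concrete, the sketch below illustrates how a doubly stochastic matrix induces an agent-specific mean-field and an affine-in-mean-field reward, as described in the abstract. It is a minimal illustration under assumed notation: the matrix construction, the names `W`, `a`, `b`, and the specific affine reward form are assumptions made for illustration, not the paper's exact formulation.

```python
# Illustrative sketch (assumed notation, not taken from the paper):
# a doubly stochastic interaction matrix W makes each agent "see"
# a different mean-field, and rewards are affine in that mean-field.
import numpy as np

rng = np.random.default_rng(0)

N = 100          # population size
n_states = 4     # |X|, finite state space {0, ..., n_states-1}
n_actions = 3    # |U|, finite action space

# A doubly stochastic matrix: rows and columns both sum to 1.
# A convex mixture of permutation matrices is always doubly stochastic
# (Birkhoff-von Neumann), so we build W that way for the example.
perms = [np.eye(N)[rng.permutation(N)] for _ in range(5)]
weights = rng.dirichlet(np.ones(5))
W = sum(w * P for w, P in zip(weights, perms))

# Joint states and actions of the N agents at one time step.
states = rng.integers(0, n_states, size=N)
actions = rng.integers(0, n_actions, size=N)

# One-hot encoding of each agent's state, shape (N, |X|).
one_hot = np.eye(n_states)[states]

# Mean-field "seen" by agent i: the W-weighted empirical state
# distribution, mu_i = sum_j W[i, j] * delta_{x_j}.  Rows of mu differ
# across agents whenever W is not the uniform matrix.
mu = W @ one_hot                              # shape (N, |X|)

# With the uniform matrix W = (1/N) * ones, every agent sees the same
# global empirical distribution (the classical exchangeable MFC case).
mu_uniform = np.full((N, N), 1.0 / N) @ one_hot
assert np.allclose(mu_uniform, one_hot.mean(axis=0))

# Affine-in-mean-field reward, the structural assumption in the paper:
# r_i = a(x_i, u_i) . mu_i + b(x_i, u_i).  The coefficient tensors a, b
# below are arbitrary placeholders for this sketch.
a = rng.normal(size=(n_states, n_actions, n_states))
b = rng.normal(size=(n_states, n_actions))
rewards = np.einsum("ik,ik->i", a[states, actions], mu) + b[states, actions]
print(rewards.shape)   # (N,) -- one reward per agent
```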
Related papers
- Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs [56.237917407785545]
We consider the problem of learning an $\varepsilon$-optimal policy in a general class of continuous-space Markov decision processes (MDPs) having smooth Bellman operators.
Key to our solution is a novel projection technique based on ideas from harmonic analysis.
Our result bridges the gap between two popular but conflicting perspectives on continuous-space MDPs.
arXiv Detail & Related papers (2024-05-10T09:58:47Z) - Refined Sample Complexity for Markov Games with Independent Linear Function Approximation [49.5660193419984]
Markov Games (MGs) are an important model for Multi-Agent Reinforcement Learning (MARL).
This paper first refines the AVLPR framework by Wang et al. (2023), with an insight of designing pessimistic estimation of the sub-optimality gap.
We give the first algorithm that tackles the curse of multi-agents, attains the optimal $O(T^{-1/2})$ convergence rate, and avoids $\text{poly}(A_{\max})$ dependency simultaneously.
arXiv Detail & Related papers (2024-02-11T01:51:15Z) - Statistical Learning under Heterogeneous Distribution Shift [71.8393170225794]
Ground-truth predictor is additive: $\mathbb{E}[\mathbf{z} \mid \mathbf{x}, \mathbf{y}] = f_\star(\mathbf{x}) + g_\star(\mathbf{y})$.
arXiv Detail & Related papers (2023-02-27T16:34:21Z) - Mean-Field Control based Approximation of Multi-Agent Reinforcement
Learning in Presence of a Non-decomposable Shared Global State [37.63373979256335]
Mean Field Control (MFC) is a powerful approximation tool to solve large-scale Multi-Agent Reinforcement Learning (MARL) problems.
Here we demonstrate that even in a MARL setting where agents share a common global state, the MFC can still be applied as a good approximation tool.
arXiv Detail & Related papers (2023-01-13T18:55:58Z) - Near Sample-Optimal Reduction-based Policy Learning for Average Reward
MDP [58.13930707612128]
This work considers the sample complexity of obtaining an $\varepsilon$-optimal policy in an average reward Markov Decision Process (AMDP).
We prove an upper bound of $\widetilde{O}(H \varepsilon^{-3} \ln \frac{1}{\delta})$ samples per state-action pair, where $H := \mathrm{sp}(h^*)$ is the span of bias of any optimal policy, $\varepsilon$ is the accuracy and $\delta$ is the failure probability.
arXiv Detail & Related papers (2022-12-01T15:57:58Z) - Mean-Field Approximation of Cooperative Constrained Multi-Agent Reinforcement Learning (CMARL) [35.18639326270473]
We show that one can use the MFC approach to approximate the MARL problem even in the presence of constraints.
Also, we provide a Natural Policy Gradient based algorithm and prove that it can solve the constrained MARL problem within an error of $\mathcal{O}(e)$ with a sample complexity of $\mathcal{O}(e^{-6})$.
arXiv Detail & Related papers (2022-09-15T16:33:38Z) - A Law of Robustness beyond Isoperimetry [84.33752026418045]
We prove a Lipschitzness lower bound $\Omega(\sqrt{n/p})$ of robustness of interpolating neural network parameters on arbitrary distributions.
We then show the potential benefit of overparametrization for smooth data when $n=\mathrm{poly}(d)$.
We disprove the potential existence of an $O(1)$-Lipschitz robust interpolating function when $n=\exp(\omega(d))$.
arXiv Detail & Related papers (2022-02-23T16:10:23Z) - A Law of Iterated Logarithm for Multi-Agent Reinforcement Learning [3.655021726150368]
In Multi-Agent Reinforcement Learning (MARL), multiple agents interact with a common environment, as well as with each other, to solve a shared sequential decision-making problem.
We derive a novel law of iterated logarithm for a family of distributed nonlinear approximation schemes that is useful in MARL.
arXiv Detail & Related papers (2021-10-27T08:01:17Z) - On the Approximation of Cooperative Heterogeneous Multi-Agent
Reinforcement Learning (MARL) using Mean Field Control (MFC) [33.833747074900856]
Mean field control (MFC) is an effective way to mitigate the curse of dimensionality of cooperative multi-agent reinforcement learning problems.
This work considers a collection of $N_{\mathrm{pop}}$ heterogeneous agents that can be segregated into $K$ classes.
arXiv Detail & Related papers (2021-09-09T03:52:49Z) - Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal
Sample Complexity [67.02490430380415]
We show that model-based MARL achieves a sample complexity of $\tilde{O}(|S||B|(1-\gamma)^{-3}\epsilon^{-2})$ for finding the Nash equilibrium (NE) value up to some $\epsilon$ error.
We also show that such a sample bound is minimax-optimal (up to logarithmic factors) if the algorithm is reward-agnostic, where the algorithm queries state transition samples without reward knowledge.
arXiv Detail & Related papers (2020-07-15T03:25:24Z) - Mean-Field Controls with Q-learning for Cooperative MARL: Convergence
and Complexity Analysis [7.800126150380472]
This paper builds the mathematical framework to approximate cooperative MARL by a mean-field control (MFC) approach.
It proposes a model-free kernel-based Q-learning algorithm (MFC-K-Q), which is shown to have a linear convergence rate for the MFC problem, the first of its kind in the MARL literature.
arXiv Detail & Related papers (2020-02-10T23:30:39Z)