On the Approximation of Cooperative Heterogeneous Multi-Agent
Reinforcement Learning (MARL) using Mean Field Control (MFC)
- URL: http://arxiv.org/abs/2109.04024v1
- Date: Thu, 9 Sep 2021 03:52:49 GMT
- Title: On the Approximation of Cooperative Heterogeneous Multi-Agent
Reinforcement Learning (MARL) using Mean Field Control (MFC)
- Authors: Washim Uddin Mondal, Mridul Agarwal, Vaneet Aggarwal, and Satish V.
Ukkusuri
- Abstract summary: Mean field control (MFC) is an effective way to mitigate the curse of dimensionality of cooperative multi-agent reinforcement learning problems.
This work considers a collection of $N_{\mathrm{pop}}$ heterogeneous agents that can be segregated into $K$ classes.
- Score: 33.833747074900856
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mean field control (MFC) is an effective way to mitigate the curse of
dimensionality of cooperative multi-agent reinforcement learning (MARL)
problems. This work considers a collection of $N_{\mathrm{pop}}$ heterogeneous
agents that can be segregated into $K$ classes such that the $k$-th class
contains $N_k$ homogeneous agents. We aim to prove approximation guarantees of
the MARL problem for this heterogeneous system by its corresponding MFC
problem. We consider three scenarios where the reward and transition dynamics
of all agents are respectively taken to be functions of $(1)$ joint state and
action distributions across all classes, $(2)$ individual distributions of each
class, and $(3)$ marginal distributions of the entire population. We show that,
in these cases, the $K$-class MARL problem can be approximated by MFC with
errors given as
$e_1=\mathcal{O}(\frac{\sqrt{|\mathcal{X}||\mathcal{U}|}}{N_{\mathrm{pop}}}\sum_{k}\sqrt{N_k})$,
$e_2=\mathcal{O}(\sqrt{|\mathcal{X}||\mathcal{U}|}\sum_{k}\frac{1}{\sqrt{N_k}})$
and
$e_3=\mathcal{O}\left(\sqrt{|\mathcal{X}||\mathcal{U}|}\left[\frac{A}{N_{\mathrm{pop}}}\sum_{k\in[K]}\sqrt{N_k}+\frac{B}{\sqrt{N_{\mathrm{pop}}}}\right]\right)$,
respectively, where $A, B$ are some constants and $|\mathcal{X}|,|\mathcal{U}|$
are the sizes of state and action spaces of each agent. Finally, we design a
Natural Policy Gradient (NPG) based algorithm that, in the three cases stated
above, can converge to an optimal MARL policy within $\mathcal{O}(e_j)$ error
with a sample complexity of $\mathcal{O}(e_j^{-3})$, $j\in\{1,2,3\}$,
respectively.
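For intuition about how these bounds behave, the following is a minimal sketch (not from the paper) that plugs hypothetical class sizes and state/action space sizes into $e_1$, $e_2$, and $e_3$; the constants $A$, $B$ are placeholders since the abstract leaves them unspecified.

```python
import math

def mfc_error_bounds(class_sizes, n_states, n_actions, A=1.0, B=1.0):
    """Scaling of the three MFC approximation errors e_1, e_2, e_3 from the
    abstract, up to unspecified constants (A and B are placeholders)."""
    n_pop = sum(class_sizes)                                  # N_pop = sum_k N_k
    sa = math.sqrt(n_states * n_actions)                      # sqrt(|X| |U|)
    root_sum = sum(math.sqrt(nk) for nk in class_sizes)       # sum_k sqrt(N_k)
    e1 = sa * root_sum / n_pop
    e2 = sa * sum(1.0 / math.sqrt(nk) for nk in class_sizes)
    e3 = sa * (A * root_sum / n_pop + B / math.sqrt(n_pop))
    return e1, e2, e3

# Hypothetical example: K = 3 classes, |X| = 10 states, |U| = 5 actions per agent.
e1, e2, e3 = mfc_error_bounds([100, 400, 500], n_states=10, n_actions=5)
print(e1, e2, e3)
```

By Cauchy-Schwarz, $\sum_{k}\sqrt{N_k}\le\sqrt{K N_{\mathrm{pop}}}$, so $e_1$ and $e_3$ vanish as $N_{\mathrm{pop}}\to\infty$ for fixed $K$, whereas $e_2$ shrinks only when every class size $N_k$ grows.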
Related papers
- The Communication Complexity of Approximating Matrix Rank [50.6867896228563]
We show that this problem has randomized communication complexity $\Omega(\frac{1}{k}\cdot n^2\log|\mathbb{F}|)$.
As an application, we obtain an $\Omega(\frac{1}{k}\cdot n^2\log|\mathbb{F}|)$ space lower bound for any streaming algorithm with $k$ passes.
arXiv Detail & Related papers (2024-10-26T06:21:42Z) - Provably learning a multi-head attention layer [55.2904547651831]
Multi-head attention layer is one of the key components of the transformer architecture that sets it apart from traditional feed-forward models.
In this work, we initiate the study of provably learning a multi-head attention layer from random examples.
We prove computational lower bounds showing that in the worst case, exponential dependence on $m$ is unavoidable.
arXiv Detail & Related papers (2024-02-06T15:39:09Z) - Mean-Field Control based Approximation of Multi-Agent Reinforcement
Learning in Presence of a Non-decomposable Shared Global State [37.63373979256335]
Mean Field Control (MFC) is a powerful approximation tool to solve large-scale Multi-Agent Reinforcement Learning (MARL) problems.
Here we demonstrate that even in a MARL setting where agents share a common global state, the MFC can still be applied as a good approximation tool.
arXiv Detail & Related papers (2023-01-13T18:55:58Z) - Additive estimates of the permanent using Gaussian fields [0.0]
We present a randomized algorithm for estimating the permanent of an $M times M$ real matrix $A$ up to an additive error.
We can estimate $\mathrm{perm}(A)$ to an additive error of $\epsilon\bigg(\sqrt{3^{2M}}\prod_{i=1}^{2M} C_{ii}\bigg)$ in time.
arXiv Detail & Related papers (2022-12-20T22:13:42Z) - Near Sample-Optimal Reduction-based Policy Learning for Average Reward
MDP [58.13930707612128]
This work considers the sample complexity of obtaining an $\varepsilon$-optimal policy in an average reward Markov Decision Process (AMDP).
We prove an upper bound of $\widetilde{O}(H\,\varepsilon^{-3}\ln\frac{1}{\delta})$ samples per state-action pair, where $H := \mathrm{sp}(h^*)$ is the span of bias of any optimal policy, $\varepsilon$ is the accuracy and $\delta$ is the failure probability.
arXiv Detail & Related papers (2022-12-01T15:57:58Z) - Mean-Field Approximation of Cooperative Constrained Multi-Agent Reinforcement Learning (CMARL) [35.18639326270473]
We show that one can use the MFC approach to approximate the MARL problem even in the presence of constraints.
Also, we provide a Natural Policy Gradient based algorithm and prove that it can solve the constrained MARL problem within an error of $\mathcal{O}(e)$ with a sample complexity of $\mathcal{O}(e^{-6})$.
arXiv Detail & Related papers (2022-09-15T16:33:38Z) - On the Near-Optimality of Local Policies in Large Cooperative
Multi-Agent Reinforcement Learning [37.63373979256335]
We show that in a cooperative $N$-agent network, one can design locally executable policies for the agents.
We also devise an algorithm to explicitly construct a local policy.
arXiv Detail & Related papers (2022-09-07T23:15:08Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $\mathbf{x}\mapsto\sigma(\mathbf{w}\cdot\mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C\,\epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Can Mean Field Control (MFC) Approximate Cooperative Multi Agent
Reinforcement Learning (MARL) with Non-Uniform Interaction? [33.484960394599455]
Mean-Field Control (MFC) is a powerful tool to solve Multi-Agent Reinforcement Learning (MARL) problems.
In this article, we relax the assumption of exchangeability and model the interaction between agents via an arbitrary doubly stochastic matrix.
We prove that, if the reward of each agent is an affine function of the mean-field seen by that agent, then one can approximate such a non-uniform MARL problem.
arXiv Detail & Related papers (2022-02-28T19:03:09Z) - Threshold Phenomena in Learning Halfspaces with Massart Noise [56.01192577666607]
We study the problem of PAC learning halfspaces on $\mathbb{R}^d$ with Massart noise under Gaussian marginals.
Our results qualitatively characterize the complexity of learning halfspaces in the Massart model.
arXiv Detail & Related papers (2021-08-19T16:16:48Z) - Near-Optimal SQ Lower Bounds for Agnostically Learning Halfspaces and
ReLUs under Gaussian Marginals [49.60752558064027]
We study the fundamental problems of agnostically learning halfspaces and ReLUs under Gaussian marginals.
Our lower bounds provide strong evidence that current upper bounds for these tasks are essentially best possible.
arXiv Detail & Related papers (2020-06-29T17:10:10Z)