Mean-Field Approximation of Cooperative Constrained Multi-Agent Reinforcement Learning (CMARL)
- URL: http://arxiv.org/abs/2209.07437v2
- Date: Tue, 10 Sep 2024 04:45:49 GMT
- Title: Mean-Field Approximation of Cooperative Constrained Multi-Agent Reinforcement Learning (CMARL)
- Authors: Washim Uddin Mondal, Vaneet Aggarwal, Satish V. Ukkusuri
- Abstract summary: We show that one can use the MFC approach to approximate the MARL problem even in the presence of constraints.
Also, we provide a Natural Policy Gradient based algorithm and prove that it can solve the constrained MARL problem within an error of $\mathcal{O}(e)$ with a sample complexity of $\mathcal{O}(e^{-6})$.
- Score: 35.18639326270473
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mean-Field Control (MFC) has recently been proven to be a scalable tool for approximately solving large-scale multi-agent reinforcement learning (MARL) problems. However, these studies are typically limited to the unconstrained cumulative reward maximization framework. In this paper, we show that one can use the MFC approach to approximate the MARL problem even in the presence of constraints. Specifically, we prove that an $N$-agent constrained MARL problem, where the state and action spaces of each individual agent are of sizes $|\mathcal{X}|$ and $|\mathcal{U}|$ respectively, can be approximated by an associated constrained MFC problem with an error of $e\triangleq \mathcal{O}\left([\sqrt{|\mathcal{X}|}+\sqrt{|\mathcal{U}|}]/\sqrt{N}\right)$. In the special case where the reward, cost, and state transition functions are independent of the action distribution of the population, we prove that the error can be improved to $e=\mathcal{O}(\sqrt{|\mathcal{X}|}/\sqrt{N})$. Finally, we provide a Natural Policy Gradient based algorithm and prove that it can solve the constrained MARL problem within an error of $\mathcal{O}(e)$ with a sample complexity of $\mathcal{O}(e^{-6})$.
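The paper's Natural Policy Gradient routine operates on the mean-field limit and learns from sampled trajectories; the details are in the paper itself. As a minimal, self-contained illustration of the primal-dual NPG pattern that such constrained methods build on, the sketch below solves a toy tabular constrained MDP with exact policy evaluation in place of sampling. The toy dynamics, reward and cost tables, step sizes, and cost budget are all hypothetical placeholders, not the authors' construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Problem sizes (placeholders): |X| states, |U| actions, discount gamma.
n_states, n_actions, gamma = 4, 3, 0.9

# Hypothetical random dynamics, reward, and cost tables, for illustration only.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[x, u] = next-state dist.
r = rng.uniform(0.0, 1.0, size=(n_states, n_actions))             # reward table
c = rng.uniform(0.0, 1.0, size=(n_states, n_actions))             # cost table
cost_budget = 4.0  # constraint: discounted cost J_c <= cost_budget (placeholder value)

theta = np.zeros((n_states, n_actions))  # softmax policy logits


def policy(theta):
    """Softmax policy pi(u | x) from logits."""
    z = np.exp(theta - theta.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)


def q_eval(pi, reward):
    """Exact policy evaluation of Q^pi for a given per-step reward table."""
    Q = np.zeros((n_states, n_actions))
    for _ in range(500):                 # fixed-point iteration, contracts at rate gamma
        V = (pi * Q).sum(axis=1)         # V(x') = sum_u' pi(u'|x') Q(x', u')
        Q = reward + gamma * (P @ V)     # Bellman backup; P @ V has shape (|X|, |U|)
    return Q


lam, eta_theta, eta_lam = 0.0, 0.5, 0.1
for _ in range(300):
    pi = policy(theta)
    # NPG ascent step on the Lagrangian reward r - lam * c. For a tabular softmax
    # policy, the natural gradient update reduces (up to step-size scaling) to a
    # step along the advantage function.
    Q_L = q_eval(pi, r - lam * c)
    A = Q_L - (pi * Q_L).sum(axis=1, keepdims=True)
    theta += eta_theta * A
    # Projected dual ascent on the constraint violation (uniform start states).
    J_c = (pi * q_eval(pi, c)).sum(axis=1).mean()
    lam = max(0.0, lam + eta_lam * (J_c - cost_budget))

print(f"final cost J_c = {J_c:.3f} (budget {cost_budget}), lambda = {lam:.3f}")
```

A useful way to read the main bound: since $e = \mathcal{O}\left([\sqrt{|\mathcal{X}|}+\sqrt{|\mathcal{U}|}]/\sqrt{N}\right)$, quadrupling the number of agents $N$ roughly halves the gap between the $N$-agent constrained problem and its mean-field approximation.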
Related papers
- Provably learning a multi-head attention layer [55.2904547651831]
The multi-head attention layer is one of the key components that set the transformer architecture apart from traditional feed-forward models.
In this work, we initiate the study of provably learning a multi-head attention layer from random examples.
We prove computational lower bounds showing that, in the worst case, exponential dependence on the number of heads $m$ is unavoidable.
arXiv Detail & Related papers (2024-02-06T15:39:09Z)
- Mean-Field Control based Approximation of Multi-Agent Reinforcement Learning in Presence of a Non-decomposable Shared Global State [37.63373979256335]
Mean Field Control (MFC) is a powerful approximation tool to solve large-scale Multi-Agent Reinforcement Learning (MARL) problems.
Here we demonstrate that even in a MARL setting where agents share a common global state, the MFC can still be applied as a good approximation tool.
arXiv Detail & Related papers (2023-01-13T18:55:58Z)
- Near Sample-Optimal Reduction-based Policy Learning for Average Reward MDP [58.13930707612128]
This work considers the sample complexity of obtaining an $\varepsilon$-optimal policy in an average reward Markov Decision Process (AMDP).
We prove an upper bound of $\widetilde{O}(H \varepsilon^{-3} \ln \frac{1}{\delta})$ samples per state-action pair, where $H := \mathrm{sp}(h^{*})$ is the span of the bias of any optimal policy, $\varepsilon$ is the accuracy, and $\delta$ is the failure probability.
arXiv Detail & Related papers (2022-12-01T15:57:58Z)
- On the Near-Optimality of Local Policies in Large Cooperative Multi-Agent Reinforcement Learning [37.63373979256335]
We show that in a cooperative $N$-agent network, one can design locally executable policies for the agents.
We also devise an algorithm to explicitly construct a local policy.
arXiv Detail & Related papers (2022-09-07T23:15:08Z)
- Cryptographic Hardness of Learning Halfspaces with Massart Noise [59.8587499110224]
We study the complexity of PAC learning halfspaces in the presence of Massart noise.
We show that no polynomial-time Massart halfspace learner can achieve error better than $\Omega(\eta)$, even if the optimal 0-1 error is small.
arXiv Detail & Related papers (2022-07-28T17:50:53Z)
- Can Mean Field Control (MFC) Approximate Cooperative Multi Agent Reinforcement Learning (MARL) with Non-Uniform Interaction? [33.484960394599455]
Mean-Field Control (MFC) is a powerful tool to solve Multi-Agent Reinforcement Learning (MARL) problems.
In this article, we relax the assumption of exchangeability and model the interaction between agents via an arbitrary doubly stochastic matrix.
We prove that, if the reward of each agent is an affine function of the mean-field seen by that agent, then one can approximate such a non-uniform MARL problem.
arXiv Detail & Related papers (2022-02-28T19:03:09Z)
- On the Approximation of Cooperative Heterogeneous Multi-Agent Reinforcement Learning (MARL) using Mean Field Control (MFC) [33.833747074900856]
Mean field control (MFC) is an effective way to mitigate the curse of dimensionality of cooperative multi-agent reinforcement learning problems.
This work considers a collection of $N_{\mathrm{pop}}$ heterogeneous agents that can be segregated into $K$ classes.
arXiv Detail & Related papers (2021-09-09T03:52:49Z)
- Threshold Phenomena in Learning Halfspaces with Massart Noise [56.01192577666607]
We study the problem of PAC learning halfspaces on $\mathbb{R}^{d}$ with Massart noise under Gaussian marginals.
Our results qualitatively characterize the complexity of learning halfspaces in the Massart model.
arXiv Detail & Related papers (2021-08-19T16:16:48Z)
- Hardness of Learning Halfspaces with Massart Noise [56.98280399449707]
We study the complexity of PAC learning halfspaces in the presence of Massart (bounded) noise.
We show that there is an exponential gap between the information-theoretically optimal error and the best error that can be achieved by an SQ algorithm.
arXiv Detail & Related papers (2020-12-17T16:43:11Z)
- Mean-Field Controls with Q-learning for Cooperative MARL: Convergence and Complexity Analysis [7.800126150380472]
This paper builds the mathematical framework to approximate cooperative MARL by a mean-field control (MFC) approach.
It proposes a model-free kernel-based Q-learning algorithm (MFC-K-Q), which is shown to have a linear convergence rate for the MFC problem, the first of its kind in the MARL literature.
arXiv Detail & Related papers (2020-02-10T23:30:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.