Mean-Field Control based Approximation of Multi-Agent Reinforcement
Learning in Presence of a Non-decomposable Shared Global State
- URL: http://arxiv.org/abs/2301.06889v2
- Date: Fri, 26 May 2023 20:10:50 GMT
- Title: Mean-Field Control based Approximation of Multi-Agent Reinforcement
Learning in Presence of a Non-decomposable Shared Global State
- Authors: Washim Uddin Mondal, Vaneet Aggarwal, Satish V. Ukkusuri
- Abstract summary: Mean Field Control (MFC) is a powerful approximation tool to solve large-scale Multi-Agent Reinforcement Learning (MARL) problems.
Here we demonstrate that even in a MARL setting where agents share a common global state, the MFC can still be applied as a good approximation tool.
- Score: 37.63373979256335
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mean Field Control (MFC) is a powerful approximation tool to solve
large-scale Multi-Agent Reinforcement Learning (MARL) problems. However, the
success of MFC relies on the presumption that given the local states and
actions of all the agents, the next (local) states of the agents evolve
conditionally independently of each other. Here we demonstrate that even in a
MARL setting where agents share a common global state in addition to their
conditionally independently evolving local states (the shared global state
introduces a correlation between the state transition processes of individual
agents), MFC can still be applied as a good approximation tool. The global
state is assumed to be non-decomposable, i.e., it cannot be expressed as a
collection of
local states of the agents. We compute the approximation error as
$\mathcal{O}(e)$ where $e=\frac{1}{\sqrt{N}}\left[\sqrt{|\mathcal{X}|}
+\sqrt{|\mathcal{U}|}\right]$. The size of the agent population is denoted by
the term $N$, and $|\mathcal{X}|, |\mathcal{U}|$ respectively indicate the
sizes of (local) state and action spaces of individual agents. The
approximation error is found to be independent of the size of the shared global
state space. We further demonstrate that in a special case if the reward and
state transition functions are independent of the action distribution of the
population, then the error can be improved to
$e=\frac{\sqrt{|\mathcal{X}|}}{\sqrt{N}}$. Finally, we devise a Natural Policy
Gradient based algorithm that solves the MFC problem with
$\mathcal{O}(\epsilon^{-3})$ sample complexity and obtains a policy that is
within $\mathcal{O}(\max\{e,\epsilon\})$ error of the optimal MARL policy for
any $\epsilon>0$.
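As a quick illustration of how the stated bound behaves, here is a minimal Python sketch; the function name and the toy problem sizes are illustrative assumptions, not taken from the paper. It evaluates the general error $e=\frac{1}{\sqrt{N}}\left[\sqrt{|\mathcal{X}|}+\sqrt{|\mathcal{U}|}\right]$ and the improved special-case bound $e=\frac{\sqrt{|\mathcal{X}|}}{\sqrt{N}}$.

```python
import math


def mfc_approximation_error(n_agents: int, n_states: int, n_actions: int,
                            action_independent: bool = False) -> float:
    """Evaluate the MFC-to-MARL approximation error bound quoted in the abstract.

    General case:  e = (sqrt(|X|) + sqrt(|U|)) / sqrt(N)
    Special case   (reward and transition independent of the population's
    action distribution):  e = sqrt(|X|) / sqrt(N)

    The bound is independent of the size of the shared global state space,
    so that quantity does not appear as an argument.
    """
    if action_independent:
        return math.sqrt(n_states) / math.sqrt(n_agents)
    return (math.sqrt(n_states) + math.sqrt(n_actions)) / math.sqrt(n_agents)


# Illustrative numbers (not from the paper): the bound decays as O(1/sqrt(N)).
for n in (100, 10_000, 1_000_000):
    print(n, mfc_approximation_error(n, n_states=10, n_actions=5))
```

As the abstract notes, this mean-field approximation gap is separate from the optimization error $\epsilon$ of the Natural Policy Gradient solver, so the learned policy is within $\mathcal{O}(\max\{e,\epsilon\})$ of the optimal MARL policy.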
Related papers
- Improved Algorithm for Adversarial Linear Mixture MDPs with Bandit
Feedback and Unknown Transition [71.33787410075577]
We study reinforcement learning with linear function approximation, unknown transition, and adversarial losses.
We propose a new algorithm that attains an $\widetilde{O}(d\sqrt{HS^3K} + \sqrt{HSAK})$ regret with high probability.
arXiv Detail & Related papers (2024-03-07T15:03:50Z) - Scalable Primal-Dual Actor-Critic Method for Safe Multi-Agent RL with
General Utilities [12.104551746465932]
We investigate safe multi-agent reinforcement learning, where agents seek to collectively maximize an aggregate sum of local objectives while satisfying their own safety constraints.
Our algorithm converges to a first-order stationary point (FOSP) at the rate of $\mathcal{O}\left(T^{-2/3}\right)$.
In the sample-based setting, we demonstrate that, with high probability, our algorithm requires $\widetilde{\mathcal{O}}\left(\epsilon^{-3.5}\right)$ samples to achieve an $\epsilon$-FOSP.
arXiv Detail & Related papers (2023-05-27T20:08:35Z) - Near Sample-Optimal Reduction-based Policy Learning for Average Reward
MDP [58.13930707612128]
This work considers the sample complexity of obtaining an $\varepsilon$-optimal policy in an average reward Markov Decision Process (AMDP).
We prove an upper bound of $\widetilde{O}(H \varepsilon^{-3} \ln \frac{1}{\delta})$ samples per state-action pair, where $H := \mathrm{sp}(h^*)$ is the span of bias of any optimal policy, $\varepsilon$ is the accuracy and $\delta$ is the failure probability.
arXiv Detail & Related papers (2022-12-01T15:57:58Z) - Horizon-Free and Variance-Dependent Reinforcement Learning for Latent
Markov Decision Processes [62.90204655228324]
We study regret minimization for reinforcement learning (RL) in Latent Markov Decision Processes (LMDPs) with context in hindsight.
We design a novel model-based algorithmic framework which can be instantiated with both a model-optimistic and a value-optimistic solver.
arXiv Detail & Related papers (2022-10-20T21:32:01Z) - Mean-Field Approximation of Cooperative Constrained Multi-Agent Reinforcement Learning (CMARL) [35.18639326270473]
We show that one can use the MFC approach to approximate the MARL problem even in the presence of constraints.
Also, we provide a Natural Policy Gradient based algorithm and prove that it can solve the constrained MARL problem within an error of $\mathcal{O}(e)$ with a sample complexity of $\mathcal{O}(e^{-6})$.
arXiv Detail & Related papers (2022-09-15T16:33:38Z) - On the Near-Optimality of Local Policies in Large Cooperative
Multi-Agent Reinforcement Learning [37.63373979256335]
We show that in a cooperative $N$-agent network, one can design locally executable policies for the agents.
We also devise an algorithm to explicitly construct a local policy.
arXiv Detail & Related papers (2022-09-07T23:15:08Z) - Can Mean Field Control (MFC) Approximate Cooperative Multi Agent
Reinforcement Learning (MARL) with Non-Uniform Interaction? [33.484960394599455]
Mean-Field Control (MFC) is a powerful tool to solve Multi-Agent Reinforcement Learning (MARL) problems.
In this article, we relax the assumption of exchangeability and model the interaction between agents via an arbitrary doubly stochastic matrix.
We prove that, if the reward of each agent is an affine function of the mean-field seen by that agent, then one can approximate such a non-uniform MARL problem.
arXiv Detail & Related papers (2022-02-28T19:03:09Z) - On the Approximation of Cooperative Heterogeneous Multi-Agent
Reinforcement Learning (MARL) using Mean Field Control (MFC) [33.833747074900856]
Mean field control (MFC) is an effective way to mitigate the curse of dimensionality of cooperative multi-agent reinforcement learning problems.
This work considers a collection of $N_{\mathrm{pop}}$ heterogeneous agents that can be segregated into $K$ classes.
arXiv Detail & Related papers (2021-09-09T03:52:49Z) - Improved Sample Complexity for Incremental Autonomous Exploration in
MDPs [132.88757893161699]
We learn the set of $\epsilon$-optimal goal-conditioned policies attaining all states that are incrementally reachable within $L$ steps.
DisCo is the first algorithm that can return an $\epsilon/c_{\min}$-optimal policy for any cost-sensitive shortest-path problem.
arXiv Detail & Related papers (2020-12-29T14:06:09Z) - Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2})+\epsilon$.
arXiv Detail & Related papers (2020-05-29T07:20:35Z)