Mean-Field Control based Approximation of Multi-Agent Reinforcement
Learning in Presence of a Non-decomposable Shared Global State
- URL: http://arxiv.org/abs/2301.06889v2
- Date: Fri, 26 May 2023 20:10:50 GMT
- Title: Mean-Field Control based Approximation of Multi-Agent Reinforcement
Learning in Presence of a Non-decomposable Shared Global State
- Authors: Washim Uddin Mondal, Vaneet Aggarwal, Satish V. Ukkusuri
- Abstract summary: Mean Field Control (MFC) is a powerful approximation tool to solve large-scale Multi-Agent Reinforcement Learning (MARL) problems.
Here we demonstrate that even in a MARL setting where agents share a common global state, the MFC can still be applied as a good approximation tool.
- Score: 37.63373979256335
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mean Field Control (MFC) is a powerful approximation tool to solve
large-scale Multi-Agent Reinforcement Learning (MARL) problems. However, the
success of MFC relies on the presumption that given the local states and
actions of all the agents, the next (local) states of the agents evolve
conditionally independently of each other. Here we demonstrate that even in a
MARL setting where agents share a common global state in addition to their
conditionally independently evolving local states (the shared global state
introduces a correlation between the state transition processes of individual
agents), MFC can still be applied as a good approximation tool. The global
state is assumed to be non-decomposable, i.e., it cannot be expressed as a
collection of
local states of the agents. We compute the approximation error as
$\mathcal{O}(e)$ where $e=\frac{1}{\sqrt{N}}\left[\sqrt{|\mathcal{X}|}
+\sqrt{|\mathcal{U}|}\right]$. The size of the agent population is denoted by
the term $N$, and $|\mathcal{X}|, |\mathcal{U}|$ respectively indicate the
sizes of (local) state and action spaces of individual agents. The
approximation error is found to be independent of the size of the shared global
state space. We further demonstrate that in a special case if the reward and
state transition functions are independent of the action distribution of the
population, then the error can be improved to
$e=\frac{\sqrt{|\mathcal{X}|}}{\sqrt{N}}$. Finally, we devise a Natural Policy
Gradient based algorithm that solves the MFC problem with
$\mathcal{O}(\epsilon^{-3})$ sample complexity and obtains a policy that is
within $\mathcal{O}(\max\{e,\epsilon\})$ error of the optimal MARL policy for
any $\epsilon>0$.
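As a quick illustration of how the stated bound behaves, here is a minimal Python sketch; the function name and the toy problem sizes are illustrative assumptions, not taken from the paper. It evaluates the general error $e=\frac{1}{\sqrt{N}}\left[\sqrt{|\mathcal{X}|}+\sqrt{|\mathcal{U}|}\right]$ and the improved special-case bound $e=\frac{\sqrt{|\mathcal{X}|}}{\sqrt{N}}$.

```python
import math


def mfc_approximation_error(n_agents: int, n_states: int, n_actions: int,
                            action_independent: bool = False) -> float:
    """Evaluate the MFC-to-MARL approximation error bound quoted in the abstract.

    General case:  e = (sqrt(|X|) + sqrt(|U|)) / sqrt(N)
    Special case   (reward and transition independent of the population's
    action distribution):  e = sqrt(|X|) / sqrt(N)

    The bound is independent of the size of the shared global state space,
    so that quantity does not appear as an argument.
    """
    if action_independent:
        return math.sqrt(n_states) / math.sqrt(n_agents)
    return (math.sqrt(n_states) + math.sqrt(n_actions)) / math.sqrt(n_agents)


# Illustrative numbers (not from the paper): the bound decays as O(1/sqrt(N)).
for n in (100, 10_000, 1_000_000):
    print(n, mfc_approximation_error(n, n_states=10, n_actions=5))
```

As the abstract notes, this mean-field approximation gap is separate from the optimization error $\epsilon$ of the Natural Policy Gradient solver, so the learned policy is within $\mathcal{O}(\max\{e,\epsilon\})$ of the optimal MARL policy.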
Related papers
- Improved Algorithm for Adversarial Linear Mixture MDPs with Bandit
Feedback and Unknown Transition [71.33787410075577]
We study reinforcement learning with linear function approximation, unknown transition, and adversarial losses.
We propose a new algorithm that attains an $\widetilde{O}(d\sqrt{HS^3K} + \sqrt{HSAK})$ regret with high probability.
arXiv Detail & Related papers (2024-03-07T15:03:50Z) - Scalable Primal-Dual Actor-Critic Method for Safe Multi-Agent RL with
General Utilities [12.104551746465932]
We investigate safe multi-agent reinforcement learning, where agents seek to collectively maximize an aggregate sum of local objectives while satisfying their own safety constraints.
Our algorithm converges to a first-order stationary point (FOSP) at the rate of $\mathcal{O}\left(T^{-2/3}\right)$.
In the sample-based setting, we demonstrate that, with high probability, our algorithm requires $\widetilde{\mathcal{O}}\left(\epsilon^{-3.5}\right)$ samples to achieve an $\epsilon$-FOSP.
arXiv Detail & Related papers (2023-05-27T20:08:35Z) - Near Sample-Optimal Reduction-based Policy Learning for Average Reward
MDP [58.13930707612128]
This work considers the sample complexity of obtaining an $\varepsilon$-optimal policy in an average reward Markov Decision Process (AMDP).
We prove an upper bound of $\widetilde{O}(H \varepsilon^{-3} \ln \frac{1}{\delta})$ samples per state-action pair, where $H := \mathrm{sp}(h^*)$ is the span of bias of any optimal policy, $\varepsilon$ is the accuracy and $\delta$ is the failure probability.
arXiv Detail & Related papers (2022-12-01T15:57:58Z) - Horizon-Free and Variance-Dependent Reinforcement Learning for Latent
Markov Decision Processes [62.90204655228324]
We study regret minimization for reinforcement learning (RL) in Latent Markov Decision Processes (LMDPs) with context in hindsight.
We design a novel model-based algorithmic framework which can be instantiated with both a model-optimistic and a value-optimistic solver.
arXiv Detail & Related papers (2022-10-20T21:32:01Z) - Mean-Field Approximation of Cooperative Constrained Multi-Agent Reinforcement Learning (CMARL) [35.18639326270473]
We show that one can use the MFC approach to approximate the MARL problem even in the presence of constraints.
Also, we provide a Natural Policy Gradient based algorithm and prove that it can solve the constrained MARL problem within an error of $\mathcal{O}(e)$ with a sample complexity of $\mathcal{O}(e^{-6})$.
arXiv Detail & Related papers (2022-09-15T16:33:38Z) - On the Near-Optimality of Local Policies in Large Cooperative
Multi-Agent Reinforcement Learning [37.63373979256335]
We show that in a cooperative $N$-agent network, one can design locally executable policies for the agents.
We also devise an algorithm to explicitly construct a local policy.
arXiv Detail & Related papers (2022-09-07T23:15:08Z) - Can Mean Field Control (MFC) Approximate Cooperative Multi Agent
Reinforcement Learning (MARL) with Non-Uniform Interaction? [33.484960394599455]
Mean-Field Control (MFC) is a powerful tool to solve Multi-Agent Reinforcement Learning (MARL) problems.
In this article, we relax the assumption of exchangeability and model the interaction between agents via an arbitrary doubly stochastic matrix.
We prove that, if the reward of each agent is an affine function of the mean-field seen by that agent, then one can approximate such a non-uniform MARL problem.
arXiv Detail & Related papers (2022-02-28T19:03:09Z) - On the Approximation of Cooperative Heterogeneous Multi-Agent
Reinforcement Learning (MARL) using Mean Field Control (MFC) [33.833747074900856]
Mean field control (MFC) is an effective way to mitigate the curse of dimensionality of cooperative multi-agent reinforcement learning problems.
This work considers a collection of $N_{\mathrm{pop}}$ heterogeneous agents that can be segregated into $K$ classes.
arXiv Detail & Related papers (2021-09-09T03:52:49Z) - Improved Sample Complexity for Incremental Autonomous Exploration in
MDPs [132.88757893161699]
We learn the set of $\epsilon$-optimal goal-conditioned policies attaining all states that are incrementally reachable within $L$ steps.
DisCo is the first algorithm that can return an $\epsilon/c_{\min}$-optimal policy for any cost-sensitive shortest-path problem.
arXiv Detail & Related papers (2020-12-29T14:06:09Z) - Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2})+\epsilon$.
arXiv Detail & Related papers (2020-05-29T07:20:35Z)