Locality Matters: A Scalable Value Decomposition Approach for
Cooperative Multi-Agent Reinforcement Learning
- URL: http://arxiv.org/abs/2109.10632v1
- Date: Wed, 22 Sep 2021 10:08:15 GMT
- Title: Locality Matters: A Scalable Value Decomposition Approach for
Cooperative Multi-Agent Reinforcement Learning
- Authors: Roy Zohar, Shie Mannor, Guy Tennenholtz
- Abstract summary: Cooperative multi-agent reinforcement learning (MARL) faces significant scalability issues due to state and action spaces that are exponentially large in the number of agents.
We propose a novel, value-based multi-agent algorithm called LOMAQ, which incorporates local rewards in the Centralized Training Decentralized Execution paradigm.
- Score: 52.7873574425376
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cooperative multi-agent reinforcement learning (MARL) faces significant
scalability issues due to state and action spaces that are exponentially large
in the number of agents. As environments grow in size, effective credit
assignment becomes increasingly harder and often results in infeasible learning
times. Still, in many real-world settings, there exist simplified underlying
dynamics that can be leveraged for more scalable solutions. In this work, we
exploit such locality structures effectively whilst maintaining global
cooperation. We propose a novel, value-based multi-agent algorithm called
LOMAQ, which incorporates local rewards in the Centralized Training
Decentralized Execution paradigm. Additionally, we provide a direct reward
decomposition method for finding these local rewards when only a global signal
is provided. We test our method empirically, showing it scales well compared to
other methods, significantly improving performance and convergence speed.
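To make the idea concrete, here is a minimal sketch of value decomposition with local rewards: each partition of agents keeps its own utility network trained against its own local reward, and the joint value is assumed to be the sum of the partition utilities. All names (LocalQ, local_td_losses, the batch layout) and sizes are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class LocalQ(nn.Module):
    """Utility network for one partition of agents (hypothetical sketch)."""
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))

    def forward(self, obs):
        return self.net(obs)  # per-action utilities for this partition

def local_td_losses(local_qs, target_qs, batch, gamma=0.99):
    """One TD step per partition, each regressed against its *local* reward.

    `batch[k]` holds (obs, action, local_reward, next_obs) tensors for
    partition k. Global cooperation is preserved under the assumption that
    the joint value is the sum of the partition utilities.
    """
    losses = []
    for k, (q, q_tgt) in enumerate(zip(local_qs, target_qs)):
        obs, act, r_k, next_obs = batch[k]
        q_taken = q(obs).gather(1, act.unsqueeze(1)).squeeze(1)
        with torch.no_grad():
            y = r_k + gamma * q_tgt(next_obs).max(dim=1).values
        losses.append(((q_taken - y) ** 2).mean())
    return sum(losses)
```

Under this additive assumption, each partition can still act greedily on its own utility at execution time, while the summed objective is only needed during centralized training.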
Related papers
- United We Stand: Decentralized Multi-Agent Planning With Attrition [4.196094610996091]
Decentralized planning is a key element of cooperative multi-agent systems for information gathering tasks.
We propose Attritable MCTS (A-MCTS), a decentralized algorithm capable of timely and efficient adaptation to changes in the set of active agents.
We show both theoretically and experimentally that A-MCTS enables efficient adaptation even under high failure rates.
arXiv Detail & Related papers (2024-07-11T07:55:50Z)
- Learning Reward Machines in Cooperative Multi-Agent Tasks [75.79805204646428]
This paper presents a novel approach to Multi-Agent Reinforcement Learning (MARL).
It combines cooperative task decomposition with the learning of reward machines (RMs) encoding the structure of the sub-tasks.
The proposed method helps deal with the non-Markovian nature of the rewards in partially observable environments.
arXiv Detail & Related papers (2023-03-24T15:12:28Z)
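Since the entry above hinges on reward machines, here is a tiny sketch of one: a finite-state machine over high-level events whose transitions emit rewards, which is what lets non-Markovian (history-dependent) reward be tracked with a single extra machine state. The class and the button/door example are illustrative, not taken from the paper.

```python
# Minimal sketch of a reward machine: a finite-state machine over high-level
# events that emits rewards on transitions (illustrative, not the paper's code).
class RewardMachine:
    def __init__(self, transitions, initial_state):
        # transitions: {(state, event): (next_state, reward)}
        self.transitions = transitions
        self.state = initial_state

    def step(self, event):
        """Advance on an observed event; unknown events leave the state unchanged."""
        next_state, reward = self.transitions.get((self.state, event),
                                                  (self.state, 0.0))
        self.state = next_state
        return reward

# Example: a two-step cooperative sub-task "press button, then open door".
rm = RewardMachine({("u0", "button"): ("u1", 0.0),
                    ("u1", "door"): ("u_done", 1.0)}, initial_state="u0")
```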
- Scalable Multi-Agent Model-Based Reinforcement Learning [1.95804735329484]
We propose a new method called MAMBA which utilizes Model-Based Reinforcement Learning (MBRL) to further leverage centralized training in cooperative environments.
We argue that communication between agents is enough to sustain a world model for each agent during the execution phase, while imagined rollouts can be used for training, removing the need to interact with the environment.
arXiv Detail & Related papers (2022-05-25T08:35:00Z)
- Convergence Rates of Average-Reward Multi-agent Reinforcement Learning via Randomized Linear Programming [41.30044824711509]
We focus on the case where the global reward is a sum of local rewards, the joint policy factorizes into the agents' marginals, and the state is fully observable.
We develop multi-agent extensions, whereby agents solve their local saddle point problems and then perform local weighted averaging.
We establish that the sample complexity to obtain near-globally optimal solutions matches tight dependencies on the cardinality of the state and action spaces.
arXiv Detail & Related papers (2021-10-22T03:48:41Z)
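Written out, the structural assumptions in this entry read roughly as follows (notation ours; the exact arguments of each local reward may differ in the paper):

```latex
% Additive local rewards and a factored joint policy (illustrative notation)
r(s, a) = \sum_{i=1}^{N} r_i(s, a_i),
\qquad
\pi_\theta(a \mid s) = \prod_{i=1}^{N} \pi_{\theta_i}(a_i \mid s)
```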
- Softmax with Regularization: Better Value Estimation in Multi-Agent Reinforcement Learning [72.28520951105207]
Overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning.
We propose a novel regularization-based update scheme that penalizes large joint action-values deviating from a baseline.
We show that our method provides a consistent performance improvement on a set of challenging StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2021-03-22T14:18:39Z)
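A rough sketch of the kind of regularized update described above: an ordinary TD loss plus a penalty that discourages the taken joint action-value from drifting far from a baseline. The mean-over-actions baseline used here is purely an illustrative assumption; the paper defines its own baseline.

```python
import torch

def regularized_td_loss(q_joint, q_target, batch, lam=0.1, gamma=0.99):
    """TD loss plus a penalty on joint action-values that stray from a baseline.

    The baseline below (mean over joint actions) is a stand-in for the
    paper's baseline and is only meant to show the shape of the update.
    """
    obs, act, rew, next_obs = batch
    q_all = q_joint(obs)                                     # (B, n_joint_actions)
    q_taken = q_all.gather(1, act.unsqueeze(1)).squeeze(1)   # (B,)
    with torch.no_grad():
        y = rew + gamma * q_target(next_obs).max(dim=1).values
    baseline = q_all.mean(dim=1).detach()                    # hypothetical baseline
    td = ((q_taken - y) ** 2).mean()
    reg = ((q_taken - baseline) ** 2).mean()
    return td + lam * reg
```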
- Forgetful Experience Replay in Hierarchical Reinforcement Learning from Demonstrations [55.41644538483948]
In this paper, we propose a combination of approaches that allow the agent to use low-quality demonstrations in complex vision-based environments.
Our proposed goal-oriented structuring of replay buffer allows the agent to automatically highlight sub-goals for solving complex hierarchical tasks in demonstrations.
The solution based on our algorithm beats all the known solutions from the MineRL competition and allows the agent to mine a diamond in the Minecraft environment.
arXiv Detail & Related papers (2020-06-17T15:38:40Z)
- Scalable Multi-Agent Reinforcement Learning for Networked Systems with Average Reward [17.925681736096482]
It has long been recognized that multi-agent reinforcement learning (MARL) faces significant scalability issues.
In this paper, we identify a rich class of networked MARL problems where the model exhibits a local dependence structure that allows it to be solved in a scalable manner.
arXiv Detail & Related papers (2020-06-11T17:23:17Z)
- F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning [110.35516334788687]
Decentralized multi-agent reinforcement learning algorithms are sometimes impractical in complicated applications.
We propose a flexible fully decentralized actor-critic MARL framework, which can handle large-scale general cooperative multi-agent settings.
Our framework can achieve scalability and stability in large-scale environments and reduce information transmission.
arXiv Detail & Related papers (2020-04-17T14:56:29Z)
- Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning [55.20040781688844]
QMIX is a novel value-based method that can train decentralised policies in a centralised end-to-end fashion.
We propose the StarCraft Multi-Agent Challenge (SMAC) as a new benchmark for deep multi-agent reinforcement learning.
arXiv Detail & Related papers (2020-03-19T16:51:51Z)
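For reference, the core of QMIX is a state-conditioned mixing network whose weights are kept non-negative, so the joint value is monotone in every agent's utility and decentralized greedy action selection stays consistent with the centralized training target. The sketch below is simplified relative to the published architecture; layer sizes and the final bias are illustrative.

```python
import torch
import torch.nn as nn

class MonotonicMixer(nn.Module):
    """Sketch of a QMIX-style mixer: Q_tot is monotone in each agent utility
    because the mixing weights are forced non-negative (simplified sizes)."""
    def __init__(self, n_agents, state_dim, embed_dim=32):
        super().__init__()
        self.n_agents, self.embed_dim = n_agents, embed_dim
        # Hypernetworks produce the mixing weights/biases from the global state.
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Linear(state_dim, 1)

    def forward(self, agent_qs, state):
        # agent_qs: (B, n_agents), state: (B, state_dim)
        w1 = torch.abs(self.hyper_w1(state)).view(-1, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(-1, 1, self.embed_dim)
        hidden = torch.relu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)
        w2 = torch.abs(self.hyper_w2(state)).view(-1, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(-1, 1, 1)
        return (torch.bmm(hidden, w2) + b2).view(-1)  # Q_tot per batch element
```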