Is Centralized Training with Decentralized Execution Framework
Centralized Enough for MARL?
- URL: http://arxiv.org/abs/2305.17352v1
- Date: Sat, 27 May 2023 03:15:24 GMT
- Title: Is Centralized Training with Decentralized Execution Framework
Centralized Enough for MARL?
- Authors: Yihe Zhou, Shunyu Liu, Yunpeng Qing, Kaixuan Chen, Tongya Zheng,
Yanhao Huang, Jie Song, Mingli Song
- Abstract summary: Centralized Training with Decentralized Execution (CTDE) is a popular framework for cooperative Multi-Agent Reinforcement Learning.
We introduce a novel Centralized Advising and Decentralized Pruning (CADP) framework for multi-agent reinforcement learning.
- Score: 27.037348104661497
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Centralized Training with Decentralized Execution (CTDE) has recently emerged
as a popular framework for cooperative Multi-Agent Reinforcement Learning
(MARL), where agents can use additional global state information to guide
training in a centralized way and make their own decisions only based on
decentralized local policies. Despite the encouraging results achieved, CTDE
makes an independence assumption on agent policies, which prevents agents from
adopting global cooperative information from each other during centralized
training. Therefore, we argue that existing CTDE methods cannot fully utilize
global information for training, leading to an inefficient joint-policy
exploration and even suboptimal results. In this paper, we introduce a novel
Centralized Advising and Decentralized Pruning (CADP) framework for multi-agent
reinforcement learning, which not only enables effective message exchange among
agents during training but also guarantees independent policies for execution.
First, CADP endows agents with an explicit communication channel through which
they seek and take advice from other agents for more centralized training. To
further ensure decentralized execution, we propose a smooth model pruning
mechanism that progressively constrains agent communication to a closed,
self-contained form without degrading cooperation capability. Empirical evaluations on
StarCraft II micromanagement and Google Research Football benchmarks
demonstrate that the proposed framework achieves superior performance compared
with the state-of-the-art counterparts. Our code will be made publicly
available.
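The abstract describes the mechanism (attention-style advising during training, smoothly pruned to self-only communication for execution) but not the architecture. A minimal sketch of that idea, assuming a single-head attention exchange; all function names, shapes, and the interpolation schedule here are illustrative assumptions, not the paper's actual design:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def advised_hidden(h, w_q, w_k, prune):
    """One advising step: each agent attends over all agents' hidden
    states h (shape: n_agents x d).  `prune` in [0, 1] fades out
    cross-agent attention; at prune=1.0 the attention matrix is the
    identity, so each agent uses only its own hidden state and the
    policy is fully decentralized."""
    q, k = h @ w_q, h @ w_k
    attn = softmax(q @ k.T / np.sqrt(h.shape[1]), axis=-1)
    eye = np.eye(h.shape[0])
    # smoothly interpolate toward self-attention as training proceeds
    attn = (1.0 - prune) * attn + prune * eye
    return attn @ h
```

With `prune=0.0` agents freely exchange advice during centralized training; annealing `prune` to 1.0 closes the communication channel, which is one way to reconcile centralized training with independent execution-time policies.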
Related papers
- Tacit Learning with Adaptive Information Selection for Cooperative Multi-Agent Reinforcement Learning [13.918498667158119]
We introduce a novel cooperative MARL framework based on information selection and tacit learning.
We integrate gating and selection mechanisms, allowing agents to adaptively filter information based on environmental changes.
Experiments on popular MARL benchmarks show that our framework can be seamlessly integrated with state-of-the-art algorithms.
arXiv Detail & Related papers (2024-12-20T07:55:59Z)
- An Introduction to Centralized Training for Decentralized Execution in Cooperative Multi-Agent Reinforcement Learning [14.873907857806358]
This text is an introduction to CTDE in cooperative MARL.
It is meant to explain the setting, basic concepts, and common methods.
arXiv Detail & Related papers (2024-09-04T19:54:40Z)
- A First Introduction to Cooperative Multi-Agent Reinforcement Learning [14.873907857806358]
Multi-agent reinforcement learning (MARL) has exploded in popularity in recent years.
MARL approaches can be broadly categorized into three main types: centralized training and execution (CTE), centralized training for decentralized execution (CTDE), and decentralized training and execution (DTE).
This text is an introduction to cooperative MARL -- MARL in which all agents share a single, joint reward.
arXiv Detail & Related papers (2024-05-10T00:50:08Z)
- Fully Decentralized Cooperative Multi-Agent Reinforcement Learning: A Survey [48.77342627610471]
Cooperative multi-agent reinforcement learning is a powerful tool to solve many real-world cooperative tasks.
It is challenging to derive algorithms that can converge to the optimal joint policy in a fully decentralized setting.
arXiv Detail & Related papers (2024-01-10T05:07:42Z)
- More Centralized Training, Still Decentralized Execution: Multi-Agent Conditional Policy Factorization [21.10461189367695]
In cooperative multi-agent reinforcement learning (MARL), combining value decomposition with actor-critic enables agents to learn policies.
Agents are commonly assumed to be independent of each other, even in centralized training.
We propose multi-agent conditional policy factorization (MACPF) which takes more centralized training but still enables decentralized execution.
arXiv Detail & Related papers (2022-09-26T13:29:22Z)
- Scalable Multi-Agent Model-Based Reinforcement Learning [1.95804735329484]
We propose a new method called MAMBA which utilizes Model-Based Reinforcement Learning (MBRL) to further leverage centralized training in cooperative environments.
We argue that communication between agents is enough to sustain a world model for each agent during the execution phase, while imaginary rollouts can be used for training, removing the necessity to interact with the environment.
arXiv Detail & Related papers (2022-05-25T08:35:00Z)
- Cooperative Multi-Agent Actor-Critic for Privacy-Preserving Load Scheduling in a Residential Microgrid [71.17179010567123]
We propose a privacy-preserving multi-agent actor-critic framework where the decentralized actors are trained with distributed critics.
The proposed framework can preserve the privacy of the households while implicitly learning the multi-agent credit assignment mechanism.
arXiv Detail & Related papers (2021-10-06T14:05:26Z)
- Is Independent Learning All You Need in the StarCraft Multi-Agent Challenge? [100.48692829396778]
Independent PPO (IPPO) is a form of independent learning in which each agent simply estimates its local value function.
IPPO's strong performance may be due to its robustness to some forms of environment non-stationarity.
arXiv Detail & Related papers (2020-11-18T20:29:59Z)
- F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning [110.35516334788687]
Decentralized multi-agent reinforcement learning algorithms are sometimes impractical in complicated applications.
We propose a flexible, fully decentralized actor-critic MARL framework that can handle large-scale general cooperative multi-agent settings.
Our framework can achieve scalability and stability in large-scale environments and reduce information transmission.
arXiv Detail & Related papers (2020-04-17T14:56:29Z)
- Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning [55.20040781688844]
QMIX is a novel value-based method that can train decentralised policies in a centralised end-to-end fashion.
We propose the StarCraft Multi-Agent Challenge (SMAC) as a new benchmark for deep multi-agent reinforcement learning.
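QMIX's core idea is a state-conditioned mixing network whose non-negative weights keep the joint value monotone in each agent's value, so per-agent greedy action selection stays consistent with the centralized optimum. A toy sketch of that constraint; the hypernetwork shapes and layer sizes below are illustrative assumptions, not the paper's exact architecture:

```python
import numpy as np

def elu(x):
    # monotone activation, as used in the QMIX mixing network
    return np.where(x > 0, x, np.exp(x) - 1.0)

def qmix_total(agent_qs, state, hyper_w1, hyper_b1, hyper_w2, n_agents, embed):
    """Mix per-agent Q-values into Q_tot.  Hypernetworks map the
    global state to mixing weights; taking |.| of those weights
    enforces dQ_tot/dQ_i >= 0 (monotonicity), which is what lets the
    decentralised per-agent argmax recover the centralised argmax."""
    w1 = np.abs(state @ hyper_w1).reshape(n_agents, embed)  # non-negative
    b1 = state @ hyper_b1                                   # (embed,)
    hidden = elu(agent_qs @ w1 + b1)
    w2 = np.abs(state @ hyper_w2).reshape(embed, 1)         # non-negative
    return float(hidden @ w2)
```

Because every mixing weight is non-negative and the activation is monotone, raising any single agent's Q can never lower Q_tot.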
arXiv Detail & Related papers (2020-03-19T16:51:51Z)
- Counterfactual Multi-Agent Policy Gradients [47.45255170608965]
We propose a new multi-agent actor-critic method called counterfactual multi-agent (COMA) policy gradients.
COMA uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agents' policies.
We evaluate COMA in the testbed of StarCraft unit micromanagement, using a decentralised variant with significant partial observability.
arXiv Detail & Related papers (2017-05-24T18:52:17Z)
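COMA's counterfactual baseline marginalizes out a single agent's action under its own policy, holding the other agents' actions fixed, so the resulting advantage isolates that agent's contribution to the team reward. A minimal numeric sketch of that baseline (the function name and the flattened one-agent view of the critic's Q-values are illustrative assumptions):

```python
import numpy as np

def coma_advantage(q_values, pi, chosen):
    """Counterfactual advantage for one agent.

    q_values[a]: centralised critic's Q for the joint action in which
                 this agent plays a, other agents' actions held fixed.
    pi:          this agent's policy distribution over its actions.
    chosen:      index of the action the agent actually took.

    The baseline sum_a pi[a] * q_values[a] marginalises the agent's
    own action out, giving a per-agent credit-assignment signal."""
    baseline = float(pi @ q_values)   # counterfactual baseline
    return float(q_values[chosen] - baseline)
```

A useful property: the policy-weighted average of this advantage over the agent's actions is zero, so the baseline reduces variance without biasing the policy gradient.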
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.