More Centralized Training, Still Decentralized Execution: Multi-Agent
Conditional Policy Factorization
- URL: http://arxiv.org/abs/2209.12681v1
- Date: Mon, 26 Sep 2022 13:29:22 GMT
- Title: More Centralized Training, Still Decentralized Execution: Multi-Agent
Conditional Policy Factorization
- Authors: Jiangxing Wang, Deheng Ye, and Zongqing Lu
- Abstract summary: In cooperative multi-agent reinforcement learning (MARL), combining value decomposition with actor-critic enables agents to learn stochastic policies.
Agents are commonly assumed to be independent of each other, even in centralized training.
We propose multi-agent conditional policy factorization (MACPF) which takes more centralized training but still enables decentralized execution.
- Score: 21.10461189367695
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In cooperative multi-agent reinforcement learning (MARL), combining value
decomposition with actor-critic enables agents to learn stochastic policies,
which are more suitable for the partially observable environment. Given the
goal of learning local policies that enable decentralized execution, agents are
commonly assumed to be independent of each other, even in centralized training.
However, such an assumption may prohibit agents from learning the optimal joint
policy. To address this problem, we explicitly take the dependency among agents
into centralized training. Although this leads to the optimal joint policy, it
may not be factorized for decentralized execution. Nevertheless, we
theoretically show that from such a joint policy, we can always derive another
joint policy that achieves the same optimality but can be factorized for
decentralized execution. To this end, we propose multi-agent conditional policy
factorization (MACPF), which takes more centralized training but still enables
decentralized execution. We empirically verify MACPF in various cooperative
MARL tasks and demonstrate that MACPF achieves better performance or faster
convergence than baselines.
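To make the idea concrete, below is a minimal PyTorch sketch of conditional policy factorization, based only on the abstract above and not on the authors' implementation: during centralized training, each agent's policy may additionally condition on the actions of agents earlier in a fixed ordering, while an independent head, used alone at execution, is distilled to match the dependent policy. All module names and the distillation loss are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ConditionalAgentPolicy(nn.Module):
    """One agent's policy pair: an independent head used alone for
    decentralized execution, and a dependent correction that, during
    centralized training, also conditions on the (one-hot) actions of
    agents earlier in a fixed ordering. A sketch, not the paper's code."""

    def __init__(self, obs_dim, n_actions, prev_act_dim, hidden=64):
        super().__init__()
        self.independent = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_actions))
        self.dependent = nn.Sequential(
            nn.Linear(obs_dim + prev_act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions))

    def execution_dist(self, obs):
        # Decentralized execution: the local observation is all we get.
        return torch.distributions.Categorical(logits=self.independent(obs))

    def training_dist(self, obs, prev_actions):
        # Centralized training: dependent logits = independent logits
        # plus a correction that sees preceding agents' actions.
        delta = self.dependent(torch.cat([obs, prev_actions], dim=-1))
        return torch.distributions.Categorical(
            logits=self.independent(obs) + delta)

def factorization_loss(agent, obs, prev_actions):
    """Assumed auxiliary loss: distill the dependent policy into the
    independent head so the factorized joint policy reproduces it."""
    with torch.no_grad():
        target = agent.training_dist(obs, prev_actions)
    return torch.distributions.kl.kl_divergence(
        target, agent.execution_dist(obs)).mean()
```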
Related papers
- AgentMixer: Multi-Agent Correlated Policy Factorization [39.041191852287525]
We introduce strategy modification to provide a mechanism for agents to correlate their policies.
We present a novel framework, AgentMixer, which constructs the joint fully observable policy as a non-linear combination of individual partially observable policies.
We show that AgentMixer converges to an $\epsilon$-approximate Correlated Equilibrium.
arXiv Detail & Related papers (2024-01-16T15:32:41Z)
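As one illustration of "the joint fully observable policy as a non-linear combination of individual partially observable policies", the sketch below passes per-agent action logits through a small network conditioned on the global state. This is a generic mixer assumed for illustration; AgentMixer's actual architecture may differ.

```python
import torch
import torch.nn as nn

class PolicyMixer(nn.Module):
    """Illustrative only: combine per-agent logits (computed from local
    observations) into state-conditioned logits via a small MLP. Names
    and sizes are assumptions, not AgentMixer's actual design."""

    def __init__(self, n_agents, n_actions, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_agents * n_actions + state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_agents * n_actions))
        self.n_agents, self.n_actions = n_agents, n_actions

    def forward(self, per_agent_logits, state):
        # per_agent_logits: (batch, n_agents, n_actions); state: (batch, state_dim)
        x = torch.cat([per_agent_logits.flatten(1), state], dim=-1)
        # Each agent's logits become a non-linear, state-aware function
        # of every agent's individual policy.
        return self.net(x).view(-1, self.n_agents, self.n_actions)
```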
- Is Centralized Training with Decentralized Execution Framework Centralized Enough for MARL? [27.037348104661497]
Centralized Training with Decentralized Execution is a popular framework for cooperative Multi-Agent Reinforcement Learning.
We introduce a novel Centralized Advising and Decentralized Pruning (CADP) framework for multi-agent reinforcement learning.
arXiv Detail & Related papers (2023-05-27T03:15:24Z)
- Decentralized Policy Optimization [21.59254848913971]
We propose decentralized policy optimization (DPO), a decentralized actor-critic algorithm with monotonic improvement and convergence guarantees.
Empirically, we compare DPO with IPPO in a variety of cooperative multi-agent tasks, covering discrete and continuous action spaces, and fully and partially observable environments.
arXiv Detail & Related papers (2022-11-06T05:38:23Z)
- Communication-Efficient Actor-Critic Methods for Homogeneous Markov Games [6.589813623221242]
Policy sharing is crucial to efficient learning in certain tasks yet lacks theoretical justification.
We develop the first consensus-based decentralized actor-critic method.
We also develop practical algorithms based on our decentralized actor-critic method to reduce the communication cost during training.
arXiv Detail & Related papers (2022-02-18T20:35:00Z)
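Consensus-based methods like the one above typically build on a gossip step in which each agent averages its critic parameters with those of its communication neighbors. The NumPy sketch below shows that generic step, not the paper's exact algorithm; the ring topology and mixing weights are assumptions for illustration.

```python
import numpy as np

def consensus_update(params, weights):
    """One consensus (gossip) round: each agent replaces its critic
    parameter vector with a weighted average over its neighbors.

    params:  (n_agents, dim) array, one critic parameter vector per agent
    weights: (n_agents, n_agents) doubly stochastic mixing matrix
    """
    return weights @ params

# Example: 4 agents on a ring, each mixing with itself and two neighbors.
W = np.array([[0.5 , 0.25, 0.  , 0.25],
              [0.25, 0.5 , 0.25, 0.  ],
              [0.  , 0.25, 0.5 , 0.25],
              [0.25, 0.  , 0.25, 0.5 ]])
theta = np.random.randn(4, 8)            # 4 agents, 8 critic params each
for _ in range(20):                      # repeated rounds drive agreement
    theta = consensus_update(theta, W)
print(np.allclose(theta, theta.mean(axis=0), atol=1e-3))  # near consensus
```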
- Monotonic Improvement Guarantees under Non-stationarity for Decentralized PPO [66.5384483339413]
We present a new monotonic improvement guarantee for optimizing decentralized policies in cooperative Multi-Agent Reinforcement Learning (MARL).
We show that a trust region constraint can be effectively enforced in a principled way by bounding independent ratios based on the number of agents in training.
arXiv Detail & Related papers (2022-01-31T20:39:48Z)
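One natural reading of "bounding independent ratios based on the number of agents": since the joint policy ratio factorizes into a product of per-agent ratios, clipping each agent's ratio to an n-th-root range keeps the joint ratio inside the usual trust region. The sketch below illustrates that reading; the paper's exact bound may differ.

```python
import numpy as np

def per_agent_clip_range(epsilon, n_agents):
    """If every agent keeps its independent ratio
    r_i = pi_new(a_i|o_i) / pi_old(a_i|o_i) inside this interval, the
    joint ratio prod_i r_i stays inside [1 - eps, 1 + eps]. An assumed
    instantiation, for illustration only."""
    low = (1.0 - epsilon) ** (1.0 / n_agents)
    high = (1.0 + epsilon) ** (1.0 / n_agents)
    return low, high

def clipped_objective(ratio, advantage, low, high):
    # Standard PPO-style clipped surrogate, using the per-agent range.
    return np.minimum(ratio * advantage,
                      np.clip(ratio, low, high) * advantage)

low, high = per_agent_clip_range(0.2, n_agents=5)
print(low, high)   # each agent's tighter range, roughly (0.956, 1.037)
```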
- Dealing with Non-Stationarity in Multi-Agent Reinforcement Learning via Trust Region Decomposition [52.06086375833474]
Non-stationarity is a thorny issue in multi-agent reinforcement learning.
We introduce a $\delta$-stationarity measurement to explicitly model the stationarity of a policy sequence.
We propose a trust region decomposition network based on message passing to estimate the joint policy divergence.
arXiv Detail & Related papers (2021-02-21T14:46:50Z)
- Is Independent Learning All You Need in the StarCraft Multi-Agent Challenge? [100.48692829396778]
Independent PPO (IPPO) is a form of independent learning in which each agent simply estimates its local value function.
IPPO's strong performance may be due to its robustness to some forms of environment non-stationarity.
arXiv Detail & Related papers (2020-11-18T20:29:59Z)
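The structure of an IPPO-style independent learner is simple enough to sketch: each agent has its own actor and its own local value function, trained only from its own observations, with nothing centralized or shared. A schematic PyTorch module, not the paper's code:

```python
import torch
import torch.nn as nn

class IndependentAgent(nn.Module):
    """IPPO-style independent learner: every agent trains its own actor
    and its own *local* value function from local observations only."""

    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.actor = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(), nn.Linear(hidden, n_actions))
        self.critic = nn.Sequential(   # local value V(o_i), not V(s)
            nn.Linear(obs_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def act(self, obs):
        dist = torch.distributions.Categorical(logits=self.actor(obs))
        action = dist.sample()
        return action, dist.log_prob(action), self.critic(obs).squeeze(-1)

# One learner per agent; nothing is shared or centralized.
agents = [IndependentAgent(obs_dim=16, n_actions=5) for _ in range(3)]
```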
- F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning [110.35516334788687]
Decentralized multi-agent reinforcement learning algorithms are sometimes impractical in complicated applications.
We propose a flexible fully decentralized actor-critic MARL framework that can handle large-scale general cooperative multi-agent settings.
Our framework achieves scalability and stability in large-scale environments and reduces information transmission.
arXiv Detail & Related papers (2020-04-17T14:56:29Z)
- Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning [55.20040781688844]
QMIX is a novel value-based method that can train decentralised policies in a centralised end-to-end fashion.
We propose the StarCraft Multi-Agent Challenge (SMAC) as a new benchmark for deep multi-agent reinforcement learning.
arXiv Detail & Related papers (2020-03-19T16:51:51Z)
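QMIX's core component is a mixing network that combines per-agent utilities into a joint value monotonically. The sketch below follows the well-known design (hypernetwork weights from the global state, forced non-negative so dQ_tot/dQ_i >= 0), with layer sizes simplified; it is a sketch, not the reference implementation.

```python
import torch
import torch.nn as nn

class MonotonicMixer(nn.Module):
    """Sketch of a QMIX-style mixing network: per-agent Q-values are
    combined into Q_tot with state-conditioned weights made non-negative
    via abs(), which enforces monotonicity of Q_tot in each agent's Q."""

    def __init__(self, n_agents, state_dim, embed=32):
        super().__init__()
        self.w1 = nn.Linear(state_dim, n_agents * embed)  # hypernetworks
        self.b1 = nn.Linear(state_dim, embed)
        self.w2 = nn.Linear(state_dim, embed)
        self.b2 = nn.Linear(state_dim, 1)
        self.n_agents, self.embed = n_agents, embed

    def forward(self, agent_qs, state):
        # agent_qs: (batch, n_agents); state: (batch, state_dim)
        w1 = torch.abs(self.w1(state)).view(-1, self.n_agents, self.embed)
        h = torch.relu(agent_qs.unsqueeze(1) @ w1 + self.b1(state).unsqueeze(1))
        w2 = torch.abs(self.w2(state)).unsqueeze(-1)      # (batch, embed, 1)
        q_tot = h @ w2 + self.b2(state).unsqueeze(1)
        return q_tot.view(-1)                             # (batch,)
```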
- FACMAC: Factored Multi-Agent Centralised Policy Gradients [103.30380537282517]
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC).
It is a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.
We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2020-03-14T21:29:09Z)
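To contrast with per-agent gradients in fully independent methods, here is a schematic of a FACMAC-style update, assumed for illustration (the paper's mixer and exact update differ in detail): a centralised but factored critic scores the joint action, and all actors are updated through that single critic output.

```python
import torch
import torch.nn as nn

n_agents, obs_dim, act_dim, state_dim = 3, 8, 2, 16

actors = nn.ModuleList(                     # one continuous-action actor per agent
    nn.Sequential(nn.Linear(obs_dim, 32), nn.ReLU(),
                  nn.Linear(32, act_dim), nn.Tanh())
    for _ in range(n_agents))

per_agent_q = nn.ModuleList(                # per-agent utility q_i(o_i, a_i)
    nn.Sequential(nn.Linear(obs_dim + act_dim, 32), nn.ReLU(), nn.Linear(32, 1))
    for _ in range(n_agents))
mixer = nn.Sequential(                      # centralised, state-aware mixing
    nn.Linear(n_agents + state_dim, 32), nn.ReLU(), nn.Linear(32, 1))

def q_tot(obs, actions, state):
    qs = torch.cat([per_agent_q[i](torch.cat([obs[:, i], actions[:, i]], -1))
                    for i in range(n_agents)], dim=-1)   # (batch, n_agents)
    return mixer(torch.cat([qs, state], dim=-1))

# Centralised policy gradient: actions from *all* current actors enter the
# factored critic, and one backward pass updates every actor jointly.
obs = torch.randn(4, n_agents, obs_dim)
state = torch.randn(4, state_dim)
actions = torch.stack([actors[i](obs[:, i]) for i in range(n_agents)], dim=1)
actor_loss = -q_tot(obs, actions, state).mean()
actor_loss.backward()
```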
This list is automatically generated from the titles and abstracts of the papers in this site.