AgentMixer: Multi-Agent Correlated Policy Factorization
- URL: http://arxiv.org/abs/2401.08728v1
- Date: Tue, 16 Jan 2024 15:32:41 GMT
- Title: AgentMixer: Multi-Agent Correlated Policy Factorization
- Authors: Zhiyuan Li, Wenshuai Zhao, Lijun Wu, Joni Pajarinen
- Abstract summary: We introduce \textit{strategy modification} to provide a mechanism for agents to correlate their policies.
We present a novel framework, AgentMixer, which constructs the joint fully observable policy as a non-linear combination of individual partially observable policies.
We show that AgentMixer converges to an $\epsilon$-approximate Correlated Equilibrium.
- Score: 39.041191852287525
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Centralized training with decentralized execution (CTDE) is widely employed
to stabilize partially observable multi-agent reinforcement learning (MARL) by
utilizing a centralized value function during training. However, existing
methods typically assume that agents make decisions based on their local
observations independently, which may not lead to a correlated joint policy
with sufficient coordination. Inspired by the concept of correlated
equilibrium, we propose to introduce a \textit{strategy modification} to
provide a mechanism for agents to correlate their policies. Specifically, we
present a novel framework, AgentMixer, which constructs the joint fully
observable policy as a non-linear combination of individual partially
observable policies. To enable decentralized execution, one can derive
individual policies by imitating the joint policy. Unfortunately, such
imitation learning can lead to \textit{asymmetric learning failure} caused by
the mismatch between joint policy and individual policy information. To
mitigate this issue, we jointly train the joint policy and individual policies
and introduce \textit{Individual-Global-Consistency} to guarantee mode
consistency between the centralized and decentralized policies. We then
theoretically prove that AgentMixer converges to an $\epsilon$-approximate
Correlated Equilibrium. The strong experimental performance on three MARL
benchmarks demonstrates the effectiveness of our method.
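To make the factorization concrete, the following is a minimal, hypothetical PyTorch sketch of the general idea: each agent's partially observable policy maps its local observation to action logits, and a mixing network, conditioned on the full state, combines those logits non-linearly into a correlated joint policy. The class names, layer sizes, and the choice to mix directly over the joint action space are illustrative assumptions, not the paper's actual architecture, and the Individual-Global-Consistency objective is omitted.

```python
import torch
import torch.nn as nn

class IndividualPolicy(nn.Module):
    """Partially observable policy: local observation -> per-agent action logits."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_actions)
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

class PolicyMixer(nn.Module):
    """Non-linearly combines the agents' logits, conditioned on the full state,
    into a single correlated policy over joint actions (illustrative sketch)."""
    def __init__(self, n_agents: int, n_actions: int, state_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_agents * n_actions + state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions ** n_agents),
        )

    def forward(self, individual_logits: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # individual_logits: (batch, n_agents, n_actions); state: (batch, state_dim)
        x = torch.cat([individual_logits.flatten(1), state], dim=-1)
        return torch.log_softmax(self.net(x), dim=-1)  # joint-policy log-probabilities
```

In such a setup, the mixer's joint policy would be trained with full-state information, while each agent keeps only its IndividualPolicy for decentralized execution; the paper's consistency objective is what keeps the two aligned.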
Related papers
- Context-Aware Bayesian Network Actor-Critic Methods for Cooperative Multi-Agent Reinforcement Learning [7.784991832712813]
We introduce a Bayesian network to induce correlations between agents' action selections in their joint policy.
We develop practical algorithms to learn the context-aware Bayesian network policies.
Empirical results on a range of MARL benchmarks show the benefits of our approach.
arXiv Detail & Related papers (2023-06-02T21:22:27Z)
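Read as a generative process, the Bayesian-network joint policy described above can be sampled by visiting agents in a topological order, each agent conditioning on its observation and on the actions already chosen by its parents. The PyTorch sketch below illustrates only this general idea; the class and function names, the fixed parent structure, and the absence of the paper's context-awareness are assumptions of the sketch, not the paper's method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParentConditionedPolicy(nn.Module):
    """Hypothetical agent policy conditioning on the local observation plus the
    one-hot actions already chosen by the agent's parents in the DAG."""
    def __init__(self, obs_dim: int, n_parents: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + n_parents * n_actions, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs: torch.Tensor, parent_actions: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, parent_actions], dim=-1))  # action logits

def sample_joint_action(policies, observations, parents, n_actions):
    """Sample a correlated joint action agent-by-agent in topological order;
    `parents[i]` lists the indices of agent i's parents in the Bayesian network."""
    actions = {}
    for i, policy in enumerate(policies):  # assumed topologically ordered
        batch = observations[i].shape[0]
        pa = [F.one_hot(actions[j], n_actions).float() for j in parents[i]]
        pa = torch.cat(pa, dim=-1) if pa else observations[i].new_zeros((batch, 0))
        logits = policy(observations[i], pa)
        actions[i] = torch.distributions.Categorical(logits=logits).sample()
    return actions
```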
- Decentralized Policy Optimization [21.59254848913971]
We propose \textit{decentralized policy optimization} (DPO), a decentralized actor-critic algorithm with monotonic improvement and convergence guarantees.
Empirically, we compare DPO with IPPO in a variety of cooperative multi-agent tasks, covering discrete and continuous action spaces, and fully and partially observable environments.
arXiv Detail & Related papers (2022-11-06T05:38:23Z)
- More Centralized Training, Still Decentralized Execution: Multi-Agent Conditional Policy Factorization [21.10461189367695]
In cooperative multi-agent reinforcement learning (MARL), combining value decomposition with actor-critic enables agents to learn policies.
Agents are commonly assumed to be independent of each other, even in centralized training.
We propose multi-agent conditional policy factorization (MACPF) which takes more centralized training but still enables decentralized execution.
arXiv Detail & Related papers (2022-09-26T13:29:22Z)
- Monotonic Improvement Guarantees under Non-stationarity for Decentralized PPO [66.5384483339413]
We present a new monotonic improvement guarantee for optimizing decentralized policies in cooperative Multi-Agent Reinforcement Learning (MARL).
We show that a trust region constraint can be effectively enforced in a principled way by bounding independent ratios based on the number of agents in training.
arXiv Detail & Related papers (2022-01-31T20:39:48Z)
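One simple way to bound independent ratios as a function of the number of agents, in the spirit of the guarantee above, is to shrink each agent's PPO clip range so that the product of the per-agent ratios stays inside the joint trust region. The sketch below is an illustrative assumption, not the bound derived in the paper.

```python
import torch

def per_agent_clip_range(joint_eps: float, n_agents: int) -> float:
    """Hypothetical per-agent clip range chosen so that
    (1 + eps_i) ** n_agents == 1 + joint_eps, i.e. the product of n
    independently clipped ratios cannot overshoot the joint trust region."""
    return (1.0 + joint_eps) ** (1.0 / n_agents) - 1.0

def clipped_surrogate(ratio: torch.Tensor, advantage: torch.Tensor, eps_i: float) -> torch.Tensor:
    """Standard PPO clipped surrogate (to be maximized), applied to one agent's
    independent importance ratio with the agent-count-dependent clip range."""
    clipped = torch.clamp(ratio, 1.0 - eps_i, 1.0 + eps_i)
    return torch.min(ratio * advantage, clipped * advantage).mean()
```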
- Dealing with Non-Stationarity in Multi-Agent Reinforcement Learning via Trust Region Decomposition [52.06086375833474]
Non-stationarity is a thorny issue in multi-agent reinforcement learning.
We introduce a $\delta$-stationarity measurement to explicitly model the stationarity of a policy sequence.
We propose a trust region decomposition network based on message passing to estimate the joint policy divergence.
arXiv Detail & Related papers (2021-02-21T14:46:50Z)
- Implicit Distributional Reinforcement Learning [61.166030238490634]
We propose an implicit distributional actor-critic (IDAC) built on two deep generator networks (DGNs) and a semi-implicit actor (SIA) powered by a flexible policy distribution.
We observe IDAC outperforms state-of-the-art algorithms on representative OpenAI Gym environments.
arXiv Detail & Related papers (2020-07-13T02:52:18Z)
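The semi-implicit actor above gains flexibility by injecting auxiliary noise into the policy network, so that the marginal action distribution (noise integrated out) becomes a mixture rather than a single Gaussian. The sketch below illustrates only this general construction; the names, dimensions, and the exact SIA/DGN architecture are assumptions.

```python
import torch
import torch.nn as nn

class SemiImplicitActor(nn.Module):
    """Sketch of a semi-implicit actor: random noise concatenated with the state
    makes the per-sample Gaussian parameters themselves random, yielding a
    flexible (mixture-like) marginal policy."""
    def __init__(self, state_dim: int, action_dim: int, noise_dim: int = 8, hidden: int = 128):
        super().__init__()
        self.noise_dim = noise_dim
        self.net = nn.Sequential(
            nn.Linear(state_dim + noise_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * action_dim),  # per-sample Gaussian mean and log-std
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        noise = torch.randn(state.shape[0], self.noise_dim, device=state.device)
        mean, log_std = self.net(torch.cat([state, noise], dim=-1)).chunk(2, dim=-1)
        return mean + log_std.exp() * torch.randn_like(mean)  # reparameterized action
```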
- Calibration of Shared Equilibria in General Sum Partially Observable Markov Games [15.572157454411533]
We consider a general sum partially observable Markov game where agents of different types share a single policy network.
This paper aims at i) formally understanding equilibria reached by such agents, and ii) matching emergent phenomena of such equilibria to real-world targets.
arXiv Detail & Related papers (2020-06-23T15:14:20Z)
- F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning [110.35516334788687]
Decentralized multi-agent reinforcement learning algorithms are sometimes impractical in complicated applications.
We propose a flexible fully decentralized actor-critic MARL framework, which can handle large-scale general cooperative multi-agent setting.
Our framework can achieve scalability and stability for large-scale environment and reduce information transmission.
arXiv Detail & Related papers (2020-04-17T14:56:29Z)
- Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning [55.20040781688844]
QMIX is a novel value-based method that can train decentralised policies in a centralised end-to-end fashion.
We propose the StarCraft Multi-Agent Challenge (SMAC) as a new benchmark for deep multi-agent reinforcement learning.
arXiv Detail & Related papers (2020-03-19T16:51:51Z)
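For reference, the monotonic factorisation behind QMIX constrains the mixing network's weights to be non-negative (they are produced by state-conditioned hypernetworks), so the joint value is monotone in every agent's utility and the greedy joint action decomposes into per-agent greedy actions. Below is a simplified sketch of that constraint; the full QMIX mixer uses deeper hypernetworks for the weights and final bias.

```python
import torch
import torch.nn as nn

class MonotonicMixer(nn.Module):
    """Simplified QMIX-style mixer: per-agent utilities are combined with
    non-negative, state-conditioned weights, so Q_tot is monotonically
    non-decreasing in each agent's utility."""
    def __init__(self, n_agents: int, state_dim: int, embed: int = 32):
        super().__init__()
        self.n_agents, self.embed = n_agents, embed
        self.w1 = nn.Linear(state_dim, n_agents * embed)  # hypernetwork: layer-1 weights
        self.b1 = nn.Linear(state_dim, embed)
        self.w2 = nn.Linear(state_dim, embed)              # hypernetwork: layer-2 weights
        self.b2 = nn.Linear(state_dim, 1)

    def forward(self, agent_qs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # agent_qs: (batch, n_agents); state: (batch, state_dim)
        w1 = self.w1(state).abs().view(-1, self.n_agents, self.embed)  # abs() => monotonicity
        hidden = torch.relu(torch.bmm(agent_qs.unsqueeze(1), w1) + self.b1(state).unsqueeze(1))
        w2 = self.w2(state).abs().unsqueeze(-1)
        return torch.bmm(hidden, w2).squeeze(-1) + self.b2(state)  # (batch, 1) Q_tot
```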
This list is automatically generated from the titles and abstracts of the papers in this site.