Value Functions Factorization with Latent State Information Sharing in
Decentralized Multi-Agent Policy Gradients
- URL: http://arxiv.org/abs/2201.01247v3
- Date: Thu, 8 Jun 2023 05:44:57 GMT
- Title: Value Functions Factorization with Latent State Information Sharing in
Decentralized Multi-Agent Policy Gradients
- Authors: Hanhan Zhou, Tian Lan, Vaneet Aggarwal
- Abstract summary: LSF-SAC is a novel framework that features a variational inference-based information-sharing mechanism as extra state information.
We evaluate LSF-SAC on the StarCraft II micromanagement challenge and demonstrate that it outperforms several state-of-the-art methods in challenging collaborative tasks.
- Score: 43.862956745961654
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Value function factorization via centralized training and decentralized
execution is promising for solving cooperative multi-agent reinforcement learning tasks.
One of the approaches in this area, QMIX, has become state-of-the-art and
achieved the best performance on the StarCraft II micromanagement benchmark.
However, the monotonic mixing of per-agent estimates in QMIX is known to
restrict the joint action Q-values it can represent, and individual agents
often lack sufficient global state information for local value function
estimation, both of which can result in suboptimality. To this end, we present LSF-SAC,
a novel framework that features a variational inference-based
information-sharing mechanism as extra state information to assist individual
agents in the value function factorization. We demonstrate that such latent
individual state information sharing can significantly expand the power of
value function factorization, while fully decentralized execution can still be
maintained in LSF-SAC through a soft-actor-critic design. We evaluate LSF-SAC
on the StarCraft II micromanagement challenge and demonstrate that it
outperforms several state-of-the-art methods in challenging collaborative
tasks. We further conduct extensive ablation studies to locate the key factors
accounting for its performance improvements. We believe that this new insight
can lead to new local value estimation methods and variational deep learning
algorithms. A demo video and the implementation code can be found at
https://sites.google.com/view/sacmm.
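To make the two mechanisms above concrete, here is a minimal sketch (not the authors' implementation; class names, layer sizes, and the KL handling are illustrative assumptions): a QMIX-style monotonic mixer whose state-conditioned, non-negative weights keep the joint Q-value monotonic in every per-agent estimate, and a small variational encoder that maps each agent's local observation to a latent message via the reparameterization trick, the kind of extra state information LSF-SAC shares during factorization.

```python
# Minimal PyTorch sketch of (1) monotonic value mixing and (2) a variational
# latent-message encoder. Names and hyperparameters are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MonotonicMixer(nn.Module):
    """QMIX-style mixer: hypernetworks map the global state to non-negative
    mixing weights, so Q_tot is monotonic in every per-agent utility."""
    def __init__(self, n_agents, state_dim, embed_dim=32):
        super().__init__()
        self.n_agents, self.embed_dim = n_agents, embed_dim
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Sequential(nn.Linear(state_dim, embed_dim),
                                      nn.ReLU(), nn.Linear(embed_dim, 1))

    def forward(self, agent_qs, state):
        # agent_qs: (batch, n_agents), state: (batch, state_dim)
        bs = agent_qs.size(0)
        w1 = torch.abs(self.hyper_w1(state)).view(bs, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(bs, 1, self.embed_dim)
        hidden = F.elu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)
        w2 = torch.abs(self.hyper_w2(state)).view(bs, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(bs, 1, 1)
        return (torch.bmm(hidden, w2) + b2).view(bs, 1)  # Q_tot

class LatentMessageEncoder(nn.Module):
    """Maps a local observation to a stochastic latent message z ~ N(mu, sigma)
    via the reparameterization trick; the returned KL term (to a standard
    normal prior) would be added to the training loss."""
    def __init__(self, obs_dim, latent_dim=16):
        super().__init__()
        self.mu = nn.Linear(obs_dim, latent_dim)
        self.log_std = nn.Linear(obs_dim, latent_dim)

    def forward(self, obs):
        mu, log_std = self.mu(obs), self.log_std(obs).clamp(-5.0, 2.0)
        z = mu + torch.randn_like(mu) * log_std.exp()
        kl = 0.5 * (mu.pow(2) + (2 * log_std).exp() - 1.0 - 2.0 * log_std).sum(-1)
        return z, kl

# Tiny usage example with random tensors.
mixer = MonotonicMixer(n_agents=3, state_dim=48)
q_tot = mixer(torch.randn(8, 3), torch.randn(8, 48))                # (8, 1)
z, kl = LatentMessageEncoder(obs_dim=30)(torch.randn(8, 3, 30))     # per-agent messages
```

The monotonicity constraint (the torch.abs on the hypernetwork outputs) is precisely the representational restriction that the abstract identifies as one source of suboptimality.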
Related papers
- QFree: A Universal Value Function Factorization for Multi-Agent
Reinforcement Learning [2.287186762346021]
We propose QFree, a universal value function factorization method for multi-agent reinforcement learning.
We show that QFree achieves state-of-the-art performance in a general-purpose complex MARL benchmark environment.
arXiv Detail & Related papers (2023-11-01T08:07:16Z)
- Boosting Value Decomposition via Unit-Wise Attentive State Representation for Cooperative Multi-Agent Reinforcement Learning [11.843811402154408]
We propose a simple yet powerful method that alleviates partial observability and efficiently promotes coordination via the UNit-wise attentive State Representation (UNSR).
In UNSR, each agent learns a compact and disentangled unit-wise state representation outputted from transformer blocks, and produces its local action-value function.
Experimental results demonstrate that our method achieves superior performance and data efficiency compared to solid baselines on the StarCraft II micromanagement challenge.
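As a rough illustration of the idea (a sketch under assumptions, not the UNSR implementation; all names and dimensions are hypothetical), per-unit feature vectors can be passed through a transformer encoder and the agent's own token used to produce its local action values:

```python
# Hypothetical unit-wise attentive Q-network: per-unit features are encoded by
# a transformer, and the agent's own token drives its local action values.
import torch
import torch.nn as nn

class UnitWiseQNetwork(nn.Module):
    def __init__(self, unit_feat_dim, n_actions, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Linear(unit_feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.q_head = nn.Linear(d_model, n_actions)

    def forward(self, unit_feats):
        # unit_feats: (batch, n_units, unit_feat_dim); token 0 is the agent itself.
        tokens = self.encoder(self.embed(unit_feats))
        return self.q_head(tokens[:, 0])  # (batch, n_actions): local Q-values

q_values = UnitWiseQNetwork(unit_feat_dim=20, n_actions=10)(torch.randn(4, 8, 20))
```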
arXiv Detail & Related papers (2023-05-12T00:33:22Z)
- PAC: Assisted Value Factorisation with Counterfactual Predictions in Multi-Agent Reinforcement Learning [43.862956745961654]
Multi-agent reinforcement learning (MARL) has witnessed significant progress with the development of value function factorization methods.
In this paper, we show that in partially observable MARL problems, an agent's ordering over its own actions could impose concurrent constraints on the representable function class.
We propose PAC, a new framework leveraging information generated from Counterfactual Predictions of optimal joint action selection.
arXiv Detail & Related papers (2022-06-22T23:34:30Z)
- RACA: Relation-Aware Credit Assignment for Ad-Hoc Cooperation in Multi-Agent Deep Reinforcement Learning [55.55009081609396]
We propose a novel method, called Relation-Aware Credit Assignment (RACA), which achieves zero-shot generalization in ad-hoc cooperation scenarios.
RACA takes advantage of a graph-based relation encoder to encode the topological structure between agents.
Our method outperforms baseline methods on the StarCraft II micromanagement benchmark and in ad-hoc cooperation scenarios.
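A loose sketch of such a relation encoder (assumed details, not RACA's actual architecture): multi-head attention over agent embeddings, restricted by an adjacency mask so that each agent only attends to its neighbours.

```python
# Hypothetical graph-based relation encoder: attention over agent nodes,
# masked so that only edges present in the adjacency matrix are attended to.
import torch
import torch.nn as nn

class RelationEncoder(nn.Module):
    def __init__(self, feat_dim, d_model=64, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(feat_dim, d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, agent_feats, adjacency):
        # agent_feats: (batch, n_agents, feat_dim)
        # adjacency: (batch, n_agents, n_agents) boolean, True where an edge exists
        x = self.embed(agent_feats)
        # attn_mask marks disallowed positions; shape (batch * n_heads, L, L)
        mask = ~adjacency.repeat_interleave(self.attn.num_heads, dim=0)
        out, _ = self.attn(x, x, x, attn_mask=mask)
        return out  # relation-aware agent embeddings

emb = RelationEncoder(feat_dim=16)(torch.randn(2, 5, 16),
                                   torch.ones(2, 5, 5, dtype=torch.bool))
```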
arXiv Detail & Related papers (2022-06-02T03:39:27Z)
- Locality Matters: A Scalable Value Decomposition Approach for Cooperative Multi-Agent Reinforcement Learning [52.7873574425376]
Cooperative multi-agent reinforcement learning (MARL) faces significant scalability issues due to state and action spaces that are exponentially large in the number of agents.
We propose a novel, value-based multi-agent algorithm called LOMAQ, which incorporates local rewards in the Centralized Training Decentralized Execution paradigm.
arXiv Detail & Related papers (2021-09-22T10:08:15Z)
- Shapley Counterfactual Credits for Multi-Agent Reinforcement Learning [34.856522993714535]
We propose Shapley Counterfactual Credit Assignment, a novel method for explicit credit assignment which accounts for the coalition of agents.
Our method significantly outperforms existing cooperative MARL algorithms and achieves state-of-the-art results, with especially large margins on the more difficult tasks.
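For intuition, here is a minimal Monte Carlo estimate of per-agent Shapley credits (the general idea only; the paper's counterfactual variant is not reproduced, and coalition_value is a hypothetical callable):

```python
# Shapley value of each agent = expected marginal contribution over random
# orderings. `coalition_value` is a hypothetical callable returning the value
# of a subset of agents acting together.
import random

def shapley_credits(agents, coalition_value, n_samples=1000):
    credits = {a: 0.0 for a in agents}
    for _ in range(n_samples):
        order = random.sample(agents, len(agents))   # random permutation
        coalition = []
        prev_value = coalition_value(coalition)
        for agent in order:
            coalition.append(agent)
            value = coalition_value(coalition)
            credits[agent] += (value - prev_value) / n_samples  # marginal gain
            prev_value = value
    return credits

# Toy example: the value of a coalition is simply its size squared.
print(shapley_credits([0, 1, 2], lambda c: len(c) ** 2))
```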
arXiv Detail & Related papers (2021-06-01T07:38:34Z)
- Dif-MAML: Decentralized Multi-Agent Meta-Learning [54.39661018886268]
We propose a cooperative multi-agent meta-learning algorithm, referred to as Dif-MAML.
We show that the proposed strategy allows a collection of agents to attain agreement at a linear rate and to converge to a stationary point of the aggregate MAML objective.
Simulation results illustrate the theoretical findings and the superior performance relative to the traditional non-cooperative setting.
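A hedged sketch of the diffusion-style combine step such a decentralized scheme might use (the combination weights, topology, and update order below are assumptions for illustration, not taken from the paper):

```python
# Hypothetical diffusion combine step: after a local meta-gradient update,
# each agent averages its parameters with its neighbours' using a
# row-stochastic combination matrix.
import numpy as np

def diffusion_combine(params, weights):
    """params: list of per-agent parameter vectors (same shape).
    weights: (n_agents, n_agents) row-stochastic matrix; weights[k, l] > 0
    only if agent l is a neighbour of agent k."""
    stacked = np.stack(params)                       # (n_agents, dim)
    return [weights[k] @ stacked for k in range(len(params))]

# Toy example: 3 agents, uniform averaging over a fully connected graph.
W = np.array([[1 / 3, 1 / 3, 1 / 3]] * 3)
new_params = diffusion_combine([np.ones(4), 2 * np.ones(4), 3 * np.ones(4)], W)
```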
arXiv Detail & Related papers (2020-10-06T16:51:09Z)
- Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning [55.20040781688844]
QMIX is a novel value-based method that can train decentralised policies in a centralised end-to-end fashion.
We propose the StarCraft Multi-Agent Challenge (SMAC) as a new benchmark for deep multi-agent reinforcement learning.
arXiv Detail & Related papers (2020-03-19T16:51:51Z)
- FACMAC: Factored Multi-Agent Centralised Policy Gradients [103.30380537282517]
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC), a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.
We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2020-03-14T21:29:09Z)