Boosting Value Decomposition via Unit-Wise Attentive State
Representation for Cooperative Multi-Agent Reinforcement Learning
- URL: http://arxiv.org/abs/2305.07182v1
- Date: Fri, 12 May 2023 00:33:22 GMT
- Title: Boosting Value Decomposition via Unit-Wise Attentive State
Representation for Cooperative Multi-Agent Reinforcement Learning
- Authors: Qingpeng Zhao, Yuanyang Zhu, Zichuan Liu, Zhi Wang and Chunlin Chen
- Abstract summary: We propose a simple yet powerful method that alleviates partial observability and efficiently promotes coordination by introducing the UNit-wise attentive State Representation (UNSR).
In UNSR, each agent learns a compact and disentangled unit-wise state representation output by transformer blocks, and produces its local action-value function.
Experimental results demonstrate that our method achieves superior performance and data efficiency compared to solid baselines on the StarCraft II micromanagement challenge.
- Score: 11.843811402154408
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In cooperative multi-agent reinforcement learning (MARL), environmental
stochasticity and uncertainty grow exponentially with the number of agents,
making it hard to learn a compact latent representation from partial
observations for boosting value decomposition. To tackle this issue, we propose
a simple yet powerful method that alleviates partial observability and
efficiently promotes coordination by introducing the UNit-wise attentive State
Representation (UNSR). In UNSR, each agent learns a compact and disentangled
unit-wise state representation output by transformer blocks, and produces its
local action-value function. UNSR is then used to boost value decomposition
with a multi-head attention mechanism that produces efficient credit assignment
in the mixing network, providing a reasoning path between the individual value
functions and the joint value function. Experimental results demonstrate that
our method achieves superior performance and data efficiency compared to solid
baselines on the StarCraft II micromanagement challenge. Additional ablation
experiments also help identify the key factors contributing to the performance
of UNSR.
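As a rough, hypothetical illustration of the pipeline the abstract describes (a unit-wise transformer encoder per agent, a local Q-value head, and attention-based mixing), the PyTorch sketch below is an assumption-laden reconstruction rather than the authors' implementation; all layer sizes, module names, and the pooling and mixing rules are invented for illustration.

```python
# Hypothetical sketch of a UNSR-style pipeline; dimensions, module names,
# and the exact mixing rule are assumptions made for illustration only.
import torch
import torch.nn as nn


class UnitWiseEncoder(nn.Module):
    """Encodes an agent's observation, split into per-unit feature vectors,
    with a transformer block and pools it into one compact representation."""

    def __init__(self, unit_feat_dim: int, embed_dim: int = 64, n_heads: int = 4):
        super().__init__()
        self.proj = nn.Linear(unit_feat_dim, embed_dim)
        self.block = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=n_heads, dim_feedforward=128, batch_first=True
        )

    def forward(self, unit_feats: torch.Tensor) -> torch.Tensor:
        # unit_feats: (batch, n_units, unit_feat_dim)
        h = self.block(self.proj(unit_feats))       # (batch, n_units, embed_dim)
        return h.mean(dim=1)                        # pooled unit-wise representation


class AttentiveMixer(nn.Module):
    """Mixes per-agent Q-values into a joint Q-value, with weights computed by
    multi-head attention over the agents' unit-wise representations (an assumed
    variant of attention-based credit assignment, not the paper's exact mixer)."""

    def __init__(self, embed_dim: int = 64, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim, n_heads, batch_first=True)
        self.score = nn.Linear(embed_dim, 1)

    def forward(self, agent_qs: torch.Tensor, reps: torch.Tensor) -> torch.Tensor:
        # agent_qs: (batch, n_agents), reps: (batch, n_agents, embed_dim)
        attended, _ = self.attn(reps, reps, reps)
        weights = torch.softmax(self.score(attended).squeeze(-1), dim=-1)
        return (weights * agent_qs).sum(dim=-1, keepdim=True)   # joint Q-value


if __name__ == "__main__":
    batch, n_agents, n_units, unit_dim, n_actions = 8, 3, 5, 16, 6
    encoder = UnitWiseEncoder(unit_dim)
    q_head = nn.Linear(64, n_actions)               # per-agent local Q-value head
    mixer = AttentiveMixer()

    obs = torch.randn(batch, n_agents, n_units, unit_dim)
    reps = torch.stack([encoder(obs[:, i]) for i in range(n_agents)], dim=1)
    local_q = q_head(reps)                          # (batch, n_agents, n_actions)
    chosen_q = local_q.max(dim=-1).values           # greedy local action-values
    joint_q = mixer(chosen_q, reps)
    print(joint_q.shape)                            # torch.Size([8, 1])
```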
Related papers
- The challenge of redundancy on multi-agent value factorisation [12.63182277116319]
In the field of cooperative multi-agent reinforcement learning (MARL), the standard paradigm is the use of centralised training and decentralised execution.
We propose leveraging layerwise relevance propagation (LRP) to instead separate the learning of the joint value function from the generation of local reward signals.
We find that although the performance of both baselines, VDN and QMIX, degrades with the number of redundant agents, RDN is unaffected.
arXiv Detail & Related papers (2023-03-28T20:41:12Z) - Learning Reward Machines in Cooperative Multi-Agent Tasks [75.79805204646428]
This paper presents a novel approach to Multi-Agent Reinforcement Learning (MARL)
It combines cooperative task decomposition with the learning of reward machines (RMs) encoding the structure of the sub-tasks.
The proposed method helps deal with the non-Markovian nature of the rewards in partially observable environments.
arXiv Detail & Related papers (2023-03-24T15:12:28Z) - Adaptive Value Decomposition with Greedy Marginal Contribution
Computation for Cooperative Multi-Agent Reinforcement Learning [48.41925886860991]
Real-world cooperation often requires intensive coordination among agents simultaneously.
Traditional methods that learn the value function as a monotonic mixing of per-agent utilities cannot solve tasks with non-monotonic returns (see the monotonic-mixing sketch after this list).
We propose a novel explicit credit assignment method to address the non-monotonic problem.
arXiv Detail & Related papers (2023-02-14T07:23:59Z) - PAC: Assisted Value Factorisation with Counterfactual Predictions in
Multi-Agent Reinforcement Learning [43.862956745961654]
Multi-agent reinforcement learning (MARL) has witnessed significant progress with the development of value function factorization methods.
In this paper, we show that in partially observable MARL problems, an agent's ordering over its own actions could impose concurrent constraints.
We propose PAC, a new framework leveraging information generated from Counterfactual Predictions of optimal joint action selection.
arXiv Detail & Related papers (2022-06-22T23:34:30Z) - RACA: Relation-Aware Credit Assignment for Ad-Hoc Cooperation in
Multi-Agent Deep Reinforcement Learning [55.55009081609396]
We propose a novel method, called Relation-Aware Credit Assignment (RACA), which achieves zero-shot generalization in ad-hoc cooperation scenarios.
RACA takes advantage of a graph-based relation encoder to encode the topological structure between agents.
Our method outperforms baseline methods on the StarCraft II micromanagement benchmark and ad-hoc cooperation scenarios.
arXiv Detail & Related papers (2022-06-02T03:39:27Z) - Value Functions Factorization with Latent State Information Sharing in
Decentralized Multi-Agent Policy Gradients [43.862956745961654]
LSF-SAC is a novel framework that features a variational inference-based information-sharing mechanism that provides extra state information for value function factorization.
We evaluate LSF-SAC on the StarCraft II micromanagement challenge and demonstrate that it outperforms several state-of-the-art methods in challenging collaborative tasks.
arXiv Detail & Related papers (2022-01-04T17:05:07Z) - Locality Matters: A Scalable Value Decomposition Approach for
Cooperative Multi-Agent Reinforcement Learning [52.7873574425376]
Cooperative multi-agent reinforcement learning (MARL) faces significant scalability issues due to state and action spaces that are exponentially large in the number of agents.
We propose a novel, value-based multi-agent algorithm called LOMAQ, which incorporates local rewards in the Centralized Training Decentralized Execution paradigm.
arXiv Detail & Related papers (2021-09-22T10:08:15Z) - Return-Based Contrastive Representation Learning for Reinforcement
Learning [126.7440353288838]
We propose a novel auxiliary task that forces the learnt representations to discriminate state-action pairs with different returns.
Our algorithm outperforms strong baselines on complex tasks in Atari games and DeepMind Control suite.
arXiv Detail & Related papers (2021-02-22T13:04:18Z) - Modeling the Interaction between Agents in Cooperative Multi-Agent
Reinforcement Learning [2.9360071145551068]
We propose a novel cooperative MARL algorithm named interactive actor-critic (IAC).
IAC models the interaction of agents from perspectives of policy and value function.
We extend the value decomposition methods to continuous control tasks and evaluate IAC on benchmark tasks including classic control and multi-agent particle environments.
arXiv Detail & Related papers (2021-02-10T01:58:28Z) - Reward Machines for Cooperative Multi-Agent Reinforcement Learning [30.84689303706561]
In cooperative multi-agent reinforcement learning, a collection of agents learns to interact in a shared environment to achieve a common goal.
We propose the use of reward machines (RMs) -- Mealy machines used as structured representations of reward functions -- to encode the team's task (a minimal RM sketch follows after this list).
The proposed novel interpretation of RMs in the multi-agent setting explicitly encodes required teammate interdependencies, allowing the team-level task to be decomposed into sub-tasks for individual agents.
arXiv Detail & Related papers (2020-07-03T23:08:14Z) - QTRAN++: Improved Value Transformation for Cooperative Multi-Agent
Reinforcement Learning [70.382101956278]
QTRAN is a reinforcement learning algorithm capable of learning the largest class of joint-action value functions.
Despite its strong theoretical guarantee, it has shown poor empirical performance in complex environments.
We propose a substantially improved version, coined QTRAN++.
arXiv Detail & Related papers (2020-06-22T05:08:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
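For context on the monotonic mixing constraint mentioned in the "Adaptive Value Decomposition with Greedy Marginal Contribution Computation" entry above, the sketch below shows a QMIX-style mixer in which hypernetworks conditioned on the global state emit non-negative mixing weights, so the joint value is constrained to be monotonic in every per-agent utility. The layer sizes and names are illustrative assumptions, not taken from any of the cited papers.

```python
# Illustrative QMIX-style monotonic mixing network; hypernetworks map the
# global state to non-negative weights, enforcing dQ_tot/dQ_i >= 0 for all i.
import torch
import torch.nn as nn


class MonotonicMixer(nn.Module):
    def __init__(self, n_agents: int, state_dim: int, hidden: int = 32):
        super().__init__()
        self.n_agents, self.hidden = n_agents, hidden
        # Hypernetworks: map the global state to the mixer's weights and biases.
        self.w1 = nn.Linear(state_dim, n_agents * hidden)
        self.b1 = nn.Linear(state_dim, hidden)
        self.w2 = nn.Linear(state_dim, hidden)
        self.b2 = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, agent_qs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # agent_qs: (batch, n_agents), state: (batch, state_dim)
        bs = agent_qs.size(0)
        w1 = torch.abs(self.w1(state)).view(bs, self.n_agents, self.hidden)  # >= 0
        b1 = self.b1(state).view(bs, 1, self.hidden)
        h = torch.relu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)
        w2 = torch.abs(self.w2(state)).view(bs, self.hidden, 1)              # >= 0
        b2 = self.b2(state).view(bs, 1, 1)
        return (torch.bmm(h, w2) + b2).view(bs, 1)                           # Q_tot


if __name__ == "__main__":
    mixer = MonotonicMixer(n_agents=3, state_dim=20)
    q_tot = mixer(torch.randn(4, 3), torch.randn(4, 20))
    print(q_tot.shape)  # torch.Size([4, 1])
```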
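The two reward-machine entries above describe RMs as Mealy machines that serve as structured representations of reward functions. The snippet below is a minimal, generic reward machine of that kind; the states, events, and rewards in the example are made up for illustration and do not reproduce the papers' multi-agent decomposition.

```python
# Minimal reward machine: a Mealy machine whose transitions are keyed by
# high-level events and emit a reward. The states/events/rewards below are a
# made-up example, not taken from the cited papers.
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class RewardMachine:
    initial_state: str
    # (state, event) -> (next_state, reward)
    transitions: Dict[Tuple[str, str], Tuple[str, float]]
    state: str = field(init=False)

    def __post_init__(self):
        self.state = self.initial_state

    def step(self, event: str) -> float:
        """Advance on a high-level event; unknown events leave the state unchanged."""
        next_state, reward = self.transitions.get((self.state, event), (self.state, 0.0))
        self.state = next_state
        return reward


if __name__ == "__main__":
    # Example task: press a button, then reach the goal, in that order.
    rm = RewardMachine(
        initial_state="u0",
        transitions={
            ("u0", "button_pressed"): ("u1", 0.0),
            ("u1", "goal_reached"): ("u_done", 1.0),
        },
    )
    print(rm.step("goal_reached"))    # 0.0 (out of order, no progress)
    print(rm.step("button_pressed"))  # 0.0 (advance to u1)
    print(rm.step("goal_reached"))    # 1.0 (task complete)
```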