Contrastive Identity-Aware Learning for Multi-Agent Value Decomposition
- URL: http://arxiv.org/abs/2211.12712v1
- Date: Wed, 23 Nov 2022 05:18:42 GMT
- Title: Contrastive Identity-Aware Learning for Multi-Agent Value Decomposition
- Authors: Shunyu Liu, Yihe Zhou, Jie Song, Tongya Zheng, Kaixuan Chen, Tongtian
Zhu, Zunlei Feng, Mingli Song
- Abstract summary: Value Decomposition (VD) aims to deduce the contributions of agents for decentralized policies in the presence of only global rewards.
One of the main challenges in VD is to promote diverse behaviors among agents, while existing methods directly encourage the diversity of learned agent networks.
We propose a novel Contrastive Identity-Aware learning (CIA) method, explicitly boosting the credit-level distinguishability of the VD network.
- Score: 31.877237996738252
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Value Decomposition (VD) aims to deduce the contributions of agents for
decentralized policies in the presence of only global rewards, and has recently
emerged as a powerful credit assignment paradigm for tackling cooperative
Multi-Agent Reinforcement Learning (MARL) problems. One of the main challenges
in VD is to promote diverse behaviors among agents, while existing methods
directly encourage the diversity of learned agent networks with various
strategies. However, we argue that these dedicated designs for agent networks
are still limited by the indistinguishable VD network, leading to homogeneous
agent behaviors and thus downgrading the cooperation capability. In this paper,
we propose a novel Contrastive Identity-Aware learning (CIA) method, explicitly
boosting the credit-level distinguishability of the VD network to break the
bottleneck of multi-agent diversity. Specifically, our approach leverages
contrastive learning to maximize the mutual information between the temporal
credits and identity representations of different agents, encouraging the full
expressiveness of credit assignment and, in turn, the emergence of
individualities. The proposed CIA module is simple yet effective and can be
readily incorporated into various VD
architectures. Experiments on the SMAC benchmarks and across different VD
backbones demonstrate that the proposed method yields results superior to the
state-of-the-art counterparts. Our code is available at
https://github.com/liushunyu/CIA.
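To make the core idea concrete, here is a minimal, hypothetical sketch of an InfoNCE-style credit-identity contrastive loss in PyTorch. It is written from the abstract's description alone; the function name, tensor shapes, and temperature are illustrative assumptions, not the authors' released implementation (see the repository above for that).

```python
import torch
import torch.nn.functional as F

def cia_contrastive_loss(credit_emb, identity_emb, temperature=0.1):
    """InfoNCE-style loss pairing each agent's temporal-credit embedding
    with its identity representation (hypothetical sketch).

    credit_emb:   (n_agents, d) embeddings of per-agent temporal credits
    identity_emb: (n_agents, d) learnable per-agent identity vectors
    """
    credit_emb = F.normalize(credit_emb, dim=-1)
    identity_emb = F.normalize(identity_emb, dim=-1)
    # Similarity of every credit embedding against every identity.
    logits = credit_emb @ identity_emb.t() / temperature
    # Positive pairs sit on the diagonal: credit_i <-> identity_i.
    labels = torch.arange(credit_emb.size(0), device=credit_emb.device)
    # Minimizing this cross-entropy maximizes the InfoNCE lower bound
    # (log n_agents - loss) on the mutual information between credits
    # and identities, pushing different agents' credits apart.
    return F.cross_entropy(logits, labels)

# Usage: 5 agents with 32-dim credit and identity embeddings.
credits = torch.randn(5, 32, requires_grad=True)
identities = torch.randn(5, 32, requires_grad=True)
loss = cia_contrastive_loss(credits, identities)
loss.backward()
```

In a VD backbone such as QMIX, a term like this would be added to the usual TD loss so that the per-agent credits produced by the mixing network remain distinguishable across agents.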
Related papers
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process.
We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z)
- Enhancing Heterogeneous Multi-Agent Cooperation in Decentralized MARL via GNN-driven Intrinsic Rewards [1.179778723980276]
Multi-agent Reinforcement Learning (MARL) is emerging as a key framework for sequential decision-making and control tasks.
The deployment of these systems in real-world scenarios often requires decentralized training, a diverse set of agents, and learning from infrequent environmental reward signals.
We propose the CoHet algorithm, which utilizes a novel Graph Neural Network (GNN) based intrinsic motivation to facilitate the learning of heterogeneous agent policies.
arXiv Detail & Related papers (2024-08-12T21:38:40Z)
- Reframing the Relationship in Out-of-Distribution Detection [4.182518087792777]
We introduce a novel approach that integrates the agent paradigm into the Out-of-distribution (OOD) detection task.
Our proposed method, Concept Matching with Agent (CMA), employs neutral prompts as agents to augment the CLIP-based OOD detection process.
Our extensive experimental results showcase the superior performance of CMA over both zero-shot and training-required methods.
arXiv Detail & Related papers (2024-05-27T02:27:28Z)
- DCIR: Dynamic Consistency Intrinsic Reward for Multi-Agent Reinforcement Learning [84.22561239481901]
We propose a new approach that enables agents to learn whether their behaviors should be consistent with those of other agents.
We evaluate DCIR in multiple environments including Multi-agent Particle, Google Research Football and StarCraft II Micromanagement.
arXiv Detail & Related papers (2023-12-10T06:03:57Z)
- A Variational Approach to Mutual Information-Based Coordination for Multi-Agent Reinforcement Learning [17.893310647034188]
We propose a new mutual information framework for multi-agent reinforcement learning.
By maximizing the derived lower bound, we propose a practical algorithm named variational maximum mutual information multi-agent actor-critic; a generic form of such a bound is sketched after this entry.
arXiv Detail & Related papers (2023-03-01T12:21:30Z)
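For context on the kind of bound involved (not this paper's exact derivation), the standard Barber-Agakov variational lower bound on mutual information, with a variational distribution $q_\phi$ approximating the true conditional, reads:

```latex
I(X;Z) = H(X) - H(X \mid Z) \;\ge\; H(X) + \mathbb{E}_{p(x,z)}\!\left[\log q_\phi(x \mid z)\right]
```

The bound is tight when $q_\phi(x \mid z) = p(x \mid z)$, so maximizing the right-hand side jointly over $\phi$ and the policy encourages mutually informative, coordinated agent behaviors.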
- Learning From Good Trajectories in Offline Multi-Agent Reinforcement Learning [98.07495732562654]
Offline multi-agent reinforcement learning (MARL) aims to learn effective multi-agent policies from pre-collected datasets.
An agent learned by offline MARL may inherit a random policy present in the dataset, jeopardizing the performance of the entire team.
We propose a novel framework called Shared Individual Trajectories (SIT) to address this problem.
arXiv Detail & Related papers (2022-11-28T18:11:26Z)
- RACA: Relation-Aware Credit Assignment for Ad-Hoc Cooperation in Multi-Agent Deep Reinforcement Learning [55.55009081609396]
We propose a novel method, called Relation-Aware Credit Assignment (RACA), which achieves zero-shot generalization in ad-hoc cooperation scenarios.
RACA takes advantage of a graph-based relation encoder to encode the topological structure between agents.
Our method outperforms baseline methods on the StarCraft II micromanagement benchmark and in ad-hoc cooperation scenarios.
arXiv Detail & Related papers (2022-06-02T03:39:27Z)
- Policy Diagnosis via Measuring Role Diversity in Cooperative Multi-agent RL [107.58821842920393]
We quantify the agents' behavior differences and relate them to policy performance via Role Diversity.
We find that the error bound in MARL can be decomposed into three parts that have a strong relation to role diversity.
The decomposed factors can significantly impact policy optimization in three popular directions.
arXiv Detail & Related papers (2022-06-01T04:58:52Z)
- LDSA: Learning Dynamic Subtask Assignment in Cooperative Multi-Agent Reinforcement Learning [122.47938710284784]
We propose a novel framework for learning dynamic subtask assignment (LDSA) in cooperative MARL.
To reasonably assign agents to different subtasks, we propose an ability-based subtask selection strategy.
We show that LDSA learns reasonable and effective subtask assignment for better collaboration.
arXiv Detail & Related papers (2022-05-05T10:46:16Z)
- Learning Cooperative Multi-Agent Policies with Partial Reward Decoupling [13.915157044948364]
One of the preeminent obstacles to scaling multi-agent reinforcement learning is assigning credit to individual agents' actions.
In this paper, we address this credit assignment problem with an approach that we call partial reward decoupling (PRD).
PRD decomposes large cooperative multi-agent RL problems into decoupled subproblems involving subsets of agents, thereby simplifying credit assignment.
arXiv Detail & Related papers (2021-12-23T17:48:04Z)
- Celebrating Diversity in Shared Multi-Agent Reinforcement Learning [20.901606233349177]
Deep multi-agent reinforcement learning has shown promise in solving complex cooperative tasks.
In this paper, we aim to introduce diversity in both optimization and representation of shared multi-agent reinforcement learning.
Our method achieves state-of-the-art performance on Google Research Football and super hard StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2021-06-04T00:55:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information above and is not responsible for any consequences of its use.