A Variational Approach to Mutual Information-Based Coordination for
Multi-Agent Reinforcement Learning
- URL: http://arxiv.org/abs/2303.00451v1
- Date: Wed, 1 Mar 2023 12:21:30 GMT
- Title: A Variational Approach to Mutual Information-Based Coordination for
Multi-Agent Reinforcement Learning
- Authors: Woojun Kim, Whiyoung Jung, Myungsik Cho, Youngchul Sung
- Abstract summary: We propose a new mutual information framework for multi-agent reinforcement learning.
Applying policy iteration to maximize the derived lower bound, we propose a practical algorithm named variational maximum mutual information multi-agent actor-critic (VM3-AC).
- Score: 17.893310647034188
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we propose a new mutual information framework for multi-agent
reinforcement learning to enable multiple agents to learn coordinated behaviors
by regularizing the accumulated return with the simultaneous mutual information
between multi-agent actions. By introducing a latent variable to induce nonzero
mutual information between multi-agent actions and applying a variational
bound, we derive a tractable lower bound on the considered maximum mutual information (MMI)-regularized
objective function. The derived tractable objective can be interpreted as
maximum entropy reinforcement learning combined with uncertainty reduction of
other agents' actions. Applying policy iteration to maximize the derived lower
bound, we propose a practical algorithm named variational maximum mutual
information multi-agent actor-critic (VM3-AC), which follows centralized learning with
decentralized execution. We evaluated VM3-AC on several games requiring
coordination, and numerical results show that VM3-AC outperforms other MARL
algorithms in multi-agent tasks requiring high-quality coordination.
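A minimal sketch of the objective described above, in illustrative notation (the latent variable used to induce nonzero mutual information is omitted, and the symbols are not taken from the paper): the return is regularized by the simultaneous mutual information between agents' actions, and a Barber-Agakov style variational bound splits each MI term into an entropy term (the maximum-entropy-RL part) and a term rewarding predictability of an agent's action given the other agents' actions (the uncertainty-reduction part).
```latex
% Hedged sketch of the MMI-regularized objective (illustrative notation).
J(\pi) \;=\; \mathbb{E}_{\pi}\Big[\sum_{t} \gamma^{t}\Big( r_t \;+\; \alpha \sum_{j \neq i} I\big(a_t^{i}; a_t^{j} \mid s_t\big)\Big)\Big]

% Variational lower bound on each MI term, with a variational distribution
% q_\xi standing in for the intractable conditional:
I\big(a^{i}; a^{j} \mid s\big)
  \;=\; \mathcal{H}\big(a^{i} \mid s\big) - \mathcal{H}\big(a^{i} \mid a^{j}, s\big)
  \;\geq\; \mathcal{H}\big(a^{i} \mid s\big) + \mathbb{E}\big[\log q_{\xi}\big(a^{i} \mid a^{j}, s\big)\big]
```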
Related papers
- DCIR: Dynamic Consistency Intrinsic Reward for Multi-Agent Reinforcement
Learning [84.22561239481901]
We propose a new approach that enables agents to learn whether their behaviors should be consistent with those of other agents.
We evaluate DCIR in multiple environments including Multi-agent Particle, Google Research Football and StarCraft II Micromanagement.
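As a rough illustration of a consistency-based intrinsic reward (the similarity measure and the switching signal below are assumptions for the sketch, not DCIR's actual design):
```python
import numpy as np

def consistency_intrinsic_reward(p_self, p_others, want_consistent):
    """Hedged sketch: reward behavioral (in)consistency with other agents.

    p_self:          (A,)   action distribution of this agent
    p_others:        (N-1, A) action distributions of the other agents
    want_consistent: (N-1,) +1 to encourage matching agent j, -1 to discourage
                     (in DCIR this switch would come from a learned module;
                     here it is just an input to keep the sketch self-contained).
    """
    # Negative KL as a similarity score: 0 when identical, negative otherwise.
    kl = np.sum(p_others * (np.log(p_others + 1e-8) - np.log(p_self + 1e-8)), axis=1)
    similarity = -kl
    return float(np.sum(want_consistent * similarity))

# Example: two other agents; be consistent with the first, not the second.
p_self = np.array([0.7, 0.2, 0.1])
p_others = np.array([[0.6, 0.3, 0.1], [0.1, 0.1, 0.8]])
print(consistency_intrinsic_reward(p_self, p_others, np.array([+1.0, -1.0])))
```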
arXiv Detail & Related papers (2023-12-10T06:03:57Z) - Effective Multi-Agent Deep Reinforcement Learning Control with Relative
Entropy Regularization [6.441951360534903]
Multi-Agent Continuous Dynamic Policy Gradient (MACDPP) was proposed to tackle the issues of limited capability and sample efficiency in various scenarios controlled by multiple agents.
It alleviates the inconsistency of multiple agents' policy updates by introducing relative entropy regularization into the Centralized Training with Decentralized Execution (CTDE) framework with the Actor-Critic (AC) structure.
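One common form such relative entropy regularization can take in an actor-critic update is sketched below; the coefficient $\beta$ and the choice of the previous-iteration policy as reference are assumptions, not details from the paper.
```latex
% Hedged sketch: KL-regularized actor objective in a CTDE actor-critic,
% penalizing agent i's new policy for drifting from its previous policy.
\max_{\theta_i} \; \mathbb{E}_{s \sim \mathcal{D}}\Big[
    Q^{\text{tot}}\big(s, \pi_{\theta_1}(s), \dots, \pi_{\theta_N}(s)\big)
    \;-\; \beta \, D_{\mathrm{KL}}\big(\pi_{\theta_i}(\cdot \mid s)\,\big\|\, \pi_{\theta_i^{\text{old}}}(\cdot \mid s)\big)
\Big]
```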
arXiv Detail & Related papers (2023-09-26T07:38:19Z) - RACA: Relation-Aware Credit Assignment for Ad-Hoc Cooperation in
Multi-Agent Deep Reinforcement Learning [55.55009081609396]
We propose a novel method, called Relation-Aware Credit Assignment (RACA), which achieves zero-shot generalization in ad-hoc cooperation scenarios.
RACA takes advantage of a graph-based relation encoder to encode the topological structure between agents.
Our method outperforms baseline methods on the StarCraftII micromanagement benchmark and ad-hoc cooperation scenarios.
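A minimal sketch of attention restricted to an agent topology, only to illustrate what a graph-based relation encoder computes (the projections and masking below are assumptions, not RACA's architecture):
```python
import numpy as np

def relation_encoder(agent_feats, adjacency, w_q, w_k, w_v):
    """Hedged sketch of a graph-based relation encoder over agents.

    agent_feats: (N, D) per-agent observation embeddings
    adjacency:   (N, N) 0/1 topology between agents (1 = related)
    w_q, w_k, w_v: (D, D) projection matrices (illustrative only).
    """
    q, k, v = agent_feats @ w_q, agent_feats @ w_k, agent_feats @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])          # (N, N) pairwise relation scores
    scores = np.where(adjacency > 0, scores, -1e9)   # mask out non-neighbors
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)
    return attn @ v                                  # (N, D) relation-aware embeddings

rng = np.random.default_rng(0)
N, D = 4, 8
feats = rng.normal(size=(N, D))
adj = np.ones((N, N))
w = [rng.normal(size=(D, D)) * 0.1 for _ in range(3)]
print(relation_encoder(feats, adj, *w).shape)  # (4, 8)
```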
arXiv Detail & Related papers (2022-06-02T03:39:27Z) - Policy Diagnosis via Measuring Role Diversity in Cooperative Multi-agent
RL [107.58821842920393]
We quantify agents' behavior differences and relate them to policy performance via Role Diversity.
We find that the error bound in MARL can be decomposed into three parts that have a strong relation to role diversity.
The decomposed factors can significantly impact policy optimization in three popular directions.
arXiv Detail & Related papers (2022-06-01T04:58:52Z) - Depthwise Convolution for Multi-Agent Communication with Enhanced
Mean-Field Approximation [9.854975702211165]
We propose a new method based on local communication learning to tackle the multi-agent RL (MARL) challenge.
First, we design a new communication protocol that exploits the ability of depthwise convolution to efficiently extract local relations.
Second, we introduce the mean-field approximation into our method to reduce the scale of agent interactions.
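A minimal sketch of depthwise convolution as a local-communication operator, assuming agents arranged on a grid (channel count, kernel size, and the mean-field reduction below are illustrative assumptions, not the paper's protocol):
```python
import torch
import torch.nn as nn

# Hedged sketch: agents are assumed to live on an HxW grid and each channel of
# the feature map carries one message dimension; groups=C makes the convolution
# depthwise, so each message dimension is aggregated only from spatially
# neighboring agents.
C, H, W = 16, 5, 5                       # message dims, grid of 5x5 agents
messages = torch.randn(1, C, H, W)       # batch of per-agent message features

depthwise = nn.Conv2d(C, C, kernel_size=3, padding=1, groups=C, bias=False)
aggregated = depthwise(messages)         # each agent mixes its 3x3 neighborhood

# A mean-field style reduction: replace individual neighbors by their average.
mean_field = messages.mean(dim=(2, 3), keepdim=True).expand_as(messages)
print(aggregated.shape, mean_field.shape)
```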
arXiv Detail & Related papers (2022-03-06T07:42:43Z) - DSDF: An approach to handle stochastic agents in collaborative
multi-agent reinforcement learning [0.0]
We show how the stochasticity of agents, which could be a result of malfunction or aging of robots, can add to the uncertainty in coordination.
Our solution, DSDF, tunes the discount factor for each agent according to its uncertainty and uses the resulting values to update the utility networks of individual agents.
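A minimal sketch of uncertainty-dependent discount tuning (the exponential mapping below is an assumption, not the schedule DSDF actually learns):
```python
import numpy as np

def tuned_discounts(base_gamma, uncertainties, sensitivity=0.5):
    """Hedged sketch: shrink each agent's discount factor with its uncertainty.

    The idea in the summary is that more stochastic (uncertain) agents should
    rely less on long-horizon returns; the decay used here is illustrative.
    """
    uncertainties = np.asarray(uncertainties, dtype=float)
    return base_gamma * np.exp(-sensitivity * uncertainties)

# Example: agent 2 is unreliable, so its utility network is trained
# with a shorter effective horizon.
print(tuned_discounts(0.99, [0.05, 0.10, 1.50]))  # ~[0.966, 0.942, 0.468]
```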
arXiv Detail & Related papers (2021-09-14T12:02:28Z) - Softmax with Regularization: Better Value Estimation in Multi-Agent
Reinforcement Learning [72.28520951105207]
Overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning.
We propose a novel regularization-based update scheme that penalizes large joint action-values deviating from a baseline.
We show that our method provides a consistent performance improvement on a set of challenging StarCraft II micromanagement tasks.
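A hedged sketch of a regularized TD loss of this kind, with a softmax-weighted baseline over joint action-values ($\lambda$, $\beta$, and the exact baseline are assumptions, not the paper's formulation):
```latex
% Softmax-weighted baseline over joint actions a:
\mathrm{sm}_{\beta}(Q_{\mathrm{tot}})(s) \;=\;
  \sum_{\mathbf{a}} \frac{e^{\beta Q_{\mathrm{tot}}(s,\mathbf{a})}}{\sum_{\mathbf{a}'} e^{\beta Q_{\mathrm{tot}}(s,\mathbf{a}')}} \, Q_{\mathrm{tot}}(s,\mathbf{a})

% TD loss plus a penalty on joint action-values deviating from the baseline:
\mathcal{L}(\theta) \;=\; \big(y - Q_{\mathrm{tot}}(s,\mathbf{a};\theta)\big)^{2}
  \;+\; \lambda \,\big(Q_{\mathrm{tot}}(s,\mathbf{a};\theta) - \mathrm{sm}_{\beta}(Q_{\mathrm{tot}})(s)\big)^{2}
```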
arXiv Detail & Related papers (2021-03-22T14:18:39Z) - Modeling the Interaction between Agents in Cooperative Multi-Agent
Reinforcement Learning [2.9360071145551068]
We propose a novel cooperative MARL algorithm named interactive actor-critic (IAC).
IAC models the interaction of agents from perspectives of policy and value function.
We extend the value decomposition methods to continuous control tasks and evaluate IAC on benchmark tasks including classic control and multi-agent particle environments.
arXiv Detail & Related papers (2021-02-10T01:58:28Z) - UneVEn: Universal Value Exploration for Multi-Agent Reinforcement
Learning [53.73686229912562]
We propose a novel MARL approach called Universal Value Exploration (UneVEn).
UneVEn learns a set of related tasks simultaneously with a linear decomposition of universal successor features.
Empirical results on a set of exploration games, challenging cooperative predator-prey tasks requiring significant coordination among agents, and StarCraft II micromanagement benchmarks show that UneVEn can solve tasks where other state-of-the-art MARL methods fail.
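The linear decomposition referred to above is, in standard successor-feature notation, a sketch of which is given below (the task weights $\mathbf{w}$ and features $\phi$ are assumed symbols):
```latex
% Hedged sketch: action-values decompose linearly into successor features
% and task weights, so related tasks share one set of features psi.
Q^{\pi}(s, \mathbf{a}; \mathbf{w}) \;=\; \psi^{\pi}(s, \mathbf{a})^{\top} \mathbf{w},
\qquad
\psi^{\pi}(s, \mathbf{a}) \;=\; \mathbb{E}^{\pi}\Big[\sum_{t \ge 0} \gamma^{t} \, \phi(s_t, \mathbf{a}_t) \,\Big|\, s_0 = s, \, \mathbf{a}_0 = \mathbf{a}\Big]
```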
arXiv Detail & Related papers (2020-10-06T19:08:47Z) - Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in
Cooperative Tasks [11.480994804659908]
Multi-agent deep reinforcement learning (MARL) suffers from a lack of commonly-used evaluation tasks and criteria.
We provide a systematic evaluation and comparison of three different classes of MARL algorithms.
Our experiments serve as a reference for the expected performance of algorithms across different learning tasks.
arXiv Detail & Related papers (2020-06-14T11:22:53Z) - FACMAC: Factored Multi-Agent Centralised Policy Gradients [103.30380537282517]
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC).
It is a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.
We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks.
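A hedged sketch of a centralised-but-factored policy gradient of this kind (notation assumed, not copied from the paper): a centralised critic built from per-agent utilities is differentiated through all agents' current policies.
```latex
% Centralised policy gradient through a factored joint critic:
\nabla_{\theta} J \;=\; \mathbb{E}_{\mathcal{D}}\Big[
  \nabla_{\theta}\boldsymbol{\mu} \;\nabla_{\boldsymbol{\mu}}
  Q_{\mathrm{tot}}\big(\boldsymbol{\tau}, \mu_{1}(\tau_{1}), \dots, \mu_{N}(\tau_{N})\big)
\Big],
\qquad
Q_{\mathrm{tot}} \;=\; g_{\psi}\big(s, Q_{1}(\tau_{1}, a_{1}), \dots, Q_{N}(\tau_{N}, a_{N})\big)
```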
arXiv Detail & Related papers (2020-03-14T21:29:09Z)