Shapley Counterfactual Credits for Multi-Agent Reinforcement Learning
- URL: http://arxiv.org/abs/2106.00285v1
- Date: Tue, 1 Jun 2021 07:38:34 GMT
- Title: Shapley Counterfactual Credits for Multi-Agent Reinforcement Learning
- Authors: Jiahui Li, Kun Kuang, Baoxiang Wang, Furui Liu, Long Chen, Fei Wu and
Jun Xiao
- Abstract summary: We propose Shapley Counterfactual Credit Assignment, a novel method for explicit credit assignment which accounts for the coalition of agents.
Our method significantly outperforms existing cooperative MARL algorithms and achieves state-of-the-art performance, with especially large margins on the more difficult tasks.
- Score: 34.856522993714535
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Centralized Training with Decentralized Execution (CTDE) has been a popular
paradigm in cooperative Multi-Agent Reinforcement Learning (MARL) settings and
is widely used in many real applications. One of the major challenges in the
training process is credit assignment, which aims to deduce the contributions
of each agent according to the global rewards. Existing credit assignment
methods focus on either decomposing the joint value function into individual
value functions or measuring the impact of local observations and actions on
the global value function. These approaches do not adequately account for the
complicated interactions among multiple agents, leading to unsuitable credit
assignment and, in turn, mediocre MARL performance. We propose
Shapley Counterfactual Credit Assignment, a novel method for explicit credit
assignment which accounts for the coalition of agents. Specifically, Shapley
Value and its desired properties are leveraged in deep MARL to credit any
combinations of agents, which grants us the capability to estimate the
individual credit for each agent. Despite this capability, the main technical
difficulty lies in the computational complexity of the Shapley Value, which grows
factorially with the number of agents. We instead utilize an approximation method
via Monte Carlo sampling, which reduces the sample complexity while maintaining
its effectiveness. We evaluate our method on StarCraft II benchmarks across
different scenarios. Our method outperforms existing cooperative MARL
algorithms significantly and achieves state-of-the-art performance, with
especially large margins on the more difficult tasks.
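The per-agent credit described above hinges on evaluating a value for arbitrary coalitions of agents and approximating the Shapley average with Monte Carlo samples over agent orderings. The snippet below is a minimal, generic sketch of that sampling scheme, not the authors' implementation: `coalition_value` is a hypothetical stand-in for whatever learned critic scores a subset of agents.

```python
import random
from typing import Callable, FrozenSet, List

def shapley_credits(
    agents: List[int],
    coalition_value: Callable[[FrozenSet[int]], float],
    num_samples: int = 100,
) -> List[float]:
    """Monte Carlo estimate of each agent's Shapley credit.

    The exact Shapley value averages an agent's marginal contribution over all
    n! orderings; sampling random orderings keeps the cost at
    O(num_samples * n) coalition evaluations instead of factorial.
    """
    n = len(agents)
    credits = [0.0] * n
    for _ in range(num_samples):
        order = agents[:]
        random.shuffle(order)
        coalition: FrozenSet[int] = frozenset()
        prev_value = coalition_value(coalition)
        for agent in order:
            coalition = coalition | {agent}
            value = coalition_value(coalition)
            # Marginal contribution of `agent` given the agents placed before it.
            credits[agents.index(agent)] += value - prev_value
            prev_value = value
    return [c / num_samples for c in credits]

# Toy usage: each member is worth 1.0, plus a bonus once two or more cooperate.
if __name__ == "__main__":
    def toy_value(coalition: FrozenSet[int]) -> float:
        return len(coalition) + (0.5 if len(coalition) >= 2 else 0.0)

    print(shapley_credits(agents=[0, 1, 2], coalition_value=toy_value))
```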
Related papers
- Efficiently Quantifying Individual Agent Importance in Cooperative MARL [4.653136482223517]
We adapt difference rewards into an efficient method for quantifying the contribution of individual agents, referred to as Agent Importance.
We show empirically that the computed values are strongly correlated with the true Shapley values, as well as the true underlying individual agent rewards.
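To make the difference-rewards idea concrete, here is a hedged sketch: an agent's importance is read off as the change in the global value when its action is replaced by a default one. The `global_value` callable and the single no-op default are illustrative assumptions, not the paper's exact estimator.

```python
from typing import Callable, Dict, Hashable

def agent_importance(
    joint_action: Dict[Hashable, int],
    global_value: Callable[[Dict[Hashable, int]], float],
    default_action: int = 0,
) -> Dict[Hashable, float]:
    """Difference-rewards style importance: G(a) - G(a with agent i defaulted)."""
    base = global_value(joint_action)
    scores = {}
    for agent in joint_action:
        counterfactual = dict(joint_action)
        counterfactual[agent] = default_action  # counterfactual "no-op" action
        scores[agent] = base - global_value(counterfactual)
    return scores

# Toy usage: the global value is just the number of agents choosing action 1.
print(agent_importance({"a": 1, "b": 0}, global_value=lambda a: float(sum(a.values()))))
```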
arXiv Detail & Related papers (2023-12-13T19:09:37Z)
- Learning Multi-Agent Intention-Aware Communication for Optimal Multi-Order Execution in Finance [96.73189436721465]
We first present a multi-agent RL (MARL) method for multi-order execution considering practical constraints.
We propose a learnable multi-round communication protocol, for the agents communicating the intended actions with each other.
Experiments on the data from two real-world markets have illustrated superior performance with significantly better collaboration effectiveness.
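To make the idea of a learnable multi-round protocol for sharing intended actions more tangible, the sketch below shows one toy communication round that can be applied repeatedly; the module structure and mean-pooled messages are illustrative assumptions, not the paper's protocol.

```python
import torch
import torch.nn as nn

class IntentionCommRound(nn.Module):
    """One toy round of intention sharing between agents."""

    def __init__(self, intent_dim: int, msg_dim: int = 16):
        super().__init__()
        self.to_msg = nn.Linear(intent_dim, msg_dim)   # encode intention into a message
        self.update = nn.GRUCell(msg_dim, intent_dim)  # refine intention after reading messages

    def forward(self, intentions: torch.Tensor) -> torch.Tensor:
        # intentions: [n_agents, intent_dim]
        msgs = self.to_msg(intentions)
        pooled = msgs.mean(dim=0, keepdim=True).expand_as(msgs)  # every agent reads all messages
        return self.update(pooled, intentions)

comm = IntentionCommRound(intent_dim=8)
intent = torch.randn(5, 8)
for _ in range(3):          # three communication rounds
    intent = comm(intent)
print(intent.shape)         # torch.Size([5, 8])
```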
arXiv Detail & Related papers (2023-07-06T16:45:40Z)
- STAS: Spatial-Temporal Return Decomposition for Multi-agent Reinforcement Learning [10.102447181869005]
We introduce a novel method that learns credit assignment in both temporal and spatial dimensions.
Our results demonstrate that our method effectively assigns spatial-temporal credit, outperforming all state-of-the-art baselines.
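As a rough illustration of redistributing one episodic return over both timesteps and agents (not STAS's actual architecture), the snippet below turns a score per timestep-agent cell into credit weights that sum back to the return; in the real method those scores would come from learned spatial-temporal attention.

```python
import torch

def redistribute_return(scores: torch.Tensor, episode_return: float) -> torch.Tensor:
    """Spread one episodic return over [T, N] timestep-agent cells via softmax weights."""
    weights = torch.softmax(scores.reshape(-1), dim=0).reshape_as(scores)
    return weights * episode_return

# Example: 3 timesteps, 2 agents, uniform scores -> equal credit of 1.0 per cell.
credits = redistribute_return(torch.zeros(3, 2), episode_return=6.0)
print(credits, credits.sum())  # the credited pieces sum back to 6.0
```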
arXiv Detail & Related papers (2023-04-15T10:09:03Z)
- RACA: Relation-Aware Credit Assignment for Ad-Hoc Cooperation in Multi-Agent Deep Reinforcement Learning [55.55009081609396]
We propose a novel method, called Relation-Aware Credit Assignment (RACA), which achieves zero-shot generalization in ad-hoc cooperation scenarios.
RACA takes advantage of a graph-based relation encoder to encode the topological structure between agents.
Our method outperforms baseline methods on the StarCraft II micromanagement benchmark and ad-hoc cooperation scenarios.
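As a hedged illustration of encoding inter-agent structure (not RACA's actual architecture), the sketch below applies self-attention over per-agent observation embeddings; because it is agnostic to the number of agents, the same encoder can be reused in ad-hoc team compositions.

```python
import torch
import torch.nn as nn

class RelationEncoder(nn.Module):
    """Toy relation encoder: self-attention over per-agent embeddings."""

    def __init__(self, obs_dim: int, embed_dim: int = 32, heads: int = 2):
        super().__init__()
        self.embed = nn.Linear(obs_dim, embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, heads, batch_first=True)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: [batch, n_agents, obs_dim] -> relation-aware features per agent
        x = self.embed(obs)
        out, _ = self.attn(x, x, x)
        return out

enc = RelationEncoder(obs_dim=8)
print(enc(torch.randn(4, 5, 8)).shape)  # torch.Size([4, 5, 32]), for any n_agents
```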
arXiv Detail & Related papers (2022-06-02T03:39:27Z)
- Value Functions Factorization with Latent State Information Sharing in Decentralized Multi-Agent Policy Gradients [43.862956745961654]
LSF-SAC is a novel framework that features a variational inference-based information-sharing mechanism to provide extra state information.
We evaluate LSF-SAC on the StarCraft II micromanagement challenge and demonstrate that it outperforms several state-of-the-art methods in challenging collaborative tasks.
arXiv Detail & Related papers (2022-01-04T17:05:07Z)
- Learning Cooperative Multi-Agent Policies with Partial Reward Decoupling [13.915157044948364]
One of the preeminent obstacles to scaling multi-agent reinforcement learning is assigning credit to individual agents' actions.
In this paper, we address this credit assignment problem with an approach that we call partial reward decoupling (PRD).
PRD decomposes large cooperative multi-agent RL problems into decoupled subproblems involving subsets of agents, thereby simplifying credit assignment.
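A minimal sketch of the decoupling idea, assuming each agent's "relevant set" of teammates is already known (discovering it is the actual contribution of PRD): credit for an agent only aggregates rewards from agents in its subset, so each subproblem involves fewer agents.

```python
from typing import Dict, Set

def prd_style_credit(
    rewards: Dict[str, float],
    relevant_sets: Dict[str, Set[str]],
) -> Dict[str, float]:
    """Credit each agent only with rewards earned inside its relevant set."""
    return {
        agent: sum(rewards[other] for other in group)
        for agent, group in relevant_sets.items()
    }

# Agent "a" influences itself and "b"; "c" is fully decoupled.
print(prd_style_credit(
    rewards={"a": 1.0, "b": 0.5, "c": 2.0},
    relevant_sets={"a": {"a", "b"}, "b": {"b"}, "c": {"c"}},
))
```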
arXiv Detail & Related papers (2021-12-23T17:48:04Z)
- Locality Matters: A Scalable Value Decomposition Approach for Cooperative Multi-Agent Reinforcement Learning [52.7873574425376]
Cooperative multi-agent reinforcement learning (MARL) faces significant scalability issues due to state and action spaces that are exponentially large in the number of agents.
We propose a novel, value-based multi-agent algorithm called LOMAQ, which incorporates local rewards in the Centralized Training with Decentralized Execution paradigm.
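A hedged sketch of what training on local rather than global rewards can look like: each agent's target sums only the rewards observed in its own neighbourhood. The neighbourhood structure is simply given as input here, whereas exploiting it efficiently inside value decomposition is the point of LOMAQ.

```python
from typing import Dict, List

def local_td_targets(
    local_rewards: Dict[str, float],
    neighbourhoods: Dict[str, List[str]],
    next_local_values: Dict[str, float],
    gamma: float = 0.99,
) -> Dict[str, float]:
    """One-step targets built from neighbourhood rewards instead of the global reward."""
    return {
        agent: sum(local_rewards[n] for n in hood) + gamma * next_local_values[agent]
        for agent, hood in neighbourhoods.items()
    }

print(local_td_targets(
    local_rewards={"a": 1.0, "b": 0.0, "c": 2.0},
    neighbourhoods={"a": ["a", "b"], "b": ["a", "b", "c"], "c": ["c"]},
    next_local_values={"a": 0.5, "b": 0.5, "c": 0.5},
))
```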
arXiv Detail & Related papers (2021-09-22T10:08:15Z)
- UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning [53.73686229912562]
We propose a novel MARL approach called Universal Value Exploration (UneVEn).
UneVEn learns a set of related tasks simultaneously with a linear decomposition of universal successor features.
Empirical results on a set of exploration games, challenging cooperative predator-prey tasks requiring significant coordination among agents, and StarCraft II micromanagement benchmarks show that UneVEn can solve tasks where other state-of-the-art MARL methods fail.
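The "linear decomposition of universal successor features" can be read as Q(s, a; w) = psi(s, a) · w, where one shared psi network serves a family of related tasks that differ only in the task weight vector w. The sketch below illustrates that read-out; layer sizes and names are placeholders, not UneVEn's architecture.

```python
import torch
import torch.nn as nn

class SuccessorFeatureQ(nn.Module):
    """Q-values as a linear read-out of successor features: Q(s, a; w) = psi(s, a) . w."""

    def __init__(self, state_dim: int, n_actions: int, feat_dim: int = 16):
        super().__init__()
        self.n_actions, self.feat_dim = n_actions, feat_dim
        self.psi = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions * feat_dim),
        )

    def forward(self, state: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
        # state: [batch, state_dim], w: [feat_dim] -> Q: [batch, n_actions]
        feats = self.psi(state).view(-1, self.n_actions, self.feat_dim)
        return feats @ w

q_net = SuccessorFeatureQ(state_dim=10, n_actions=4)
print(q_net(torch.randn(2, 10), w=torch.randn(16)).shape)  # torch.Size([2, 4])
```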
arXiv Detail & Related papers (2020-10-06T19:08:47Z)
- F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning [110.35516334788687]
Decentralized multi-agent reinforcement learning algorithms are sometimes impractical in complicated applications.
We propose a flexible, fully decentralized actor-critic MARL framework that can handle large-scale general cooperative multi-agent settings.
Our framework achieves scalability and stability in large-scale environments and reduces information transmission.
arXiv Detail & Related papers (2020-04-17T14:56:29Z)
- FACMAC: Factored Multi-Agent Centralised Policy Gradients [103.30380537282517]
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC).
It is a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.
We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks.
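A centralised-but-factored critic is the piece of FACMAC that is easiest to sketch: per-agent utilities Q_i(o_i, a_i) are combined by a mixing network into a joint Q_tot, so a centralised policy gradient can flow back through the mixer to every agent's (possibly continuous) action. The plain linear mixer below is a deliberate simplification, not the paper's architecture.

```python
import torch
import torch.nn as nn

class FactoredCentralCritic(nn.Module):
    """Per-agent utilities mixed into one joint Q_tot (simplified sketch)."""

    def __init__(self, n_agents: int, obs_dim: int, act_dim: int):
        super().__init__()
        self.utilities = nn.ModuleList(
            nn.Sequential(nn.Linear(obs_dim + act_dim, 32), nn.ReLU(), nn.Linear(32, 1))
            for _ in range(n_agents)
        )
        self.mixer = nn.Linear(n_agents, 1)  # stand-in for a learned mixing network

    def forward(self, obs: torch.Tensor, actions: torch.Tensor) -> torch.Tensor:
        # obs: [batch, n_agents, obs_dim], actions: [batch, n_agents, act_dim]
        qs = torch.cat(
            [u(torch.cat([obs[:, i], actions[:, i]], dim=-1))
             for i, u in enumerate(self.utilities)],
            dim=-1,
        )
        return self.mixer(qs)  # joint Q_tot: [batch, 1]

critic = FactoredCentralCritic(n_agents=3, obs_dim=6, act_dim=2)
print(critic(torch.randn(4, 3, 6), torch.randn(4, 3, 2)).shape)  # torch.Size([4, 1])
```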
arXiv Detail & Related papers (2020-03-14T21:29:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.