Adaptive Value Decomposition with Greedy Marginal Contribution
Computation for Cooperative Multi-Agent Reinforcement Learning
- URL: http://arxiv.org/abs/2302.06872v1
- Date: Tue, 14 Feb 2023 07:23:59 GMT
- Title: Adaptive Value Decomposition with Greedy Marginal Contribution
Computation for Cooperative Multi-Agent Reinforcement Learning
- Authors: Shanqi Liu, Yujing Hu, Runze Wu, Dong Xing, Yu Xiong, Changjie Fan,
Kun Kuang, Yong Liu
- Abstract summary: Real-world cooperation often requires intensive, simultaneous coordination among agents.
Traditional methods that learn the value function as a monotonic mixing of per-agent utilities cannot solve tasks with non-monotonic returns.
We propose a novel explicit credit assignment method to address the non-monotonic problem.
- Score: 48.41925886860991
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Real-world cooperation often requires intensive, simultaneous coordination
among agents. This task has been extensively studied within the framework of
cooperative multi-agent reinforcement learning (MARL), and value decomposition
methods are among those cutting-edge solutions. However, traditional methods
that learn the value function as a monotonic mixing of per-agent utilities
cannot solve tasks with non-monotonic returns. This hinders their
application in generic scenarios. Recent methods tackle this problem from the
perspective of implicit credit assignment by learning value functions with
complete expressiveness or using additional structures to improve cooperation.
However, they are either difficult to learn due to large joint action spaces or
insufficient to capture the complicated interactions among agents which are
essential to solving tasks with non-monotonic returns. To address these
problems, we propose a novel explicit credit assignment method for the
non-monotonic setting. Our method, Adaptive Value decomposition with Greedy
Marginal contribution (AVGM), is based on an adaptive value decomposition that
learns the cooperative value of a group of dynamically changing agents. We
first show that the proposed value decomposition can capture the
complicated interactions among agents and remains feasible to learn in large-scale
scenarios. Then, our method uses a greedy marginal contribution computed from
the value decomposition as an individual credit to incentivize agents to learn
the optimal cooperative policy. We further extend the module with an action
encoder to guarantee linear time complexity for computing the greedy
marginal contribution. Experimental results demonstrate that our method
achieves significant performance improvements in several non-monotonic domains.
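As a concrete illustration of the greedy marginal contribution described in the abstract, the sketch below credits each agent with the value gained by greedily adding it to a growing group, evaluated under a group value function. All names here (greedy_marginal_credits, group_q, the toy payoff) are illustrative assumptions rather than the authors' implementation; in AVGM the group value comes from the learned adaptive value decomposition, and an action encoder keeps the cost linear in the number of agents, which this naive version does not attempt.

```python
# Illustrative sketch only (not the authors' code): greedy marginal-contribution
# credits under a hypothetical group value function group_q(group) -> float.
from typing import Callable, Dict, FrozenSet, List

def greedy_marginal_credits(
    agents: List[int],
    group_q: Callable[[FrozenSet[int]], float],
) -> Dict[int, float]:
    """Credit each agent with the value gain of greedily adding it to the group."""
    credits: Dict[int, float] = {}
    group: FrozenSet[int] = frozenset()
    remaining = set(agents)
    while remaining:
        # Greedily pick the agent whose addition raises the group value the most.
        best = max(remaining, key=lambda i: group_q(group | {i}) - group_q(group))
        credits[best] = group_q(group | {best}) - group_q(group)
        group = group | {best}
        remaining.remove(best)
    return credits

if __name__ == "__main__":
    # Toy non-monotonic payoff: a lone agent is penalized, the full pair is rewarded.
    def toy_q(group: FrozenSet[int]) -> float:
        return {frozenset(): 0.0,
                frozenset({0}): -1.0,
                frozenset({1}): -1.0,
                frozenset({0, 1}): 4.0}[group]

    print(greedy_marginal_credits([0, 1], toy_q))  # e.g. {0: -1.0, 1: 5.0}
```

In the toy payoff a single agent acting alone has negative value while the full group is rewarded; this is exactly the non-monotonic structure that a monotonic mixing of per-agent utilities cannot represent, and the marginal credits expose it per agent.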
Related papers
- Learning Reward Machines in Cooperative Multi-Agent Tasks [75.79805204646428]
This paper presents a novel approach to Multi-Agent Reinforcement Learning (MARL).
It combines cooperative task decomposition with the learning of reward machines (RMs) encoding the structure of the sub-tasks.
The proposed method helps deal with the non-Markovian nature of the rewards in partially observable environments.
arXiv Detail & Related papers (2023-03-24T15:12:28Z)
- RACA: Relation-Aware Credit Assignment for Ad-Hoc Cooperation in Multi-Agent Deep Reinforcement Learning [55.55009081609396]
We propose a novel method, called Relation-Aware Credit Assignment (RACA), which achieves zero-shot generalization in ad-hoc cooperation scenarios.
RACA takes advantage of a graph-based relation encoder to encode the topological structure between agents.
Our method outperforms baseline methods on the StarCraft II micromanagement benchmark and in ad-hoc cooperation scenarios.
arXiv Detail & Related papers (2022-06-02T03:39:27Z)
- Locality Matters: A Scalable Value Decomposition Approach for Cooperative Multi-Agent Reinforcement Learning [52.7873574425376]
Cooperative multi-agent reinforcement learning (MARL) faces significant scalability issues due to state and action spaces that are exponentially large in the number of agents.
We propose a novel value-based multi-agent algorithm called LOMAQ, which incorporates local rewards in the Centralized Training Decentralized Execution paradigm.
arXiv Detail & Related papers (2021-09-22T10:08:15Z)
- Shapley Counterfactual Credits for Multi-Agent Reinforcement Learning [34.856522993714535]
We propose Shapley Counterfactual Credit Assignment, a novel method for explicit credit assignment which accounts for the coalition of agents.
Our method significantly outperforms existing cooperative MARL algorithms and achieves state-of-the-art results, with especially large margins on the most difficult tasks.
arXiv Detail & Related papers (2021-06-01T07:38:34Z)
- Softmax with Regularization: Better Value Estimation in Multi-Agent Reinforcement Learning [72.28520951105207]
Overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning.
We propose a novel regularization-based update scheme that penalizes large joint action-values deviating from a baseline.
We show that our method provides a consistent performance improvement on a set of challenging StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2021-03-22T14:18:39Z)
- Modeling the Interaction between Agents in Cooperative Multi-Agent Reinforcement Learning [2.9360071145551068]
We propose a novel cooperative MARL algorithm named interactive actor-critic (IAC).
IAC models the interaction of agents from perspectives of policy and value function.
We extend the value decomposition methods to continuous control tasks and evaluate IAC on benchmark tasks including classic control and multi-agent particle environments.
arXiv Detail & Related papers (2021-02-10T01:58:28Z)
- UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning [53.73686229912562]
We propose a novel MARL approach called Universal Value Exploration (UneVEn).
UneVEn learns a set of related tasks simultaneously with a linear decomposition of universal successor features.
Empirical results on a set of exploration games, challenging cooperative predator-prey tasks requiring significant coordination among agents, and StarCraft II micromanagement benchmarks show that UneVEn can solve tasks where other state-of-the-art MARL methods fail.
arXiv Detail & Related papers (2020-10-06T19:08:47Z)