Revisiting QMIX: Discriminative Credit Assignment by Gradient Entropy
Regularization
- URL: http://arxiv.org/abs/2202.04427v1
- Date: Wed, 9 Feb 2022 12:37:55 GMT
- Title: Revisiting QMIX: Discriminative Credit Assignment by Gradient Entropy
Regularization
- Authors: Jian Zhao, Yue Zhang, Xunhan Hu, Weixun Wang, Wengang Zhou, Jianye
Hao, Jiangcheng Zhu, Houqiang Li
- Abstract summary: In cooperative multi-agent systems, agents jointly take actions and receive a team reward instead of individual rewards.
In the absence of individual reward signals, credit assignment mechanisms are usually introduced to discriminate the contributions of different agents.
We propose a new perspective on credit assignment measurement and empirically show that QMIX suffers from limited discriminability in its assignment of credits to agents.
- Score: 126.87359177547455
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In cooperative multi-agent systems, agents jointly take actions and receive a
team reward instead of individual rewards. In the absence of individual reward
signals, credit assignment mechanisms are usually introduced to discriminate
the contributions of different agents so as to achieve effective cooperation.
Recently, the value decomposition paradigm has been widely adopted to realize
credit assignment, and QMIX has become the state-of-the-art solution. In this
paper, we revisit QMIX from two aspects. First, we propose a new perspective on
credit assignment measurement and empirically show that QMIX suffers from
limited discriminability in its assignment of credits to agents. Second, we
propose a gradient entropy regularization for QMIX to realize a discriminative
credit assignment, thereby improving the overall performance. Experiments
demonstrate that our approach improves learning efficiency and achieves better
performance.
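To make the proposed regularizer concrete, below is a minimal sketch of how a gradient entropy term could be attached to a QMIX-style loss. It assumes per-agent credit is measured by the sensitivity of the joint value Q_tot to each agent's chosen-action utility; the function and argument names (qmix_loss_with_gradient_entropy, mixer, entropy_coef) are illustrative, and the exact formulation in the paper may differ.

import torch
import torch.nn.functional as F

def qmix_loss_with_gradient_entropy(agent_qs, state, td_target, mixer, entropy_coef=0.1):
    # agent_qs:  (batch, n_agents) chosen-action utilities Q_i from the agent networks
    # state:     (batch, state_dim) global state fed to the monotonic mixing network
    # td_target: (batch,) bootstrapped target, e.g. r + gamma * Q_tot(next state)
    if not agent_qs.requires_grad:
        agent_qs = agent_qs.detach().requires_grad_(True)
    q_tot = mixer(agent_qs, state).squeeze(-1)            # joint value, shape (batch,)

    td_loss = F.mse_loss(q_tot, td_target)                # standard QMIX TD loss

    # Credit of agent i ~ sensitivity of Q_tot to its utility; normalise the
    # absolute gradients into a per-sample distribution over agents.
    grads = torch.autograd.grad(q_tot.sum(), agent_qs, create_graph=True)[0]
    credit = grads.abs() / (grads.abs().sum(dim=1, keepdim=True) + 1e-8)

    # Penalising the entropy of this distribution encourages credits to
    # concentrate on the agents whose actions actually matter.
    grad_entropy = -(credit * (credit + 1e-8).log()).sum(dim=1).mean()

    return td_loss + entropy_coef * grad_entropy

Because QMIX constrains the mixing network to be monotonic, the gradients dQ_tot/dQ_i are non-negative, so the normalised values behave like attention weights over agents; the entropy penalty simply sharpens them.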
Related papers
- Would I have gotten that reward? Long-term credit assignment by
counterfactual contribution analysis [50.926791529605396]
We introduce Counterfactual Contribution Analysis (COCOA), a new family of model-based credit assignment algorithms.
Our algorithms achieve precise credit assignment by measuring the contribution of actions upon obtaining subsequent rewards.
arXiv Detail & Related papers (2023-06-29T09:27:27Z)
- AIIR-MIX: Multi-Agent Reinforcement Learning Meets Attention Individual
Intrinsic Reward Mixing Network [2.057898896648108]
Deducing the contribution of each agent and assigning the corresponding reward to them is a crucial problem in cooperative Multi-Agent Reinforcement Learning (MARL).
Previous studies try to resolve this issue by designing an intrinsic reward function, but the intrinsic reward is simply combined with the environment reward by summation.
We propose Attention Individual Intrinsic Reward Mixing Network (AIIR-mix) in MARL.
arXiv Detail & Related papers (2023-02-19T10:25:25Z)
- Adaptive Value Decomposition with Greedy Marginal Contribution
Computation for Cooperative Multi-Agent Reinforcement Learning [48.41925886860991]
Real-world cooperation often requires intensive, simultaneous coordination among agents.
Traditional methods that learn the value function as a monotonic mixing of per-agent utilities cannot solve the tasks with non-monotonic returns.
We propose a novel explicit credit assignment method to address the non-monotonic problem.
arXiv Detail & Related papers (2023-02-14T07:23:59Z)
- Credit-cognisant reinforcement learning for multi-agent cooperation [0.0]
We introduce the concept of credit-cognisant rewards, which allows an agent to perceive the effect its actions had on the environment as well as on its co-agents.
We show that by manipulating these experiences and constructing the reward contained within them to include the rewards received by all the agents within the same action sequence, we are able to improve significantly on the performance of independent deep Q-learning.
arXiv Detail & Related papers (2022-11-18T09:00:25Z)
- DQMIX: A Distributional Perspective on Multi-Agent Reinforcement
Learning [122.47938710284784]
In cooperative multi-agent tasks, a team of agents jointly interact with an environment by taking actions, receiving a reward and observing the next state.
Most existing value-based multi-agent reinforcement learning methods model only the expectations of the individual Q-values and the global Q-value.
arXiv Detail & Related papers (2022-02-21T11:28:00Z)
- Shapley Counterfactual Credits for Multi-Agent Reinforcement Learning [34.856522993714535]
We propose Shapley Counterfactual Credit Assignment, a novel method for explicit credit assignment which accounts for the coalition of agents; a minimal coalition-sampling sketch is given after this list.
Our method significantly outperforms existing cooperative MARL algorithms and achieves state-of-the-art results, with especially large margins on the more difficult tasks.
arXiv Detail & Related papers (2021-06-01T07:38:34Z)
- Softmax with Regularization: Better Value Estimation in Multi-Agent
Reinforcement Learning [72.28520951105207]
Overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning.
We propose a novel regularization-based update scheme that penalizes large joint action-values deviating from a baseline.
We show that our method provides a consistent performance improvement on a set of challenging StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2021-03-22T14:18:39Z)
- Weighted QMIX: Expanding Monotonic Value Function Factorisation for Deep
Multi-Agent Reinforcement Learning [66.94149388181343]
We present a new version of a popular $Q$-learning algorithm for MARL.
We show that it can recover the optimal policy even with access to $Q^*$.
We also demonstrate improved performance on predator-prey and challenging multi-agent StarCraft benchmark tasks.
arXiv Detail & Related papers (2020-06-18T18:34:50Z)
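For the Shapley counterfactual credits entry above, the following is a minimal Monte-Carlo sketch of coalition-based credit assignment. It assumes access to a coalition value function (in the paper that role is played by a learned centralised critic with counterfactual baselines); coalition_value and the permutation-sampling scheme here are illustrative, not the authors' exact algorithm.

import random

def shapley_credits(n_agents, coalition_value, n_samples=100):
    # Estimate each agent's Shapley credit by averaging marginal
    # contributions over randomly sampled agent orderings.
    credits = [0.0] * n_agents
    for _ in range(n_samples):
        order = random.sample(range(n_agents), n_agents)    # random permutation
        coalition = set()
        prev_value = coalition_value(frozenset(coalition))
        for agent in order:
            coalition.add(agent)
            value = coalition_value(frozenset(coalition))
            credits[agent] += value - prev_value             # marginal contribution
            prev_value = value
    return [c / n_samples for c in credits]

# Toy usage: with an additive team value the estimated credits recover each
# agent's fixed contribution exactly (here roughly [1.0, 3.0, 0.5]).
if __name__ == "__main__":
    contribution = {0: 1.0, 1: 3.0, 2: 0.5}
    value_fn = lambda coalition: sum(contribution[i] for i in coalition)
    print(shapley_credits(3, value_fn))

Exact Shapley values require enumerating all 2^n coalitions, so permutation sampling is the standard approximation as the number of agents grows.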
This list is automatically generated from the titles and abstracts of the papers on this site.