Weighted QMIX: Expanding Monotonic Value Function Factorisation for Deep
Multi-Agent Reinforcement Learning
- URL: http://arxiv.org/abs/2006.10800v2
- Date: Thu, 22 Oct 2020 14:58:45 GMT
- Title: Weighted QMIX: Expanding Monotonic Value Function Factorisation for Deep
Multi-Agent Reinforcement Learning
- Authors: Tabish Rashid, Gregory Farquhar, Bei Peng, Shimon Whiteson
- Abstract summary: We present weighted variants of QMIX, a popular $Q$-learning algorithm for cooperative MARL.
We show that QMIX's projection can fail to recover the optimal policy even with access to $Q^*$, and prove that the proposed weighted projections recover the correct maximal action.
We also demonstrate improved performance on predator-prey and challenging multi-agent StarCraft benchmark tasks.
- Score: 66.94149388181343
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: QMIX is a popular $Q$-learning algorithm for cooperative MARL in the
centralised training and decentralised execution paradigm. In order to enable
easy decentralisation, QMIX restricts the joint action $Q$-values it can
represent to be a monotonic mixing of each agent's utilities. However, this
restriction prevents it from representing value functions in which an agent's
ordering over its actions can depend on other agents' actions. To analyse this
representational limitation, we first formalise the objective QMIX optimises,
which allows us to view QMIX as an operator that first computes the
$Q$-learning targets and then projects them into the space representable by
QMIX. This projection returns a representable $Q$-value that minimises the
unweighted squared error across all joint actions. We show in particular that
this projection can fail to recover the optimal policy even with access to
$Q^*$, which primarily stems from the equal weighting placed on each joint
action. We rectify this by introducing a weighting into the projection, in
order to place more importance on the better joint actions. We propose two
weighting schemes and prove that they recover the correct maximal action for
any joint action $Q$-values, and therefore for $Q^*$ as well. Based on our
analysis and results in the tabular setting, we introduce two scalable versions
of our algorithm, Centrally-Weighted (CW) QMIX and Optimistically-Weighted (OW)
QMIX, and demonstrate improved performance on both predator-prey and challenging
multi-agent StarCraft benchmark tasks.
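As a concrete illustration of the weighting described above, the sketch below shows an optimistically-weighted TD loss in PyTorch: squared errors on joint actions whose mixed value already exceeds the target are down-weighted by a factor w, so the monotonic projection concentrates on the better joint actions. The function name, tensor shapes, and the default w are illustrative assumptions, not the authors' implementation.

```python
import torch

def ow_qmix_loss(q_tot, target, w=0.1):
    """Optimistically-weighted QMIX loss (minimal sketch).

    q_tot  : monotonic mixing-network output for the executed joint actions
    target : Q-learning target y = r + gamma * max_u' Q(s', u') from a target net
    w      : down-weighting in (0, 1] for joint actions that are already
             over-estimated, so better joint actions dominate the projection
    """
    td_error = q_tot - target
    # Full weight where Q_tot under-estimates the target, weight w elsewhere.
    weights = torch.where(td_error < 0.0,
                          torch.ones_like(td_error),
                          torch.full_like(td_error, w))
    return (weights.detach() * td_error.pow(2)).mean()
```

With w = 1 this reduces to the ordinary unweighted QMIX regression loss; CW-QMIX differs only in how the weight is chosen, using a centralised critic to identify the better joint actions.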
Related papers
- Soft-QMIX: Integrating Maximum Entropy For Monotonic Value Function Factorization [5.54284350152423]
We propose an enhancement to QMIX by incorporating an additional local Q-value learning method within the maximum entropy RL framework.
Our approach constrains the local Q-value estimates to maintain the correct ordering of all actions.
We theoretically prove the monotonic improvement and convergence of our method to an optimal solution.
arXiv Detail & Related papers (2024-06-20T01:55:08Z)
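A minimal sketch of the maximum-entropy ingredient mentioned in the Soft-QMIX summary above: each agent's policy is a softmax over its local Q-values and the soft state value is a temperature-scaled log-sum-exp, the standard soft Q-learning bookkeeping. Whether Soft-QMIX parameterises these quantities exactly this way is not stated in the summary, so the temperature alpha and the function names are assumptions.

```python
import torch
import torch.nn.functional as F

def soft_value(local_q, alpha=0.05):
    """Soft (maximum-entropy) value: V(s) = alpha * logsumexp(Q(s, .) / alpha)."""
    return alpha * torch.logsumexp(local_q / alpha, dim=-1)

def soft_policy(local_q, alpha=0.05):
    """Boltzmann policy pi(a|s) proportional to exp(Q(s, a) / alpha)."""
    return F.softmax(local_q / alpha, dim=-1)
```

Training the local Q-values against soft targets of this form, while keeping their action ordering consistent with the centralised value, is roughly the constraint the summary refers to.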
- POWQMIX: Weighted Value Factorization with Potentially Optimal Joint Actions Recognition for Cooperative Multi-Agent Reinforcement Learning [17.644279061872442]
Value function factorization methods are commonly used in cooperative multi-agent reinforcement learning.
We propose a potentially optimal joint actions weighted QMIX algorithm.
Experiments in matrix games, predator-prey, and StarCraft II Multi-Agent Challenge environments demonstrate that our algorithm outperforms the state-of-the-art value-based multi-agent reinforcement learning methods.
arXiv Detail & Related papers (2024-05-13T03:27:35Z)
- Maximum Correntropy Value Decomposition for Multi-agent Deep Reinforcement Learning [4.743243072814404]
We introduce the Maximum Correntropy Criterion (MCC) as a cost function that dynamically adapts the weights to eliminate the effect of minima in the reward distribution.
A preliminary experiment conducted on OMG shows that MCVD can handle non-monotonic value decomposition problems with a large tolerance for kernel bandwidth selection.
arXiv Detail & Related papers (2022-08-07T08:06:21Z)
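The Maximum Correntropy Criterion named in the MCVD entry above replaces the squared error with a Gaussian-kernel similarity between target and estimate, so large errors are smoothly down-weighted rather than dominating the loss. A generic sketch, with an assumed kernel bandwidth sigma:

```python
import torch

def correntropy_loss(prediction, target, sigma=1.0):
    """Maximum Correntropy Criterion as a loss (generic sketch).

    Maximising the Gaussian-kernel correntropy
        kappa(e) = exp(-e**2 / (2 * sigma**2)),  e = target - prediction,
    is equivalent to minimising 1 - kappa(e); samples with large |e|
    contribute almost nothing, unlike under the squared error.
    """
    error = target - prediction
    kernel = torch.exp(-error.pow(2) / (2.0 * sigma ** 2))
    return (1.0 - kernel).mean()
```

The bandwidth sigma controls the tolerance: larger values make the loss behave more like ordinary least squares.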
- Parameterization of Cross-Token Relations with Relative Positional Encoding for Vision MLP [52.25478388220691]
Vision multi-layer perceptrons (MLPs) have shown promising performance in computer vision tasks.
They use token-mixing layers to capture cross-token interactions, as opposed to the multi-head self-attention mechanism used by Transformers.
We propose a new positional spatial gating unit (PoSGU) to efficiently encode the cross-token relations for token mixing.
arXiv Detail & Related papers (2022-07-15T04:18:06Z)
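The PoSGU entry above concerns vision MLPs rather than MARL, but its token-mixing idea is easy to illustrate: instead of learning a free N x N mixing matrix, the weight between two tokens is looked up from a small table indexed by their relative position. The 1-D layout, module name, and scalar-per-offset parameterisation below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RelPosTokenMixer(nn.Module):
    """Token mixing whose weights depend only on relative position (sketch)."""

    def __init__(self, num_tokens: int):
        super().__init__()
        # One learnable scalar per relative offset in [-(N-1), N-1].
        self.rel_weight = nn.Parameter(torch.randn(2 * num_tokens - 1) * 0.02)
        idx = torch.arange(num_tokens)
        # rel_index[i, j] = (i - j) + (N - 1) gathers the full mixing matrix.
        self.register_buffer("rel_index", idx[:, None] - idx[None, :] + num_tokens - 1)

    def forward(self, x):  # x: (batch, num_tokens, channels)
        mixing = self.rel_weight[self.rel_index]  # (N, N) mixing matrix
        return torch.einsum("ij,bjc->bic", mixing, x)
```

Because the table has 2N - 1 entries instead of N^2, the parameter count of the token-mixing step drops from quadratic to linear in the number of tokens.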
- DQMIX: A Distributional Perspective on Multi-Agent Reinforcement Learning [122.47938710284784]
In cooperative multi-agent tasks, a team of agents jointly interact with an environment by taking actions, receiving a reward and observing the next state.
Most of the existing value-based multi-agent reinforcement learning methods only model the expectations of individual Q-values and global Q-value.
arXiv Detail & Related papers (2022-02-21T11:28:00Z)
- Revisiting QMIX: Discriminative Credit Assignment by Gradient Entropy Regularization [126.87359177547455]
In cooperative multi-agent systems, agents jointly take actions and receive a team reward instead of individual rewards.
In the absence of individual reward signals, credit assignment mechanisms are usually introduced to discriminate the contributions of different agents.
We propose a new perspective on credit assignment measurement and empirically show that QMIX suffers limited discriminability on the assignment of credits to agents.
arXiv Detail & Related papers (2022-02-09T12:37:55Z)
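The "gradient entropy" in the entry above can be illustrated as follows: the sensitivity of the mixed value to each agent's utility is read off from the gradient, normalised into a distribution over agents, and its entropy, which is high when credit is spread indiscriminately, is used as a regulariser. The normalisation and the way the penalty enters the TD loss are assumptions of this sketch.

```python
import torch

def gradient_entropy(q_tot, agent_qs, eps=1e-8):
    """Entropy of the credit distribution derived from dQ_tot/dQ_i (sketch).

    q_tot    : mixed value computed from agent_qs (so it is in the autograd graph)
    agent_qs : per-agent utilities of shape (batch, n_agents), requires_grad=True
    """
    grads = torch.autograd.grad(q_tot.sum(), agent_qs, create_graph=True)[0]
    credit = grads.abs() / (grads.abs().sum(dim=-1, keepdim=True) + eps)
    entropy = -(credit * (credit + eps).log()).sum(dim=-1)
    return entropy.mean()  # e.g. loss = td_loss + lambda_ * gradient_entropy(...)
```

Penalising this entropy pushes the mixer towards assigning clearly different credit to different agents.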
- MMD-MIX: Value Function Factorisation with Maximum Mean Discrepancy for Cooperative Multi-Agent Reinforcement Learning [15.972363414919279]
MMD-MIX is a method that combines distributional reinforcement learning and value decomposition.
The experiments demonstrate that MMD-MIX outperforms prior baselines in the StarCraft Multi-Agent Challenge (SMAC) environment.
arXiv Detail & Related papers (2021-06-22T10:21:00Z)
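Maximum mean discrepancy, the distance used by MMD-MIX above, can be estimated from two sample sets with a kernel; the Gaussian kernel, its bandwidth, and the biased estimator below are assumptions of this sketch rather than the paper's exact configuration.

```python
import torch

def gaussian_kernel(x, y, sigma=1.0):
    """k(x, y) = exp(-(x - y)**2 / (2 * sigma**2)) for batches of scalar samples."""
    return torch.exp(-(x[:, None] - y[None, :]).pow(2) / (2.0 * sigma ** 2))

def mmd_squared(x, y, sigma=1.0):
    """Biased estimate of MMD^2 between sample sets x and y (sketch)."""
    return (gaussian_kernel(x, x, sigma).mean()
            - 2.0 * gaussian_kernel(x, y, sigma).mean()
            + gaussian_kernel(y, y, sigma).mean())
```

In a distributional value-decomposition setting, x and y would be samples (or fixed atoms) of the predicted and target return distributions.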
- Softmax with Regularization: Better Value Estimation in Multi-Agent Reinforcement Learning [72.28520951105207]
Overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning.
We propose a novel regularization-based update scheme that penalizes large joint action-values deviating from a baseline.
We show that our method provides a consistent performance improvement on a set of challenging StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2021-03-22T14:18:39Z)
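The two ingredients named in the entry above, a softmax operator over action-values and a regulariser that keeps joint action-values near a baseline, can be sketched as follows; the inverse temperature beta and the choice of baseline are assumptions.

```python
import torch

def softmax_operator(q_values, beta=5.0):
    """Softmax-weighted value sm_beta(Q) = sum_a softmax(beta * Q)_a * Q_a.

    For finite beta this lies between the mean and the max of Q over actions,
    which is what makes it useful for taming over-estimated targets.
    """
    weights = torch.softmax(beta * q_values, dim=-1)
    return (weights * q_values).sum(dim=-1)

def baseline_regulariser(q_chosen, baseline):
    """Penalty on joint action-values that stray far from a baseline value."""
    return (q_chosen - baseline.detach()).pow(2).mean()
```

As beta grows the softmax operator approaches the max operator, recovering the standard (over-estimation-prone) target.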
- Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning [55.20040781688844]
QMIX is a novel value-based method that can train decentralised policies in a centralised end-to-end fashion.
We propose the StarCraft Multi-Agent Challenge (SMAC) as a new benchmark for deep multi-agent reinforcement learning.
arXiv Detail & Related papers (2020-03-19T16:51:51Z)
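For reference, a condensed sketch of the kind of monotonic mixing network QMIX uses: state-conditioned hypernetworks produce the mixing weights, and taking their absolute value keeps dQ_tot/dQ_i >= 0, the monotonicity constraint that makes the decentralised argmax consistent with the centralised one. Layer sizes and activations below are illustrative, not the reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MonotonicMixer(nn.Module):
    """QMIX-style monotonic mixing network (condensed sketch)."""

    def __init__(self, n_agents: int, state_dim: int, embed_dim: int = 32):
        super().__init__()
        self.n_agents, self.embed_dim = n_agents, embed_dim
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Sequential(nn.Linear(state_dim, embed_dim), nn.ReLU(),
                                      nn.Linear(embed_dim, 1))

    def forward(self, agent_qs, state):
        # agent_qs: (batch, n_agents), state: (batch, state_dim)
        w1 = self.hyper_w1(state).abs().view(-1, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(-1, 1, self.embed_dim)
        hidden = F.elu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)
        w2 = self.hyper_w2(state).abs().view(-1, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(-1, 1, 1)
        return (torch.bmm(hidden, w2) + b2).view(-1)  # Q_tot per batch element
```

Because every weight applied to an agent utility is non-negative, each agent can act greedily on its own utility without breaking the joint argmax.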
This list is automatically generated from the titles and abstracts of the papers on this site.