DQMIX: A Distributional Perspective on Multi-Agent Reinforcement
Learning
- URL: http://arxiv.org/abs/2202.10134v1
- Date: Mon, 21 Feb 2022 11:28:00 GMT
- Title: DQMIX: A Distributional Perspective on Multi-Agent Reinforcement
Learning
- Authors: Jian Zhao, Mingyu Yang, Xunhan Hu, Wengang Zhou, Houqiang Li
- Abstract summary: In cooperative multi-agent tasks, a team of agents jointly interact with an environment by taking actions, receiving a reward and observing the next state.
Most existing value-based multi-agent reinforcement learning methods model only the expectations of the individual Q-values and the global Q-value.
- Score: 122.47938710284784
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In cooperative multi-agent tasks, a team of agents jointly interacts with an
environment by taking actions, receiving a team reward and observing the next
state. During these interactions, uncertainty in the environment and in the
reward inevitably induces stochasticity in the long-term returns, and this
randomness is exacerbated as the number of agents grows. However, most existing
value-based multi-agent reinforcement learning (MARL) methods model only the
expectations of the individual Q-values and the global Q-value, ignoring such
randomness. Rather than estimating only the expectations of the long-term
returns, it is preferable to model this stochasticity directly by estimating
the returns as distributions. With this motivation, this work proposes DQMIX, a
novel value-based MARL method, from a distributional perspective. Specifically,
we model each individual Q-value with a categorical distribution. To integrate
these individual Q-value distributions into the global Q-value distribution, we
design a distribution mixing network based on five basic operations on
distributions. We further prove that DQMIX satisfies the
\emph{Distributional-Individual-Global-Max} (DIGM) principle with respect to
the expectation of the distribution, which guarantees consistency between the
joint greedy action selection on the global Q-value and the individual greedy
action selections on the individual Q-values. To validate DQMIX, we demonstrate
its ability to factorize a matrix game with stochastic rewards. Furthermore,
experimental results on a challenging set of StarCraft II micromanagement tasks
show that DQMIX consistently outperforms value-based multi-agent reinforcement
learning baselines.
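The DIGM principle described in the abstract can be stated compactly. The notation below is assumed for illustration (the abstract fixes none): Z_i are the individual Q-value distributions, Z_tot the global distribution, tau the action-observation histories and u the actions.

```latex
% DIGM with respect to the expectation of the distribution: the joint greedy
% action under E[Z_tot] coincides with the per-agent greedy actions under E[Z_i].
\[
\arg\max_{\mathbf{u}} \mathbb{E}\!\left[ Z_{\mathrm{tot}}(\boldsymbol{\tau}, \mathbf{u}) \right]
  = \left( \arg\max_{u_1} \mathbb{E}\!\left[ Z_1(\tau_1, u_1) \right],\ \ldots,\
           \arg\max_{u_n} \mathbb{E}\!\left[ Z_n(\tau_n, u_n) \right] \right)
\]
```

Below is a minimal sketch of modeling an individual Q-value with a categorical distribution, in the spirit of C51-style distributional RL rather than the authors' implementation; the support range, atom count and toy logits are assumptions. Each action's return distribution is a softmax over a fixed support of atoms, and the greedy action is chosen by its expectation, the quantity over which DIGM is stated.

```python
import numpy as np

N_ATOMS = 51                    # assumed number of atoms in the categorical support
V_MIN, V_MAX = -10.0, 10.0      # assumed range of the return support
ATOMS = np.linspace(V_MIN, V_MAX, N_ATOMS)

def categorical_q(logits):
    """logits: (n_actions, N_ATOMS) array -> (per-action atom probabilities, expected Q)."""
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)        # softmax over atoms, per action
    expected_q = probs @ ATOMS                         # E[Z(a)] for every action a
    return probs, expected_q

# Toy usage: an agent scores 5 actions and acts greedily on the expectation.
rng = np.random.default_rng(0)
logits = rng.normal(size=(5, N_ATOMS))                 # stand-in for a network head
probs, expected_q = categorical_q(logits)
greedy_action = int(np.argmax(expected_q))
```

How the individual distributions are combined into the global distribution, i.e. the five basic operations of the mixing network, is specific to the paper and is not sketched here.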
Related papers
- Quantile Regression for Distributional Reward Models in RLHF [1.8130068086063336]
We introduce Quantile Reward Models (QRMs), a novel approach to reward modeling that learns a distribution over rewards instead of a single scalar value.
Our method uses quantile regression to estimate a full, potentially multimodal distribution over preferences, providing a more powerful and nuanced representation (a minimal sketch of the quantile loss such methods build on appears after this list).
Our experimental results show that QRM outperforms comparable traditional point-estimate models on RewardBench.
arXiv Detail & Related papers (2024-09-16T10:54:04Z)
- Soft-QMIX: Integrating Maximum Entropy For Monotonic Value Function Factorization [5.54284350152423]
We propose an enhancement to QMIX that incorporates an additional local Q-value learning method within the maximum entropy RL framework.
Our approach constrains the local Q-value estimates to maintain the correct ordering of all actions.
We theoretically prove the monotonic improvement and convergence of our method to an optimal solution.
arXiv Detail & Related papers (2024-06-20T01:55:08Z)
- Value-Distributional Model-Based Reinforcement Learning [59.758009422067]
Quantifying uncertainty about a policy's long-term performance is important to solve sequential decision-making tasks.
We study the problem from a model-based Bayesian reinforcement learning perspective.
We propose Epistemic Quantile-Regression (EQR), a model-based algorithm that learns a value distribution function.
arXiv Detail & Related papers (2023-08-12T14:59:19Z)
- Distributional Reinforcement Learning for Multi-Dimensional Reward Functions [91.88969237680669]
We introduce Multi-Dimensional Distributional DQN (MD3QN) to model the joint return distribution from multiple reward sources.
As a by-product of joint distribution modeling, MD3QN can capture the randomness in returns for each source of reward.
In experiments, our method accurately models the joint return distribution in environments with richly correlated reward functions.
arXiv Detail & Related papers (2021-10-26T11:24:23Z)
- MMD-MIX: Value Function Factorisation with Maximum Mean Discrepancy for Cooperative Multi-Agent Reinforcement Learning [15.972363414919279]
MMD-MIX is a method that combines distributional reinforcement learning and value decomposition.
The experiments demonstrate that MMD-MIX outperforms prior baselines in the StarCraft Multi-Agent Challenge (SMAC) environment.
arXiv Detail & Related papers (2021-06-22T10:21:00Z)
- QR-MIX: Distributional Value Function Factorisation for Cooperative Multi-Agent Reinforcement Learning [5.564793925574797]
In cooperative Multi-Agent Reinforcement Learning (MARL), agents observe and interact with their environment locally and independently.
With local observation and random sampling, the randomness in rewards and observations leads to randomness in long-term returns.
Existing methods such as Value-Decomposition Networks (VDN) and QMIX estimate the long-term return as a scalar, which carries no information about this randomness.
arXiv Detail & Related papers (2020-09-09T10:28:44Z)
- Weighted QMIX: Expanding Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning [66.94149388181343]
We present a new version of a popular $Q$-learning algorithm for MARL.
We show that the original algorithm's projection can fail to recover the optimal policy even with access to $Q^*$.
We also demonstrate improved performance on predator-prey and challenging multi-agent StarCraft benchmark tasks.
arXiv Detail & Related papers (2020-06-18T18:34:50Z)
- Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning [55.20040781688844]
QMIX is a novel value-based method that can train decentralised policies in a centralised end-to-end fashion.
We propose the StarCraft Multi-Agent Challenge (SMAC) as a new benchmark for deep multi-agent reinforcement learning.
arXiv Detail & Related papers (2020-03-19T16:51:51Z)
- Distributionally Robust Bayesian Quadrature Optimization [60.383252534861136]
We study BQO under distributional uncertainty in which the underlying probability distribution is unknown except for a limited set of its i.i.d. samples.
A standard BQO approach maximizes the Monte Carlo estimate of the true expected objective given the fixed sample set.
We propose a novel posterior sampling based algorithm, namely distributionally robust BQO (DRBQO) for this purpose.
arXiv Detail & Related papers (2020-01-19T12:00:33Z)
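Several of the entries above (QRM, EQR, and QR-MIX by name) represent reward or return distributions with quantile-based methods. Below is a minimal sketch of the quantile (pinball) loss that quantile regression builds on; the variable names and the toy example are illustrative and not taken from any of the papers.

```python
import numpy as np

def pinball_loss(pred, target, tau):
    """Quantile-regression loss for quantile level tau in (0, 1).

    Minimising this loss over a scalar prediction recovers the tau-quantile of
    the target distribution, which is how quantile-based distributional methods
    represent the randomness in rewards or returns.
    """
    diff = target - pred
    return float(np.mean(np.maximum(tau * diff, (tau - 1.0) * diff)))

# Toy usage: recover the 0.9 quantile of noisy returns by a grid search.
rng = np.random.default_rng(0)
returns = rng.normal(loc=1.0, scale=2.0, size=10_000)
candidates = np.linspace(-5.0, 7.0, 241)
losses = [pinball_loss(c, returns, tau=0.9) for c in candidates]
best = candidates[int(np.argmin(losses))]   # roughly 1.0 + 2.0 * 1.28, i.e. about 3.6
```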
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.