QR-MIX: Distributional Value Function Factorisation for Cooperative
Multi-Agent Reinforcement Learning
- URL: http://arxiv.org/abs/2009.04197v5
- Date: Tue, 23 Feb 2021 12:37:48 GMT
- Title: QR-MIX: Distributional Value Function Factorisation for Cooperative
Multi-Agent Reinforcement Learning
- Authors: Jian Hu, Seth Austin Harding, Haibin Wu, Siyue Hu, Shih-wei Liao
- Abstract summary: In cooperative Multi-Agent Reinforcement Learning (MARL), agents observe and interact with their environment locally and independently.
With local observation and random sampling, the randomness in rewards and observations leads to randomness in long-term returns.
Existing methods such as Value Decomposition Network (VDN) and QMIX estimate the value of long-term returns as a scalar that does not contain the information of randomness.
- Score: 5.564793925574797
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In Cooperative Multi-Agent Reinforcement Learning (MARL) and under the
setting of Centralized Training with Decentralized Execution (CTDE), agents
observe and interact with their environment locally and independently. With
local observation and random sampling, the randomness in rewards and
observations leads to randomness in long-term returns. Existing methods such as
Value Decomposition Network (VDN) and QMIX estimate the value of long-term
returns as a scalar, which discards the information about their randomness. Our
proposed model, QR-MIX, introduces quantile regression to model the joint
state-action value as a distribution, combining QMIX with the Implicit Quantile
Network (IQN). However, the monotonicity constraint in QMIX limits the
expressiveness of the joint state-action value distribution and may lead to
incorrect estimates in non-monotonic cases. Therefore, we propose a flexible
loss function that approximates the monotonicity constraint of QMIX. Our model
is not only more tolerant of the randomness of returns but also more flexible
with respect to the monotonicity constraint. The experimental results demonstrate that QR-MIX
outperforms the previous state-of-the-art method QMIX in the StarCraft
Multi-Agent Challenge (SMAC) environment.
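The abstract leaves the architecture implicit, so the following is a minimal, hypothetical PyTorch sketch of the idea it describes: per-agent utilities are mixed by a QMIX-style hypernetwork, IQN-style quantile embeddings turn the mixed output into samples of the joint return distribution, and a quantile-Huber loss plus a soft penalty stands in for the "flexible" monotonicity constraint. Class and function names (QuantileMixer, soft_monotonicity_penalty) are illustrative, not the authors' code.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class QuantileMixer(nn.Module):
    """Mixes per-agent utilities into samples of the joint return distribution Z_tot(tau)."""

    def __init__(self, n_agents, state_dim, embed_dim=32, n_cos=64):
        super().__init__()
        self.n_agents = n_agents
        self.n_cos = n_cos
        # IQN-style cosine embedding of the sampled quantile fractions tau.
        self.tau_embed = nn.Linear(n_cos, embed_dim)
        # QMIX-style hypernetworks conditioned on the global state.
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Linear(state_dim, 1)

    def forward(self, agent_qs, state, tau):
        # agent_qs: (B, n_agents), state: (B, state_dim), tau: (B, n_tau) in (0, 1)
        B, n_tau = tau.shape
        freqs = torch.arange(1, self.n_cos + 1, device=tau.device, dtype=tau.dtype)
        phi = F.relu(self.tau_embed(torch.cos(tau.unsqueeze(-1) * freqs * math.pi)))  # (B, n_tau, E)

        # Mixing weights come from hypernetworks but are NOT hard-clamped to be
        # non-negative; monotonicity is only encouraged by a penalty (see below).
        w1 = self.hyper_w1(state).view(B, self.n_agents, -1)                 # (B, N, E)
        b1 = self.hyper_b1(state).unsqueeze(1)                               # (B, 1, E)
        hidden = F.elu(torch.einsum("bn,bne->be", agent_qs, w1).unsqueeze(1) + b1)
        hidden = hidden * phi                                                # modulate by quantile embedding
        w2 = self.hyper_w2(state).unsqueeze(-1)                              # (B, E, 1)
        return torch.bmm(hidden, w2).squeeze(-1) + self.hyper_b2(state)      # (B, n_tau)


def quantile_huber_loss(z_pred, z_target, tau, kappa=1.0):
    """Quantile-regression Huber loss between sampled quantiles (as in QR-DQN / IQN)."""
    td = z_target.unsqueeze(1) - z_pred.unsqueeze(2)                         # (B, n_tau, n_tau')
    huber = torch.where(td.abs() <= kappa, 0.5 * td ** 2, kappa * (td.abs() - 0.5 * kappa))
    weight = (tau.unsqueeze(-1) - (td.detach() < 0).float()).abs()
    return (weight * huber / kappa).mean()


def soft_monotonicity_penalty(mixer, state):
    """Penalise negative mixing weights instead of hard-clamping them -- one simple
    way to realise a 'flexible' monotonicity constraint; the paper's exact form may differ."""
    return F.relu(-mixer.hyper_w1(state)).mean() + F.relu(-mixer.hyper_w2(state)).mean()
```

In a training step one would presumably combine the two terms, e.g. `loss = quantile_huber_loss(z_pred, z_target, tau) + lambda_mono * soft_monotonicity_penalty(mixer, state)`, where the hypothetical coefficient `lambda_mono` trades off TD accuracy against how strictly the monotonic factorisation is enforced; decentralised execution can still act greedily on the expectation of each agent's quantile outputs.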
Related papers
- Soft-QMIX: Integrating Maximum Entropy For Monotonic Value Function Factorization [5.54284350152423]
We propose an enhancement to QMIX by incorporating an additional local Q-value learning method within the maximum entropy RL framework.
Our approach constrains the local Q-value estimates to maintain the correct ordering of all actions.
We theoretically prove the monotonic improvement and convergence of our method to an optimal solution.
arXiv Detail & Related papers (2024-06-20T01:55:08Z) - Gaussian Mixture Solvers for Diffusion Models [84.83349474361204]
We introduce a novel class of SDE-based solvers called GMS for diffusion models.
Our solver outperforms numerous SDE-based solvers in terms of sample quality in image generation and stroke-based synthesis.
arXiv Detail & Related papers (2023-11-02T02:05:38Z) - Importance sampling for stochastic quantum simulations [68.8204255655161]
We introduce the qDrift protocol, which builds random product formulas by sampling from the Hamiltonian according to the coefficients.
We show that the simulation cost can be reduced while achieving the same accuracy, by considering the individual simulation cost during the sampling stage.
Results are confirmed by numerical simulations performed on a lattice nuclear effective field theory.
arXiv Detail & Related papers (2022-12-12T15:06:32Z) - Maximum Correntropy Value Decomposition for Multi-agent Deep
Reinforcement Learning [4.743243072814404]
We introduce the Maximum Correntropy Criterion (MCC) as a cost function that dynamically adapts the weights, mitigating the effect of minima in reward distributions.
A preliminary experiment conducted on OMG shows that MCVD can handle non-monotonic value decomposition problems with a large tolerance for kernel-bandwidth selection.
arXiv Detail & Related papers (2022-08-07T08:06:21Z) - DQMIX: A Distributional Perspective on Multi-Agent Reinforcement
Learning [122.47938710284784]
In cooperative multi-agent tasks, a team of agents jointly interact with an environment by taking actions, receiving a reward and observing the next state.
Most of the existing value-based multi-agent reinforcement learning methods only model the expectations of individual Q-values and the global Q-value.
arXiv Detail & Related papers (2022-02-21T11:28:00Z) - Deep Non-Crossing Quantiles through the Partial Derivative [0.6299766708197883]
Quantile Regression provides a way to approximate a single conditional quantile.
Minimisation of the QR-loss function does not guarantee non-crossing quantiles.
We propose a generic deep learning algorithm for predicting an arbitrary number of quantiles.
arXiv Detail & Related papers (2022-01-30T15:35:21Z) - MMD-MIX: Value Function Factorisation with Maximum Mean Discrepancy for
Cooperative Multi-Agent Reinforcement Learning [15.972363414919279]
MMD-MIX is a method that combines distributional reinforcement learning and value decomposition.
The experiments demonstrate that MMD-MIX outperforms prior baselines in the StarCraft Multi-Agent Challenge (SMAC) environment.
arXiv Detail & Related papers (2021-06-22T10:21:00Z) - Robust Finite Mixture Regression for Heterogeneous Targets [70.19798470463378]
We propose an FMR model that finds sample clusters and jointly models multiple incomplete mixed-type targets.
We provide non-asymptotic oracle performance bounds for our model under a high-dimensional learning framework.
The results show that our model can achieve state-of-the-art performance.
arXiv Detail & Related papers (2020-10-12T03:27:07Z) - Weighted QMIX: Expanding Monotonic Value Function Factorisation for Deep
Multi-Agent Reinforcement Learning [66.94149388181343]
We present a new version of a popular $Q$-learning algorithm for MARL.
We show that it can recover the optimal policy even with access to $Q^*$.
We also demonstrate improved performance on predator-prey and challenging multi-agent StarCraft benchmark tasks.
arXiv Detail & Related papers (2020-06-18T18:34:50Z) - Monotonic Value Function Factorisation for Deep Multi-Agent
Reinforcement Learning [55.20040781688844]
QMIX is a novel value-based method that can train decentralised policies in a centralised end-to-end fashion.
We propose the StarCraft Multi-Agent Challenge (SMAC) as a new benchmark for deep multi-agent reinforcement learning.
arXiv Detail & Related papers (2020-03-19T16:51:51Z)
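For context, the hard constraint that QMIX (the last entry above) imposes, and that QR-MIX relaxes with its flexible loss, is monotonicity of the joint value in every per-agent value:

$$\frac{\partial Q_{tot}(\boldsymbol{\tau}, \mathbf{u})}{\partial Q_i(\tau^i, u^i)} \ge 0, \qquad \forall i \in \{1, \dots, N\},$$

which QMIX enforces by generating the mixing-network weights with hypernetworks and taking their absolute values so they are non-negative.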