MMD-MIX: Value Function Factorisation with Maximum Mean Discrepancy for
Cooperative Multi-Agent Reinforcement Learning
- URL: http://arxiv.org/abs/2106.11652v1
- Date: Tue, 22 Jun 2021 10:21:00 GMT
- Title: MMD-MIX: Value Function Factorisation with Maximum Mean Discrepancy for
Cooperative Multi-Agent Reinforcement Learning
- Authors: Zhiwei Xu, Dapeng Li, Yunpeng Bai, Guoliang Fan
- Abstract summary: MMD-MIX is a method that combines distributional reinforcement learning and value decomposition.
The experiments demonstrate that MMD-MIX outperforms prior baselines in the StarCraft Multi-Agent Challenge (SMAC) environment.
- Score: 15.972363414919279
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the real world, many tasks require multiple agents to cooperate with each
other under the condition of local observations. To solve such problems, many
multi-agent reinforcement learning methods based on Centralized Training with
Decentralized Execution have been proposed. One representative class of work is
value decomposition, which decomposes the global joint Q-value $Q_\text{jt}$
into individual Q-values $Q_a$ to guide individuals' behaviors, e.g. VDN
(Value-Decomposition Networks) and QMIX. However, these baselines often ignore
the randomness in the situation. We propose MMD-MIX, a method that combines
distributional reinforcement learning and value decomposition to alleviate the
above weakness. In addition, inspired by REM (Random Ensemble Mixture), a
robust RL algorithm, we explicitly introduce randomness into MMD-MIX to improve
data sampling efficiency. The experiments demonstrate
that MMD-MIX outperforms prior baselines in the StarCraft Multi-Agent Challenge
(SMAC) environment.
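The abstract describes the two ingredients of MMD-MIX in words: replacing a scalar Q-value with samples from a return distribution compared via Maximum Mean Discrepancy, and REM-style random convex combinations of an ensemble to inject randomness. Below is a minimal, hypothetical sketch of those two building blocks in PyTorch; the Gaussian kernel, tensor shapes, and all names are assumptions for illustration, not the authors' implementation.

```python
# Sketch only: squared-MMD loss between predicted and target return samples,
# plus a REM-style random convex combination over an ensemble of estimates.
import torch

def gaussian_kernel(x, y, bandwidth=1.0):
    # x: [B, N], y: [B, M] -> pairwise kernel values [B, N, M]
    diff = x.unsqueeze(2) - y.unsqueeze(1)
    return torch.exp(-diff.pow(2) / (2.0 * bandwidth ** 2))

def mmd_squared(pred_samples, target_samples, bandwidth=1.0):
    """Biased estimator of squared MMD between two sets of return samples."""
    k_pp = gaussian_kernel(pred_samples, pred_samples, bandwidth).mean(dim=(1, 2))
    k_tt = gaussian_kernel(target_samples, target_samples, bandwidth).mean(dim=(1, 2))
    k_pt = gaussian_kernel(pred_samples, target_samples, bandwidth).mean(dim=(1, 2))
    return k_pp + k_tt - 2.0 * k_pt              # [B]

def rem_mixture(ensemble_q, alphas=None):
    """REM-style random convex combination of K ensemble value estimates."""
    if alphas is None:
        alphas = torch.rand(ensemble_q.shape[-1])  # ensemble_q: [B, K]
        alphas = alphas / alphas.sum()             # normalise to a convex combination
    return ensemble_q @ alphas                     # [B]

# Toy usage with made-up shapes: joint return samples vs. a bootstrapped target.
B, N, K = 32, 8, 4
pred = torch.randn(B, N)                           # predicted joint return samples
target = (torch.randn(B, 1) + 0.99 * torch.randn(B, N)).detach()
loss = mmd_squared(pred, target).mean()
q_mixed = rem_mixture(torch.randn(B, K))           # randomly mixed ensemble estimate
```

In this reading, the MMD term replaces the usual TD error between scalar Q-values with a discrepancy between sampled return distributions, and the random mixture plays the role REM plays in single-agent RL: a different convex combination of heads per update.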
Related papers
- Soft-QMIX: Integrating Maximum Entropy For Monotonic Value Function Factorization [5.54284350152423]
We propose an enhancement to QMIX by incorporating an additional local Q-value learning method within the maximum entropy RL framework.
Our approach constrains the local Q-value estimates to maintain the correct ordering of all actions.
We theoretically prove the monotonic improvement and convergence of our method to an optimal solution.
arXiv Detail & Related papers (2024-06-20T01:55:08Z) - DQMIX: A Distributional Perspective on Multi-Agent Reinforcement
Learning [122.47938710284784]
In cooperative multi-agent tasks, a team of agents jointly interact with an environment by taking actions, receiving a reward and observing the next state.
Most of the existing value-based multi-agent reinforcement learning methods only model the expectations of individual Q-values and global Q-value.
arXiv Detail & Related papers (2022-02-21T11:28:00Z) - Value Function Factorisation with Hypergraph Convolution for Cooperative
Multi-agent Reinforcement Learning [32.768661516953344]
We propose a method that combines hypergraph convolution with value decomposition.
By treating action values as signals, HGCN-Mix aims to explore the relationship between these signals via a self-learning hypergraph.
Experimental results show that HGCN-Mix matches or surpasses state-of-the-art techniques in the StarCraft II multi-agent challenge (SMAC) benchmark.
arXiv Detail & Related papers (2021-12-09T08:40:38Z) - Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for MFC.
We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z) - Softmax with Regularization: Better Value Estimation in Multi-Agent
Reinforcement Learning [72.28520951105207]
Overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning.
We propose a novel regularization-based update scheme that penalizes large joint action-values deviating from a baseline.
We show that our method provides a consistent performance improvement on a set of challenging StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2021-03-22T14:18:39Z) - UneVEn: Universal Value Exploration for Multi-Agent Reinforcement
Learning [53.73686229912562]
We propose a novel MARL approach called Universal Value Exploration (UneVEn).
UneVEn learns a set of related tasks simultaneously with a linear decomposition of universal successor features.
Empirical results on a set of exploration games, challenging cooperative predator-prey tasks requiring significant coordination among agents, and StarCraft II micromanagement benchmarks show that UneVEn can solve tasks where other state-of-the-art MARL methods fail.
arXiv Detail & Related papers (2020-10-06T19:08:47Z) - QR-MIX: Distributional Value Function Factorisation for Cooperative
Multi-Agent Reinforcement Learning [5.564793925574797]
In cooperative Multi-Agent Reinforcement Learning (MARL), agents observe and interact with their environment locally and independently.
With local observation and random sampling, the randomness in rewards and observations leads to randomness in long-term returns.
Existing methods such as Value Decomposition Network (VDN) and QMIX estimate long-term returns as a scalar that does not capture this randomness.
arXiv Detail & Related papers (2020-09-09T10:28:44Z) - Weighted QMIX: Expanding Monotonic Value Function Factorisation for Deep
Multi-Agent Reinforcement Learning [66.94149388181343]
We present a new version of a popular $Q$-learning algorithm for MARL.
We show that it can recover the optimal policy even with access to $Q^*$.
We also demonstrate improved performance on predator-prey and challenging multi-agent StarCraft benchmark tasks.
arXiv Detail & Related papers (2020-06-18T18:34:50Z) - Monotonic Value Function Factorisation for Deep Multi-Agent
Reinforcement Learning [55.20040781688844]
QMIX is a novel value-based method that can train decentralised policies in a centralised end-to-end fashion (a minimal mixing-network sketch follows this list).
We propose the StarCraft Multi-Agent Challenge (SMAC) as a new benchmark for deep multi-agent reinforcement learning.
arXiv Detail & Related papers (2020-03-19T16:51:51Z)
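Several of the entries above (QMIX, Weighted QMIX, Soft-QMIX) build on the same monotonic mixing idea that MMD-MIX also inherits: per-agent Q-values are combined into $Q_\text{jt}$ by a mixing network whose weights are produced by state-conditioned hypernetworks and forced to be non-negative, so that $\partial Q_\text{jt} / \partial Q_a \ge 0$. Here is a minimal QMIX-style sketch; layer sizes and names are assumptions, not code from any of the cited papers.

```python
# Sketch only: a QMIX-style monotonic mixing network. Absolute-valued
# hypernetwork weights keep Q_jt monotone in every agent's Q_a.
import torch
import torch.nn as nn

class MonotonicMixer(nn.Module):
    def __init__(self, n_agents, state_dim, embed_dim=32):
        super().__init__()
        self.n_agents, self.embed_dim = n_agents, embed_dim
        # Hypernetworks: global state -> mixing weights and biases.
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Sequential(nn.Linear(state_dim, embed_dim),
                                      nn.ReLU(),
                                      nn.Linear(embed_dim, 1))

    def forward(self, agent_qs, state):
        # agent_qs: [B, n_agents]; state: [B, state_dim]
        B = agent_qs.size(0)
        w1 = torch.abs(self.hyper_w1(state)).view(B, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(B, 1, self.embed_dim)
        hidden = torch.relu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)   # [B, 1, embed]
        w2 = torch.abs(self.hyper_w2(state)).view(B, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(B, 1, 1)
        return (torch.bmm(hidden, w2) + b2).view(B, 1)                   # Q_jt: [B, 1]

# Usage: q_jt = MonotonicMixer(n_agents=5, state_dim=48)(agent_qs, state)
```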
This list is automatically generated from the titles and abstracts of the papers in this site.