QVMix and QVMix-Max: Extending the Deep Quality-Value Family of
Algorithms to Cooperative Multi-Agent Reinforcement Learning
- URL: http://arxiv.org/abs/2012.12062v1
- Date: Tue, 22 Dec 2020 14:53:42 GMT
- Title: QVMix and QVMix-Max: Extending the Deep Quality-Value Family of
Algorithms to Cooperative Multi-Agent Reinforcement Learning
- Authors: Pascal Leroy, Damien Ernst, Pierre Geurts, Gilles Louppe, Jonathan
Pisane, Matthia Sabatelli
- Abstract summary: This paper introduces four new algorithms for tackling multi-agent reinforcement learning problems.
All algorithms are based on the Deep Quality-Value family of algorithms.
We show competitive results when QVMix and QVMix-Max are compared to well-known MARL techniques.
- Score: 10.334745043233974
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper introduces four new algorithms that can be used for tackling
multi-agent reinforcement learning (MARL) problems occurring in cooperative
settings. All algorithms are based on the Deep Quality-Value (DQV) family of
algorithms, a set of techniques that have proven to be successful when dealing
with single-agent reinforcement learning problems (SARL). The key idea of DQV
algorithms is to jointly learn an approximation of the state-value function
$V$, alongside an approximation of the state-action value function $Q$. We
follow this principle and generalise these algorithms by introducing two fully
decentralised MARL algorithms (IQV and IQV-Max) and two algorithms that are
based on the centralised training with decentralised execution paradigm
(QVMix and QVMix-Max). We compare our algorithms with state-of-the-art
MARL techniques on the popular StarCraft Multi-Agent Challenge (SMAC)
environment. We show competitive results when QVMix and QVMix-Max are compared
to well-known MARL techniques such as QMIX and MAVEN and show that QVMix can
even outperform them on some of the tested environments, being the algorithm
which performs best overall. We hypothesise that this is due to the fact that
QVMix suffers less from the overestimation bias of the $Q$ function.
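Schematically (this sketch is reconstructed from the DQV literature, not quoted from the paper), the single-agent DQV idea amounts to two coupled temporal-difference targets,

  $V(s_t) \leftarrow V(s_t) + \alpha \,[\, r_t + \gamma V(s_{t+1}) - V(s_t) \,]$
  $Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \,[\, r_t + \gamma V(s_{t+1}) - Q(s_t, a_t) \,]$

with the "Max" variants instead bootstrapping the state-value target from $\max_a Q(s_{t+1}, a)$. QVMix and QVMix-Max lift this to the centralised-training-with-decentralised-execution setting; as the name suggests, this plausibly happens through a QMIX-style mixing of per-agent values, although the abstract does not spell out the architecture.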
Related papers
- Provably Efficient Information-Directed Sampling Algorithms for Multi-Agent Reinforcement Learning [50.92957910121088]
This work designs and analyzes a novel set of algorithms for multi-agent reinforcement learning (MARL) based on the principle of information-directed sampling (IDS).
For episodic two-player zero-sum MGs, we present three sample-efficient algorithms for learning Nash equilibrium.
We extend Reg-MAIDS to multi-player general-sum MGs and prove that it can learn either the Nash equilibrium or coarse correlated equilibrium in a sample-efficient manner.
arXiv Detail & Related papers (2024-04-30T06:48:56Z) - Inverse Factorized Q-Learning for Cooperative Multi-agent Imitation
Learning [13.060023718506917]
Imitation learning (IL) is the problem of learning to mimic expert behaviors from demonstrations in cooperative multi-agent systems.
We introduce a novel multi-agent IL algorithm designed to address these challenges.
Our approach enables centralized learning by leveraging mixing networks to aggregate decentralized Q functions.
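To make the "mixing networks aggregate decentralized Q functions" idea concrete, below is a minimal QMIX-style monotonic mixer sketched in PyTorch; the class name, layer sizes and hypernetwork layout are illustrative assumptions, not the architecture of the cited paper.

    import torch
    import torch.nn as nn

    class MonotonicMixer(nn.Module):
        """Combines per-agent Q-values into a joint Q_tot. Mixing weights are
        kept non-negative so Q_tot is monotonic in every agent's Q-value."""
        def __init__(self, n_agents, state_dim, embed_dim=32):
            super().__init__()
            # Hypernetworks: mixing weights/biases are conditioned on the global state.
            self.w1 = nn.Linear(state_dim, n_agents * embed_dim)
            self.b1 = nn.Linear(state_dim, embed_dim)
            self.w2 = nn.Linear(state_dim, embed_dim)
            self.b2 = nn.Sequential(nn.Linear(state_dim, embed_dim), nn.ReLU(),
                                    nn.Linear(embed_dim, 1))
            self.n_agents, self.embed_dim = n_agents, embed_dim

        def forward(self, agent_qs, state):
            # agent_qs: (batch, n_agents); state: (batch, state_dim)
            bs = agent_qs.size(0)
            w1 = torch.abs(self.w1(state)).view(bs, self.n_agents, self.embed_dim)
            b1 = self.b1(state).view(bs, 1, self.embed_dim)
            hidden = torch.relu(torch.bmm(agent_qs.view(bs, 1, self.n_agents), w1) + b1)
            w2 = torch.abs(self.w2(state)).view(bs, self.embed_dim, 1)
            b2 = self.b2(state).view(bs, 1, 1)
            return (torch.bmm(hidden, w2) + b2).view(bs, 1)  # joint Q_tot

During centralised training the loss is taken on Q_tot; at execution time each agent acts on its own decentralised Q-function, so the mixer is not needed.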
arXiv Detail & Related papers (2023-10-10T17:11:20Z) - Improving Sample Efficiency of Model-Free Algorithms for Zero-Sum Markov Games [66.2085181793014]
We show that a model-free stage-based Q-learning algorithm can achieve the same optimal dependence on $H$ as model-based algorithms.
Our algorithm features a key novel design: the reference value functions are updated as a pair of optimistic and pessimistic value functions.
arXiv Detail & Related papers (2023-08-17T08:34:58Z) - Breaking the Curse of Multiagency: Provably Efficient Decentralized
Multi-Agent RL with Function Approximation [44.051717720483595]
This paper presents the first line of MARL algorithms that provably resolve the curse of multiagency under function approximation.
In exchange for learning a weaker version of CCEs, this algorithm applies to a wider range of problems under generic function approximation.
Our algorithm always outputs Markov CCEs, and achieves an optimal rate of $\widetilde{\mathcal{O}}(\epsilon^{-2})$ for finding $\epsilon$-optimal solutions.
arXiv Detail & Related papers (2023-02-13T18:59:25Z) - MA2QL: A Minimalist Approach to Fully Decentralized Multi-Agent
Reinforcement Learning [63.46052494151171]
We propose multi-agent alternate Q-learning (MA2QL), where agents take turns updating their Q-functions by Q-learning.
We prove that when each agent guarantees $\varepsilon$-convergence at each turn, their joint policy converges to a Nash equilibrium.
Results show MA2QL consistently outperforms IQL, which verifies the effectiveness of MA2QL, despite such minimal changes.
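The "agents take turns" mechanism is simple enough to sketch in tabular form; the environment interface, schedule and hyperparameters below are illustrative assumptions rather than the exact MA2QL recipe.

    import numpy as np

    def alternating_q_learning(env, n_agents, n_states, n_actions,
                               turns=10, episodes_per_turn=100,
                               alpha=0.1, gamma=0.99, eps=0.1):
        """Alternating independent Q-learning: in each turn a single agent
        updates its Q-table while all other agents keep their policies fixed."""
        Q = [np.zeros((n_states, n_actions)) for _ in range(n_agents)]
        for turn in range(turns):
            learner = turn % n_agents  # the only agent allowed to learn this turn
            for _ in range(episodes_per_turn):
                s, done = env.reset(), False
                while not done:
                    # All agents act; only the learner explores, the rest act greedily.
                    actions = [np.random.randint(n_actions)
                               if (i == learner and np.random.rand() < eps)
                               else int(np.argmax(Q[i][s]))
                               for i in range(n_agents)]
                    s2, r, done = env.step(actions)  # assumed shared-reward interface
                    target = r + (0.0 if done else gamma * Q[learner][s2].max())
                    Q[learner][s, actions[learner]] += alpha * (target - Q[learner][s, actions[learner]])
                    s = s2
        return Q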
arXiv Detail & Related papers (2022-09-17T04:54:32Z) - V-Learning -- A Simple, Efficient, Decentralized Algorithm for
Multiagent RL [35.304241088947116]
V-learning is a new class of single-agent RL algorithms that convert any adversarial bandit algorithm with suitable regret guarantees into an RL algorithm.
Unlike Q-learning, it only maintains estimates of V-values instead of Q-values.
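Schematically (the exact bonus terms and stage indexing in the paper will differ), the incremental update such an approach maintains at each visited state looks like

  $V(s) \leftarrow (1 - \alpha_t)\, V(s) + \alpha_t \,[\, r + V(s') + \beta_t \,]$

where $\alpha_t$ is a step size, $\beta_t$ is an exploration bonus, and the action taken at $s$ is proposed by a per-state adversarial bandit that receives the observed returns as its losses.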
arXiv Detail & Related papers (2021-10-27T16:25:55Z) - MMD-MIX: Value Function Factorisation with Maximum Mean Discrepancy for
Cooperative Multi-Agent Reinforcement Learning [15.972363414919279]
MMD-MIX is a method that combines distributional reinforcement learning and value decomposition.
The experiments demonstrate that MMD-MIX outperforms prior baselines in the StarCraft Multi-Agent Challenge (SMAC) environment.
arXiv Detail & Related papers (2021-06-22T10:21:00Z) - Softmax with Regularization: Better Value Estimation in Multi-Agent
Reinforcement Learning [72.28520951105207]
Overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning.
We propose a novel regularization-based update scheme that penalizes large joint action-values deviating from a baseline.
We show that our method provides a consistent performance improvement on a set of challenging StarCraft II micromanagement tasks.
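A minimal sketch of that kind of regularised TD objective is given below; the baseline choice and the regularisation coefficient are illustrative assumptions, not the exact scheme proposed in the paper.

    import torch

    def regularized_td_loss(q_joint, td_target, q_baseline, reg_coef=0.1):
        """TD loss plus a penalty that discourages joint action-values from
        deviating far from a baseline estimate (e.g. a softmax-weighted value)."""
        td_loss = torch.mean((q_joint - td_target.detach()) ** 2)
        penalty = torch.mean((q_joint - q_baseline.detach()) ** 2)
        return td_loss + reg_coef * penalty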
arXiv Detail & Related papers (2021-03-22T14:18:39Z) - SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep
Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
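Both ingredients can be sketched directly from this summary; the confidence-weighting function and the UCB coefficient below are illustrative assumptions.

    import torch

    def weighted_bellman_backup(rewards, next_q_ensemble, gamma=0.99, temperature=10.0):
        """(a) Weighted Bellman backup: targets whose ensemble disagreement
        (uncertainty) is high receive a smaller weight in the critic loss."""
        # next_q_ensemble: (ensemble_size, batch) bootstrapped next-state Q-values.
        mean_next_q = next_q_ensemble.mean(dim=0)
        std_next_q = next_q_ensemble.std(dim=0)
        targets = rewards + gamma * mean_next_q
        weights = torch.sigmoid(-temperature * std_next_q) + 0.5  # in (0.5, 1.0]
        return targets, weights  # weights scale the per-sample Bellman error

    def ucb_action(q_ensemble_per_action, ucb_coef=1.0):
        """(b) UCB exploration: pick the action with the highest
        mean-plus-uncertainty value over the Q-ensemble."""
        # q_ensemble_per_action: (ensemble_size, n_actions)
        mean_q = q_ensemble_per_action.mean(dim=0)
        std_q = q_ensemble_per_action.std(dim=0)
        return int(torch.argmax(mean_q + ucb_coef * std_q))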
arXiv Detail & Related papers (2020-07-09T17:08:44Z) - Weighted QMIX: Expanding Monotonic Value Function Factorisation for Deep
Multi-Agent Reinforcement Learning [66.94149388181343]
We present a new version of a popular $Q$-learning algorithm for MARL.
We show that it can recover the optimal policy even with access to $Q^*$.
We also demonstrate improved performance on predator-prey and challenging multi-agent StarCraft benchmark tasks.
arXiv Detail & Related papers (2020-06-18T18:34:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.