QVMix and QVMix-Max: Extending the Deep Quality-Value Family of
Algorithms to Cooperative Multi-Agent Reinforcement Learning
- URL: http://arxiv.org/abs/2012.12062v1
- Date: Tue, 22 Dec 2020 14:53:42 GMT
- Title: QVMix and QVMix-Max: Extending the Deep Quality-Value Family of
Algorithms to Cooperative Multi-Agent Reinforcement Learning
- Authors: Pascal Leroy, Damien Ernst, Pierre Geurts, Gilles Louppe, Jonathan
Pisane, Matthia Sabatelli
- Abstract summary: This paper introduces four new algorithms for tackling multi-agent reinforcement learning problems.
All algorithms are based on the Deep Quality-Value family of algorithms.
We show competitive results when QVMix and QVMix-Max are compared to well-known MARL techniques.
- Score: 10.334745043233974
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper introduces four new algorithms that can be used for tackling
multi-agent reinforcement learning (MARL) problems occurring in cooperative
settings. All algorithms are based on the Deep Quality-Value (DQV) family of
algorithms, a set of techniques that have proven to be successful when dealing
with single-agent reinforcement learning problems (SARL). The key idea of DQV
algorithms is to jointly learn an approximation of the state-value function
$V$, alongside an approximation of the state-action value function $Q$. We
follow this principle and generalise these algorithms by introducing two fully
decentralised MARL algorithms (IQV and IQV-Max) and two algorithms that are
based on the centralised training with decentralised execution paradigm
(QVMix and QVMix-Max). We compare our algorithms with state-of-the-art
MARL techniques on the popular StarCraft Multi-Agent Challenge (SMAC)
environment. We show competitive results when QVMix and QVMix-Max are compared
to well-known MARL techniques such as QMIX and MAVEN and show that QVMix can
even outperform them on some of the tested environments, being the algorithm
which performs best overall. We hypothesise that this is due to the fact that
QVMix suffers less from the overestimation bias of the $Q$ function.
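Schematically (this sketch is reconstructed from the DQV literature, not quoted from the paper), the single-agent DQV idea amounts to two coupled temporal-difference targets,

  $V(s_t) \leftarrow V(s_t) + \alpha \,[\, r_t + \gamma V(s_{t+1}) - V(s_t) \,]$
  $Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \,[\, r_t + \gamma V(s_{t+1}) - Q(s_t, a_t) \,]$

with the "Max" variants instead bootstrapping the state-value target from $\max_a Q(s_{t+1}, a)$. QVMix and QVMix-Max lift this to the centralised-training-with-decentralised-execution setting; as the name suggests, this plausibly happens through a QMIX-style mixing of per-agent values, although the abstract does not spell out the architecture.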
Related papers
- Provably Efficient Information-Directed Sampling Algorithms for Multi-Agent Reinforcement Learning [50.92957910121088]
This work designs and analyzes a novel set of algorithms for multi-agent reinforcement learning (MARL) based on the principle of information-directed sampling (IDS).
For episodic two-player zero-sum MGs, we present three sample-efficient algorithms for learning Nash equilibrium.
We extend Reg-MAIDS to multi-player general-sum MGs and prove that it can learn either the Nash equilibrium or coarse correlated equilibrium in a sample-efficient manner.
arXiv Detail & Related papers (2024-04-30T06:48:56Z) - Inverse Factorized Q-Learning for Cooperative Multi-agent Imitation
Learning [13.060023718506917]
Imitation learning (IL) is the problem of learning to mimic expert behaviors from demonstrations in cooperative multi-agent systems.
We introduce a novel multi-agent IL algorithm designed to address these challenges.
Our approach enables centralized learning by leveraging mixing networks to aggregate decentralized Q functions.
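To make the "mixing networks aggregate decentralized Q functions" idea concrete, below is a minimal QMIX-style monotonic mixer sketched in PyTorch; the class name, layer sizes and hypernetwork layout are illustrative assumptions, not the architecture of the cited paper.

    import torch
    import torch.nn as nn

    class MonotonicMixer(nn.Module):
        """Combines per-agent Q-values into a joint Q_tot. Mixing weights are
        kept non-negative so Q_tot is monotonic in every agent's Q-value."""
        def __init__(self, n_agents, state_dim, embed_dim=32):
            super().__init__()
            # Hypernetworks: mixing weights/biases are conditioned on the global state.
            self.w1 = nn.Linear(state_dim, n_agents * embed_dim)
            self.b1 = nn.Linear(state_dim, embed_dim)
            self.w2 = nn.Linear(state_dim, embed_dim)
            self.b2 = nn.Sequential(nn.Linear(state_dim, embed_dim), nn.ReLU(),
                                    nn.Linear(embed_dim, 1))
            self.n_agents, self.embed_dim = n_agents, embed_dim

        def forward(self, agent_qs, state):
            # agent_qs: (batch, n_agents); state: (batch, state_dim)
            bs = agent_qs.size(0)
            w1 = torch.abs(self.w1(state)).view(bs, self.n_agents, self.embed_dim)
            b1 = self.b1(state).view(bs, 1, self.embed_dim)
            hidden = torch.relu(torch.bmm(agent_qs.view(bs, 1, self.n_agents), w1) + b1)
            w2 = torch.abs(self.w2(state)).view(bs, self.embed_dim, 1)
            b2 = self.b2(state).view(bs, 1, 1)
            return (torch.bmm(hidden, w2) + b2).view(bs, 1)  # joint Q_tot

During centralised training the loss is taken on Q_tot; at execution time each agent acts on its own decentralised Q-function, so the mixer is not needed.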
arXiv Detail & Related papers (2023-10-10T17:11:20Z) - Improving Sample Efficiency of Model-Free Algorithms for Zero-Sum Markov Games [66.2085181793014]
We show that a model-free stage-based Q-learning algorithm can achieve the same optimal dependence on $H$ as model-based algorithms.
Our algorithm features a key novel design: the reference value functions are updated as a pair of optimistic and pessimistic value functions.
arXiv Detail & Related papers (2023-08-17T08:34:58Z) - Breaking the Curse of Multiagency: Provably Efficient Decentralized
Multi-Agent RL with Function Approximation [44.051717720483595]
This paper presents the first line of MARL algorithms that provably resolve the curse of multiagency under function approximation.
In exchange for learning a weaker version of CCEs, this algorithm applies to a wider range of problems under generic function approximation.
Our algorithm always outputs Markov CCEs, and achieves an optimal rate of $\widetilde{\mathcal{O}}(\epsilon^{-2})$ for finding $\epsilon$-optimal solutions.
arXiv Detail & Related papers (2023-02-13T18:59:25Z) - MA2QL: A Minimalist Approach to Fully Decentralized Multi-Agent
Reinforcement Learning [63.46052494151171]
We propose multi-agent alternate Q-learning (MA2QL), where agents take turns updating their Q-functions by Q-learning.
We prove that when each agent guarantees $\varepsilon$-convergence at each turn, their joint policy converges to a Nash equilibrium.
Results show MA2QL consistently outperforms IQL, which verifies the effectiveness of MA2QL, despite such minimal changes.
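The "agents take turns" mechanism is simple enough to sketch in tabular form; the environment interface, schedule and hyperparameters below are illustrative assumptions rather than the exact MA2QL recipe.

    import numpy as np

    def alternating_q_learning(env, n_agents, n_states, n_actions,
                               turns=10, episodes_per_turn=100,
                               alpha=0.1, gamma=0.99, eps=0.1):
        """Alternating independent Q-learning: in each turn a single agent
        updates its Q-table while all other agents keep their policies fixed."""
        Q = [np.zeros((n_states, n_actions)) for _ in range(n_agents)]
        for turn in range(turns):
            learner = turn % n_agents  # the only agent allowed to learn this turn
            for _ in range(episodes_per_turn):
                s, done = env.reset(), False
                while not done:
                    # All agents act; only the learner explores, the rest act greedily.
                    actions = [np.random.randint(n_actions)
                               if (i == learner and np.random.rand() < eps)
                               else int(np.argmax(Q[i][s]))
                               for i in range(n_agents)]
                    s2, r, done = env.step(actions)  # assumed shared-reward interface
                    target = r + (0.0 if done else gamma * Q[learner][s2].max())
                    Q[learner][s, actions[learner]] += alpha * (target - Q[learner][s, actions[learner]])
                    s = s2
        return Q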
arXiv Detail & Related papers (2022-09-17T04:54:32Z) - V-Learning -- A Simple, Efficient, Decentralized Algorithm for
Multiagent RL [35.304241088947116]
V-learning is a new class of single-agent RL algorithms that convert any adversarial bandit algorithm with suitable regret guarantees into an RL algorithm.
Unlike Q-learning, it only maintains estimates of V-values instead of Q-values.
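Schematically (the exact bonus terms and stage indexing in the paper will differ), the incremental update such an approach maintains at each visited state looks like

  $V(s) \leftarrow (1 - \alpha_t)\, V(s) + \alpha_t \,[\, r + V(s') + \beta_t \,]$

where $\alpha_t$ is a step size, $\beta_t$ is an exploration bonus, and the action taken at $s$ is proposed by a per-state adversarial bandit that receives the observed returns as its losses.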
arXiv Detail & Related papers (2021-10-27T16:25:55Z) - MMD-MIX: Value Function Factorisation with Maximum Mean Discrepancy for
Cooperative Multi-Agent Reinforcement Learning [15.972363414919279]
MMD-MIX is a method that combines distributional reinforcement learning and value decomposition.
The experiments demonstrate that MMD-MIX outperforms prior baselines in the StarCraft Multi-Agent Challenge (SMAC) environment.
arXiv Detail & Related papers (2021-06-22T10:21:00Z) - Softmax with Regularization: Better Value Estimation in Multi-Agent
Reinforcement Learning [72.28520951105207]
Overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning.
We propose a novel regularization-based update scheme that penalizes large joint action-values deviating from a baseline.
We show that our method provides a consistent performance improvement on a set of challenging StarCraft II micromanagement tasks.
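A minimal sketch of that kind of regularised TD objective is given below; the baseline choice and the regularisation coefficient are illustrative assumptions, not the exact scheme proposed in the paper.

    import torch

    def regularized_td_loss(q_joint, td_target, q_baseline, reg_coef=0.1):
        """TD loss plus a penalty that discourages joint action-values from
        deviating far from a baseline estimate (e.g. a softmax-weighted value)."""
        td_loss = torch.mean((q_joint - td_target.detach()) ** 2)
        penalty = torch.mean((q_joint - q_baseline.detach()) ** 2)
        return td_loss + reg_coef * penalty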
arXiv Detail & Related papers (2021-03-22T14:18:39Z) - SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep
Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
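Both ingredients can be sketched directly from this summary; the confidence-weighting function and the UCB coefficient below are illustrative assumptions.

    import torch

    def weighted_bellman_backup(rewards, next_q_ensemble, gamma=0.99, temperature=10.0):
        """(a) Weighted Bellman backup: targets whose ensemble disagreement
        (uncertainty) is high receive a smaller weight in the critic loss."""
        # next_q_ensemble: (ensemble_size, batch) bootstrapped next-state Q-values.
        mean_next_q = next_q_ensemble.mean(dim=0)
        std_next_q = next_q_ensemble.std(dim=0)
        targets = rewards + gamma * mean_next_q
        weights = torch.sigmoid(-temperature * std_next_q) + 0.5  # in (0.5, 1.0]
        return targets, weights  # weights scale the per-sample Bellman error

    def ucb_action(q_ensemble_per_action, ucb_coef=1.0):
        """(b) UCB exploration: pick the action with the highest
        mean-plus-uncertainty value over the Q-ensemble."""
        # q_ensemble_per_action: (ensemble_size, n_actions)
        mean_q = q_ensemble_per_action.mean(dim=0)
        std_q = q_ensemble_per_action.std(dim=0)
        return int(torch.argmax(mean_q + ucb_coef * std_q))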
arXiv Detail & Related papers (2020-07-09T17:08:44Z) - Weighted QMIX: Expanding Monotonic Value Function Factorisation for Deep
Multi-Agent Reinforcement Learning [66.94149388181343]
We present a new version of a popular $Q$-learning algorithm for MARL.
We show that it can recover the optimal policy even with access to $Q^*$.
We also demonstrate improved performance on predator-prey and challenging multi-agent StarCraft benchmark tasks.
arXiv Detail & Related papers (2020-06-18T18:34:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.