QTRAN++: Improved Value Transformation for Cooperative Multi-Agent
Reinforcement Learning
- URL: http://arxiv.org/abs/2006.12010v2
- Date: Tue, 6 Oct 2020 01:50:57 GMT
- Title: QTRAN++: Improved Value Transformation for Cooperative Multi-Agent
Reinforcement Learning
- Authors: Kyunghwan Son, Sungsoo Ahn, Roben Delos Reyes, Jinwoo Shin, Yung Yi
- Abstract summary: QTRAN is a multi-agent reinforcement learning algorithm capable of learning the largest class of joint-action value functions to date.
Despite its strong theoretical guarantee, it has shown poor empirical performance in complex environments.
We propose a substantially improved version, coined QTRAN++.
- Score: 70.382101956278
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: QTRAN is a multi-agent reinforcement learning (MARL) algorithm capable of
learning the largest class of joint-action value functions to date. However,
despite its strong theoretical guarantee, it has shown poor empirical
performance in complex environments, such as the StarCraft Multi-Agent Challenge
(SMAC). In this paper, we identify the performance bottleneck of QTRAN and
propose a substantially improved version, coined QTRAN++. Our gains come from
(i) stabilizing the training objective of QTRAN, (ii) removing the strict role
separation between the action-value estimators of QTRAN, and (iii) introducing
a multi-head mixing network for value transformation. Through extensive
evaluation, we confirm that our diagnosis is correct, and QTRAN++ successfully
bridges the gap between empirical performance and theoretical guarantee. In
particular, QTRAN++ newly achieves state-of-the-art performance in the SMAC
environment. The code will be released.
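To make ingredient (iii) concrete, here is a minimal sketch, in PyTorch, of a multi-head mixing network that transforms individual Q-values into a joint value. The QMIX-style hypernetwork layout, the head count, the layer widths, and the averaging over heads are illustrative assumptions for this sketch and are not taken from the authors' released architecture.

# Minimal sketch of a multi-head mixing network for value transformation.
# Head count, layer widths, and the head-combination rule are illustrative
# assumptions; they do not reproduce the authors' released architecture.
import torch
import torch.nn as nn

class MultiHeadMixer(nn.Module):
    def __init__(self, n_agents: int, state_dim: int, n_heads: int = 4, embed_dim: int = 32):
        super().__init__()
        self.n_agents, self.n_heads, self.embed_dim = n_agents, n_heads, embed_dim
        # Hypernetworks generate non-negative mixing weights from the global state,
        # so each head is monotonic in the individual Q-values.
        self.hyper_w1 = nn.Linear(state_dim, n_heads * n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, n_heads * embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, n_heads * embed_dim)
        self.hyper_b2 = nn.Linear(state_dim, n_heads)

    def forward(self, agent_qs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # agent_qs: (batch, n_agents), state: (batch, state_dim)
        bs = agent_qs.size(0)
        w1 = torch.abs(self.hyper_w1(state)).view(bs, self.n_heads, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(bs, self.n_heads, 1, self.embed_dim)
        w2 = torch.abs(self.hyper_w2(state)).view(bs, self.n_heads, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(bs, self.n_heads, 1, 1)
        qs = agent_qs.view(bs, 1, 1, self.n_agents).expand(-1, self.n_heads, -1, -1)
        hidden = torch.nn.functional.elu(torch.matmul(qs, w1) + b1)    # (bs, heads, 1, embed)
        head_q = torch.matmul(hidden, w2) + b2                         # (bs, heads, 1, 1)
        return head_q.view(bs, self.n_heads).mean(dim=1, keepdim=True) # joint value: (bs, 1)

For instance, with 5 agents and a 48-dimensional global state, MultiHeadMixer(5, 48) maps a (batch, 5) tensor of individual Q-values and a (batch, 48) state tensor to a (batch, 1) joint value.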
Related papers
- Inverse Factorized Q-Learning for Cooperative Multi-agent Imitation
Learning [13.060023718506917]
Imitation learning (IL) is the problem of learning to mimic expert behaviors from demonstrations in cooperative multi-agent systems.
We introduce a novel multi-agent IL algorithm designed to address these challenges.
Our approach enables centralized learning by leveraging mixing networks to aggregate decentralized Q-functions.
arXiv Detail & Related papers (2023-10-10T17:11:20Z)
- Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level
Stability and High-Level Behavior [51.60683890503293]
We propose a theoretical framework for studying behavior cloning of complex expert demonstrations using generative modeling.
We show that pure supervised cloning can generate trajectories matching the per-time step distribution of arbitrary expert trajectories.
arXiv Detail & Related papers (2023-07-27T04:27:26Z)
- Greedy based Value Representation for Optimal Coordination in
Multi-agent Reinforcement Learning [64.05646120624287]
We derive the expression of the joint Q-value function of linear value decomposition (LVD) and monotonic value decomposition (MVD).
To ensure optimal consistency, the optimal node is required to be the unique STN.
Our method outperforms state-of-the-art baselines in experiments on various benchmarks.
arXiv Detail & Related papers (2022-11-22T08:14:50Z)
- Solving Continuous Control via Q-learning [54.05120662838286]
We show that a simple modification of deep Q-learning largely alleviates issues with actor-critic methods.
By combining bang-bang action discretization with value decomposition, framing single-agent control as cooperative multi-agent reinforcement learning (MARL), this simple critic-only approach matches the performance of state-of-the-art continuous actor-critic methods (see the sketch after this entry).
arXiv Detail & Related papers (2022-10-22T22:55:50Z)
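The following is a minimal sketch of the critic-only recipe named in the entry above: bang-bang action discretization combined with a value-decomposition view of single-agent control. The class name BangBangDecoupledQ, the shared torso, the hidden width, and the mean-based decomposition are illustrative assumptions, not the paper's exact architecture.

# Minimal sketch: each action dimension gets its own Q-head over the two
# extreme actions {-max, +max}, and the joint value is the mean of the
# per-dimension Q-values. Sizes and the shared torso are assumptions.
import torch
import torch.nn as nn

class BangBangDecoupledQ(nn.Module):
    def __init__(self, obs_dim: int, action_dims: int, hidden: int = 256):
        super().__init__()
        self.torso = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        # One 2-way Q-head per action dimension (an "agent" in the MARL view).
        self.heads = nn.ModuleList([nn.Linear(hidden, 2) for _ in range(action_dims)])

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        h = self.torso(obs)
        return torch.stack([head(h) for head in self.heads], dim=1)  # (batch, dims, 2)

    def act(self, obs: torch.Tensor, max_action: float = 1.0) -> torch.Tensor:
        # Greedy per-dimension argmax yields a bang-bang action in {-max, +max}^dims.
        bins = self.forward(obs).argmax(dim=-1)                # (batch, dims)
        return (bins.float() * 2.0 - 1.0) * max_action

    def joint_value(self, obs: torch.Tensor, bins: torch.Tensor) -> torch.Tensor:
        # Value decomposition: the joint value is the mean of the chosen per-dimension Q-values.
        q = self.forward(obs)                                  # (batch, dims, 2)
        chosen = q.gather(-1, bins.unsqueeze(-1)).squeeze(-1)  # (batch, dims)
        return chosen.mean(dim=-1)                             # (batch,)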
- Residual Q-Networks for Value Function Factorizing in Multi-Agent Reinforcement Learning [0.0]
We propose a novel concept of Residual Q-Networks (RQNs) for Multi-Agent Reinforcement Learning (MARL).
The RQN learns to transform the individual Q-value trajectories in a way that preserves the Individual-Global-Max (IGM) criterion, sketched after this entry.
The proposed method converges faster, with increased stability, and shows robust performance in a wider family of environments.
arXiv Detail & Related papers (2022-05-30T16:56:06Z)
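For reference, the Individual-Global-Max (IGM) criterion mentioned in the Residual Q-Networks entry above requires the joint greedy action to coincide with the per-agent greedy actions. The tabular check below is a minimal illustration of the condition under that assumption; it is not the RQN method itself.

# Minimal illustration of the IGM condition on tabular action-values.
import numpy as np

def satisfies_igm(q_joint: np.ndarray, q_individual) -> bool:
    """q_joint: array of shape (|A_1|, ..., |A_n|) with joint action-values.
    q_individual: list of n vectors, q_individual[i][a] = Q_i(a)."""
    greedy_individual = tuple(int(np.argmax(q_i)) for q_i in q_individual)
    greedy_joint = tuple(np.unravel_index(int(np.argmax(q_joint)), q_joint.shape))
    return greedy_joint == greedy_individual

# Example: two agents, two actions each. The joint optimum (1, 1) matches the
# individual argmaxes, so IGM holds for this pair of value functions.
q_jt = np.array([[1.0, 2.0], [2.0, 5.0]])
q_ind = [np.array([0.3, 0.9]), np.array([0.1, 0.7])]
assert satisfies_igm(q_jt, q_ind)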
- Simultaneous Double Q-learning with Conservative Advantage Learning for Actor-Critic Methods [133.85604983925282]
We propose Simultaneous Double Q-learning with Conservative Advantage Learning (SDQ-CAL).
Our algorithm realizes less biased value estimation and achieves state-of-the-art performance in a range of continuous control benchmark tasks.
arXiv Detail & Related papers (2022-05-08T09:17:16Z)
- Off-Policy Correction For Multi-Agent Reinforcement Learning [9.599347559588216]
Multi-agent reinforcement learning (MARL) provides a framework for problems involving multiple interacting agents.
Despite apparent similarity to the single-agent case, multi-agent problems are often harder to train and analyze theoretically.
We propose MA-Trace, a new on-policy actor-critic algorithm, which extends V-Trace to the MARL setting.
arXiv Detail & Related papers (2021-11-22T14:23:13Z)
- Softmax with Regularization: Better Value Estimation in Multi-Agent
Reinforcement Learning [72.28520951105207]
Overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning.
We propose a novel regularization-based update scheme that penalizes large joint action-values deviating from a baseline (illustrated after this entry).
We show that our method provides a consistent performance improvement on a set of challenging StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2021-03-22T14:18:39Z)
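As a rough illustration of the idea in the entry above, the sketch below adds a regularization term to a TD loss that penalizes joint action-values drifting above a baseline. The softmax-weighted baseline, the hinge-squared penalty, and the function signature are assumptions made for this sketch, not the paper's exact update.

# Generic illustration of a regularized TD loss that penalizes joint
# action-values deviating from a baseline. The softmax-weighted baseline
# and the squared penalty are assumptions for illustration only.
import torch

def regularized_td_loss(q_tot, td_target, q_joint_all, penalty_weight=0.1, temperature=1.0):
    """q_tot: (batch,) value of the taken joint action.
    td_target: (batch,) bootstrapped target, treated as a constant.
    q_joint_all: (batch, n_joint_actions) values of candidate joint actions."""
    td_loss = torch.mean((q_tot - td_target.detach()) ** 2)
    # Softmax-weighted baseline over candidate joint actions (an assumption).
    weights = torch.softmax(q_joint_all / temperature, dim=-1)
    baseline = torch.sum(weights * q_joint_all, dim=-1).detach()
    # Penalize estimates of the taken action that drift above the baseline.
    penalty = torch.mean(torch.clamp(q_tot - baseline, min=0.0) ** 2)
    return td_loss + penalty_weight * penalty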
- Hyperparameter Tricks in Multi-Agent Reinforcement Learning: An Empirical Study [5.811502603310249]
We study and compare the state-of-the-art cooperative multi-agent deep reinforcement learning algorithms.
QMIX can attain extraordinarily high win rates in all hard and super hard scenarios of the StarCraft Multi-Agent Challenge (SMAC) and achieve state-of-the-art (SOTA) performance.
arXiv Detail & Related papers (2021-02-06T02:28:09Z)