Abstract: In recent years, multi-agent deep reinforcement learning has been
successfully applied to various complicated scenarios such as computer games
and robot swarms. We thoroughly study and compare state-of-the-art
cooperative multi-agent deep reinforcement learning algorithms. Specifically,
we investigate the effects of the "hyperparameter tricks" used by QMIX and its
improved variants. Our results show that: (1) the significant performance
improvements of these variant algorithms come from hyperparameter-level
optimizations in their open-source code; and (2) after modest tuning and with no
changes to the network architecture, QMIX can attain extraordinarily high win
rates in all hard and super hard scenarios of the StarCraft Multi-Agent Challenge
(SMAC) and achieve state-of-the-art (SOTA) performance. In this work, we propose a
reliable QMIX benchmark, which will be of great benefit to subsequent research.
In addition, we propose a hypothesis to explain the excellent performance of QMIX.