Dual Ensembled Multiagent Q-Learning with Hypernet Regularizer
- URL: http://arxiv.org/abs/2502.02018v1
- Date: Tue, 04 Feb 2025 05:14:58 GMT
- Title: Dual Ensembled Multiagent Q-Learning with Hypernet Regularizer
- Authors: Yaodong Yang, Guangyong Chen, Hongyao Tang, Furui Liu, Danruo Deng, Pheng Ann Heng
- Abstract summary: Overestimation in the multiagent setting has received comparatively little attention.
We propose a novel hypernet regularizer on hypernetwork weights and biases to constrain the optimization of the online global Q-network and prevent overestimation accumulation.
- Score: 62.01554688056335
- Abstract: Overestimation in single-agent reinforcement learning has been extensively studied. In contrast, overestimation in the multiagent setting has received comparatively little attention, although it increases with the number of agents and leads to severe learning instability. Previous works concentrate on reducing overestimation when estimating the target Q-value, but ignore the follow-up optimization of the online Q-network, making it hard to fully address the complex multiagent overestimation problem. To solve this challenge, we first establish an iterative estimation-optimization analysis framework for multiagent value-mixing Q-learning. Our analysis reveals that multiagent overestimation not only arises in the computation of the target Q-value but also accumulates during the online Q-network's optimization. Motivated by this, we propose the Dual Ensembled Multiagent Q-Learning with Hypernet Regularizer algorithm, which tackles multiagent overestimation from two aspects. First, we extend the random ensemble technique to the estimation of the target individual and global Q-values to derive a lower update target. Second, we propose a novel hypernet regularizer on hypernetwork weights and biases to constrain the optimization of the online global Q-network and prevent overestimation accumulation. Extensive experiments in MPE and SMAC show that the proposed method successfully addresses overestimation across various tasks.
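The two components described in the abstract lend themselves to a compact sketch. Below is a minimal PyTorch illustration of (a) a min-over-random-subset ensemble target over QMIX-style mixers and (b) a penalty on the hypernetwork-generated mixing weights and biases. The ensemble sizes, the L2 form of the penalty, and all network shapes are assumptions for illustration, not the paper's reported design; the paper also ensembles the individual Q-values, which is omitted here.

```python
import random
import torch
import torch.nn as nn

class HypernetMixer(nn.Module):
    """QMIX-style mixer whose weights/biases are generated from the global
    state by hypernetworks (shapes are illustrative)."""
    def __init__(self, state_dim: int, n_agents: int, embed_dim: int = 32):
        super().__init__()
        self.n_agents, self.embed_dim = n_agents, embed_dim
        self.w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.b1 = nn.Linear(state_dim, embed_dim)
        self.w2 = nn.Linear(state_dim, embed_dim)
        self.b2 = nn.Linear(state_dim, 1)

    def forward(self, q_agents, state):
        B = state.shape[0]
        w1 = torch.abs(self.w1(state)).view(B, self.n_agents, self.embed_dim)
        b1 = self.b1(state).view(B, 1, self.embed_dim)
        h = torch.relu(torch.bmm(q_agents.view(B, 1, -1), w1) + b1)
        w2 = torch.abs(self.w2(state)).view(B, self.embed_dim, 1)
        b2 = self.b2(state).view(B, 1, 1)
        q_tot = torch.bmm(h, w2) + b2
        # return the generated parameters so a regularizer can constrain them
        return q_tot.view(B), (w1, b1, w2, b2)

def min_ensemble_target(target_mixers, q_agents_target, state, m=2):
    """Random-ensemble target: the minimum over a random subset of target
    mixers yields a lower (less overestimating) update target."""
    subset = random.sample(target_mixers, m)
    with torch.no_grad():
        qs = torch.stack([mix(q_agents_target, state)[0] for mix in subset])
    return qs.min(dim=0).values

def loss_with_hypernet_regularizer(q_tot, target, gen_params, lam=1e-3):
    """TD loss plus a penalty on the hypernet-generated weights and biases
    (an L2 form is assumed here; the paper's exact norm may differ)."""
    td = ((q_tot - target) ** 2).mean()
    reg = sum(p.pow(2).mean() for p in gen_params)
    return td + lam * reg
```

Here `lam` trades TD accuracy against the magnitude of the generated mixing parameters, the channel through which, per the abstract, optimization-side overestimation would otherwise accumulate.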
Related papers
- Multi-agent Reinforcement Learning with Deep Networks for Diverse Q-Vectors [3.9801926395657325]
This paper proposes a deep Q-network (DQN) algorithm capable of learning various Q-vectors using Max, Nash, and Maximin strategies (a toy sketch of the Max and Maximin criteria follows this entry).
The effectiveness of this approach is demonstrated in an environment where dual robotic arms collaborate to lift a pot.
arXiv Detail & Related papers (2024-06-12T03:30:10Z)
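As a toy illustration of the entry above, the sketch below picks joint actions from a hand-made two-agent Q-table under the Max and Maximin criteria; the Nash case needs an equilibrium solver and is omitted, and all values are made up.

```python
import numpy as np

# Q[a1, a2]: joint action-values for a two-agent, three-action game (made up)
Q = np.array([[3.0, 1.0, 0.0],
              [2.0, 2.5, 1.0],
              [0.5, 1.0, 2.0]])

# Max strategy: the jointly greedy action pair
max_pair = np.unravel_index(Q.argmax(), Q.shape)   # (0, 0)

# Maximin strategy for agent 1: best action against a worst-case agent 2
maximin_a1 = Q.min(axis=1).argmax()                # action 1

print(max_pair, maximin_a1)
```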
- Inverse Factorized Q-Learning for Cooperative Multi-agent Imitation Learning [13.060023718506917]
Imitation learning (IL) is the problem of learning to mimic expert behaviors from demonstrations in cooperative multi-agent systems.
We introduce a novel multi-agent IL algorithm designed to address these challenges.
Our approach enables centralized learning by leveraging mixing networks to aggregate decentralized Q functions, as sketched after this entry.
arXiv Detail & Related papers (2023-10-10T17:11:20Z)
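For the entry above, the simplest concrete form of aggregating decentralized Q functions is additive (VDN-style) mixing, sketched below; the paper itself uses learned mixing networks akin to the hypernetwork sketch shown earlier.

```python
import torch

def aggregate_decentralized_q(q_agents: torch.Tensor) -> torch.Tensor:
    """Additive (VDN-style) aggregation of per-agent chosen-action Q-values
    into a joint value; a learned mixing network would replace this sum."""
    # q_agents: (batch, n_agents) -> (batch,)
    return q_agents.sum(dim=1)

q_tot = aggregate_decentralized_q(torch.tensor([[1.0, 2.0, 0.5]]))  # tensor([3.5])
```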
- Understanding, Predicting and Better Resolving Q-Value Divergence in Offline-RL [86.0987896274354]
We first identify a fundamental pattern, self-excitation, as the primary cause of Q-value estimation divergence in offline RL.
We then propose a novel Self-Excite Eigenvalue Measure (SEEM) metric to measure the evolving properties of the Q-network during training (a generic eigenvalue-monitoring sketch follows this entry).
For the first time, our theory can reliably predict at an early stage whether training will diverge.
arXiv Detail & Related papers (2023-10-06T17:57:44Z)
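SEEM's exact construction is specific to the paper above; as a generic stand-in for "monitoring an eigenvalue of the training dynamics", the sketch below estimates the dominant eigenvalue of a linearized update operator by power iteration, using a made-up operator.

```python
import numpy as np

def top_eigenvalue(A: np.ndarray, iters: int = 100) -> float:
    """Power iteration: if the dominant eigenvalue of the linearized update
    operator exceeds 1 in magnitude, repeated updates expand (diverge)."""
    v = np.random.default_rng(0).normal(size=A.shape[0])
    for _ in range(iters):
        v = A @ v
        v /= np.linalg.norm(v)
    return float(v @ A @ v)   # Rayleigh quotient of the unit eigenvector

A = np.array([[1.2, 0.3], [0.0, 0.5]])  # illustrative operator; eigenvalues 1.2, 0.5
print(top_eigenvalue(A))                # ~1.2 -> |lambda| > 1 signals divergence
```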
- DQMIX: A Distributional Perspective on Multi-Agent Reinforcement Learning [122.47938710284784]
In cooperative multi-agent tasks, a team of agents jointly interact with an environment by taking actions, receiving a reward and observing the next state.
Most of the existing value-based multi-agent reinforcement learning methods only model the expectations of the individual Q-values and the global Q-value (a distributional contrast is sketched after this entry).
arXiv Detail & Related papers (2022-02-21T11:28:00Z)
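To make the distributional contrast in the entry above concrete, the sketch below represents a return as a categorical distribution over illustrative atoms and recovers the scalar expectation that expectation-only methods would model.

```python
import numpy as np

# Distributional view (C51-style): a categorical distribution over returns
support = np.array([0.0, 1.0, 2.0, 3.0])   # illustrative return atoms
probs = np.array([0.1, 0.3, 0.4, 0.2])     # illustrative probabilities

# Expectation-only methods keep just this scalar; the distribution also
# keeps the spread and multimodality of the return
q_expected = float(support @ probs)
print(q_expected)  # 1.7
```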
- Softmax with Regularization: Better Value Estimation in Multi-Agent Reinforcement Learning [72.28520951105207]
Overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning.
We propose a novel regularization-based update scheme that penalizes large joint action-values deviating from a baseline, as sketched after this entry.
We show that our method provides a consistent performance improvement on a set of challenging StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2021-03-22T14:18:39Z)
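A hedged sketch of the update scheme described in the entry above: a TD loss plus a penalty for joint action-values that drift from a baseline. The quadratic form and the choice of baseline are assumptions, not the paper's exact scheme.

```python
import torch

def regularized_td_loss(q_tot: torch.Tensor, td_target: torch.Tensor,
                        baseline: torch.Tensor, beta: float = 0.1) -> torch.Tensor:
    """TD loss plus a penalty on joint action-values deviating from a baseline.
    The quadratic penalty and the baseline choice (e.g. a running average of
    past estimates) are illustrative assumptions."""
    td = ((q_tot - td_target) ** 2).mean()
    penalty = ((q_tot - baseline) ** 2).mean()
    return td + beta * penalty
```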
- Cross Learning in Deep Q-Networks [82.20059754270302]
We propose a novel cross Q-learning algorithm aimed at alleviating the well-known overestimation problem in value-based reinforcement learning methods.
Our algorithm builds on double Q-learning by maintaining a set of parallel models and estimating the Q-value based on a randomly selected network, as sketched after this entry.
arXiv Detail & Related papers (2020-09-29T04:58:17Z)
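A minimal sketch of the mechanism in the entry above: several parallel Q-networks are kept, and the bootstrap target is computed from one chosen uniformly at random; the ensemble size and network shape are illustrative.

```python
import random
import torch
import torch.nn as nn

class CrossQEnsemble:
    """Parallel Q-networks; the bootstrap target comes from one picked at
    random, decorrelating target estimation across updates
    (ensemble size and network shape are illustrative)."""
    def __init__(self, obs_dim: int, n_actions: int, k: int = 4):
        self.nets = [nn.Linear(obs_dim, n_actions) for _ in range(k)]

    def target_value(self, next_obs: torch.Tensor) -> torch.Tensor:
        net = random.choice(self.nets)        # randomly selected estimator
        with torch.no_grad():
            return net(next_obs).max(dim=1).values
```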
- QTRAN++: Improved Value Transformation for Cooperative Multi-Agent Reinforcement Learning [70.382101956278]
QTRAN is a reinforcement learning algorithm capable of learning the largest class of joint-action value functions.
Despite its strong theoretical guarantee, it has shown poor empirical performance in complex environments.
We propose a substantially improved version, coined QTRAN++.
arXiv Detail & Related papers (2020-06-22T05:08:36Z)
- Multi-Agent Determinantal Q-Learning [39.79718674655209]
We propose multi-agent determinantal Q-learning (Q-DPP), which promotes agents to acquire diverse behavioral models (a toy determinant-based diversity score follows this entry).
We demonstrate that Q-DPP generalizes major solutions including VDN, QMIX, and QTRAN on decentralizable cooperative tasks.
arXiv Detail & Related papers (2020-06-02T09:32:48Z)
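Determinantal point processes score a set by the determinant of a similarity kernel, so diverse sets score higher. A toy version of that diversity measure, with made-up feature vectors:

```python
import numpy as np

def dpp_diversity(features: np.ndarray) -> float:
    """det of the Gram matrix L = F F^T: near-duplicate rows give det ~ 0,
    near-orthogonal (diverse) rows give a larger det."""
    L = features @ features.T
    return float(np.linalg.det(L))

similar = np.array([[1.0, 0.0], [0.99, 0.1]])
diverse = np.array([[1.0, 0.0], [0.0, 1.0]])
print(dpp_diversity(similar), dpp_diversity(diverse))  # ~0.01 vs 1.0
```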
- Towards Understanding Cooperative Multi-Agent Q-Learning with Value Factorization [28.89692989420673]
We formalize a multi-agent fitted Q-iteration framework for analyzing factorized multi-agent Q-learning; a single-agent fitted Q-iteration sketch follows this entry.
Through further analysis, we find that on-policy training or richer joint value function classes can improve its local or global convergence properties.
arXiv Detail & Related papers (2020-05-31T19:14:03Z)
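Fitted Q-iteration, the framework named in the entry above, alternates between computing bootstrapped targets on a fixed dataset and regressing an approximator onto them. A single-agent linear sketch on synthetic data (the factorized multi-agent variant would replace the scalar regressor with per-agent utilities combined through a mixer):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic offline dataset of transitions (s, a, r, s'); 1-D state, 2 actions
S = rng.normal(size=(500, 1))
A = rng.integers(0, 2, size=500)
R = -np.abs(S[:, 0]) + (A == 0)               # made-up reward
S2 = S + rng.normal(scale=0.1, size=S.shape)  # made-up dynamics
gamma = 0.9

def featurize(s, a):
    """Per-action linear features: a bias and the state, one block per action."""
    x = np.zeros((len(s), 4))
    for act in (0, 1):
        m = a == act
        x[m, 2 * act] = 1.0
        x[m, 2 * act + 1] = s[m, 0]
    return x

w = np.zeros(4)
for _ in range(20):
    # Step 1: bootstrapped targets from the current fit, over the fixed data
    q_next = np.stack([featurize(S2, np.full(500, act)) @ w for act in (0, 1)])
    y = R + gamma * q_next.max(axis=0)
    # Step 2: regress Q(s, a) onto the targets (least squares here)
    w, *_ = np.linalg.lstsq(featurize(S, A), y, rcond=None)
print(w)
```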