Double Q-learning for Value-based Deep Reinforcement Learning, Revisited
- URL: http://arxiv.org/abs/2507.00275v1
- Date: Mon, 30 Jun 2025 21:32:46 GMT
- Title: Double Q-learning for Value-based Deep Reinforcement Learning, Revisited
- Authors: Prabhat Nagarajan, Martha White, Marlos C. Machado
- Abstract summary: Double Q-learning is an algorithm introduced to address Q-learning's overestimation. We study algorithms that adapt this core idea of Double Q-learning for value-based deep RL.
- Score: 24.236175931106978
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Overestimation is pervasive in reinforcement learning (RL), including in Q-learning, which forms the algorithmic basis for many value-based deep RL algorithms. Double Q-learning is an algorithm introduced to address Q-learning's overestimation by training two Q-functions and using both to de-correlate action-selection and action-evaluation in bootstrap targets. Shortly after Q-learning was adapted to deep RL in the form of deep Q-networks (DQN), Double Q-learning was adapted to deep RL in the form of Double DQN. However, Double DQN only loosely adapts Double Q-learning, forgoing the training of two different Q-functions that bootstrap off one another. In this paper, we study algorithms that adapt this core idea of Double Q-learning for value-based deep RL. We term such algorithms Deep Double Q-learning (DDQL). Our aim is to understand whether DDQL exhibits less overestimation than Double DQN and whether performant instantiations of DDQL exist. We answer both questions affirmatively, demonstrating that DDQL reduces overestimation and outperforms Double DQN in aggregate across 57 Atari 2600 games, without requiring additional hyperparameters. We also study several aspects of DDQL, including its network architecture, replay ratio, and minibatch sampling strategy.
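Concretely, the two adaptations differ in how they form bootstrap targets. Below is a minimal PyTorch-style sketch contrasting them; the function names, done-masking, and batching conventions are illustrative assumptions rather than the paper's exact implementation.

```python
import torch

def double_dqn_target(q_online, q_target, r, s_next, gamma, done):
    # Double DQN: a single online network selects the action; its
    # periodically-copied target network evaluates it.
    a_star = q_online(s_next).argmax(dim=1, keepdim=True)
    q_next = q_target(s_next).gather(1, a_star).squeeze(1)
    return r + gamma * (1.0 - done) * q_next

def ddql_target_for_a(q_a, q_b, r, s_next, gamma, done):
    # Deep Double Q-learning (DDQL): two Q-functions are trained and
    # bootstrap off one another. To update q_a, q_a selects the action
    # and q_b evaluates it (and symmetrically when updating q_b).
    a_star = q_a(s_next).argmax(dim=1, keepdim=True)
    q_next = q_b(s_next).gather(1, a_star).squeeze(1)
    return r + gamma * (1.0 - done) * q_next
```

In tabular Double Q-learning, one of the two functions is chosen at random to update at each step; how DDQL schedules the two networks' updates is among the design choices (architecture, replay ratio, minibatch sampling) the paper examines.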
Related papers
- DQ-LoRe: Dual Queries with Low Rank Approximation Re-ranking for In-Context Learning [66.85379279041128]
In this study, we introduce a framework that leverages Dual Queries and Low-rank approximation Re-ranking to automatically select exemplars for in-context learning.
DQ-LoRe significantly outperforms prior state-of-the-art methods in the automatic selection of exemplars for GPT-4, enhancing performance from 92.5% to 94.2%.
arXiv Detail & Related papers (2023-10-04T16:44:37Z)
- Quantum Imitation Learning [74.15588381240795]
We propose quantum imitation learning (QIL), in the hope of exploiting quantum advantage to speed up IL.
We develop two QIL algorithms: quantum behavioural cloning (Q-BC) and quantum generative adversarial imitation learning (Q-GAIL).
Experimental results demonstrate that both Q-BC and Q-GAIL achieve performance comparable to their classical counterparts.
arXiv Detail & Related papers (2023-04-04T12:47:35Z)
- M$^2$DQN: A Robust Method for Accelerating Deep Q-learning Network [6.689964384669018]
We propose a framework that uses the Max-Mean loss in the Deep Q-Network (M$^2$DQN).
Instead of sampling one batch of experiences per training step, we sample several batches from the experience replay and update the parameters such that the maximum TD-error among these batches is minimized (a minimal sketch follows this entry).
We verify the effectiveness of this framework with one of the most widely used techniques, Double DQN (DDQN), in several Gym games.
arXiv Detail & Related papers (2022-09-16T09:20:35Z)
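A minimal PyTorch-style sketch of the Max-Mean update described in the M$^2$DQN entry above; the replay-buffer API, loss choice, and batch counts are illustrative assumptions, and the Double DQN target matches the DDQN setting the entry mentions.

```python
import torch
import torch.nn.functional as F

def max_mean_step(q_net, q_target_net, optimizer, replay,
                  num_batches=4, batch_size=32, gamma=0.99):
    # Sample several minibatches and back-propagate only the largest
    # per-batch mean TD loss ("Max-Mean"), instead of a single batch's loss.
    losses = []
    for _ in range(num_batches):
        s, a, r, s_next, done = replay.sample(batch_size)  # assumed buffer API
        with torch.no_grad():
            a_star = q_net(s_next).argmax(dim=1, keepdim=True)
            q_next = q_target_net(s_next).gather(1, a_star).squeeze(1)
            target = r + gamma * (1.0 - done) * q_next
        q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
        losses.append(F.smooth_l1_loss(q_sa, target))
    loss = torch.stack(losses).max()   # optimize the worst batch
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```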
- Online Target Q-learning with Reverse Experience Replay: Efficiently finding the Optimal Policy for Linear MDPs [50.75812033462294]
We bridge the gap between the practical success of Q-learning and its pessimistic theoretical results.
We present two novel methods, Q-Rex and Q-RexDaRe (the reverse-replay idea is sketched after this entry).
We show that Q-Rex efficiently finds the optimal policy for linear MDPs.
arXiv Detail & Related papers (2021-10-16T01:47:41Z)
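A tabular sketch of the reverse-experience-replay idea behind Q-Rex above; the data layout and step size are illustrative assumptions, and the actual methods add online target networks and (for Q-RexDaRe) data reuse.

```python
from collections import defaultdict

def make_q_table(n_actions):
    # Zero-initialized tabular Q-function.
    return defaultdict(lambda: [0.0] * n_actions)

def reverse_replay_updates(Q, trajectory, alpha=0.5, gamma=0.99):
    # Process a stored trajectory from last transition to first, so each
    # bootstrap target already reflects the updates made downstream of it,
    # propagating reward information backward in a single pass.
    for s, a, r, s_next, done in reversed(trajectory):
        target = r + (0.0 if done else gamma * max(Q[s_next]))
        Q[s][a] += alpha * (target - Q[s][a])
```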
- Expert Q-learning: Deep Reinforcement Learning with Coarse State Values from Offline Expert Examples [8.938418994111716]
Expert Q-learning is inspired by Dueling Q-learning and aims to incorporate semi-supervised learning into reinforcement learning.
An offline expert assesses the value of a state in a coarse manner, using three discrete values.
Our results show that Expert Q-learning is indeed useful and more resistant to overestimation bias.
arXiv Detail & Related papers (2021-06-28T12:41:45Z)
- Self-correcting Q-Learning [14.178899938667161]
We introduce a new way to address the overestimation bias in the form of a "self-correcting algorithm."
Applying this strategy to Q-learning results in Self-correcting Q-learning.
We show theoretically that this new algorithm enjoys the same convergence guarantees as Q-learning while being more accurate.
arXiv Detail & Related papers (2020-12-02T11:36:24Z)
- Deep Q-Network Based Multi-agent Reinforcement Learning with Binary Action Agents [1.8782750537161614]
Deep Q-Network (DQN) based multi-agent systems (MAS) for reinforcement learning (RL) use various schemes wherein the agents have to learn and communicate.
We propose a simple but efficient DQN-based MAS for RL that uses a shared state and shared rewards, but agent-specific actions (see the sketch after this entry).
The benefits of the approach are overall simplicity, faster convergence, and better performance compared to conventional DQN-based approaches.
arXiv Detail & Related papers (2020-08-06T15:16:05Z)
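A minimal sketch of the shared-state, agent-specific-action design from the entry above; the network sizes and greedy action rule are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class BinaryActionDQN(nn.Module):
    # One small Q-network per agent. Every agent observes the same shared
    # state and receives the same shared reward, but outputs Q-values only
    # for its own two (binary) actions.
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),
        )

    def forward(self, shared_state):
        return self.net(shared_state)

def joint_greedy_action(agents, shared_state):
    # The joint action is simply each agent's own greedy choice.
    return [int(agent(shared_state).argmax()) for agent in agents]
```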
- Analysis of Q-learning with Adaptation and Momentum Restart for Gradient Descent [47.3692506462581]
We first characterize the convergence rate of Q-AMSGrad, which is the Q-learning algorithm with the AMSGrad update.
To further improve performance, we propose incorporating a momentum restart scheme into Q-AMSGrad, resulting in the so-called Q-AMSGradR algorithm (the AMSGrad update is sketched after this entry).
Our experiments on a linear quadratic regulator problem show that the two proposed Q-learning algorithms outperform vanilla Q-learning with SGD updates.
arXiv Detail & Related papers (2020-07-15T01:11:43Z)
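A NumPy sketch of the standard AMSGrad update applied to Q-function parameters, as in Q-AMSGrad above; the hyperparameters and the restart rule in the trailing comment are illustrative assumptions.

```python
import numpy as np

def q_amsgrad_step(theta, grad, m, v, v_hat,
                   lr=1e-3, b1=0.9, b2=0.99, eps=1e-8):
    # AMSGrad applied to the TD (semi-)gradient `grad` of the
    # Q-function parameters `theta`.
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    v_hat = np.maximum(v_hat, v)   # non-decreasing second-moment estimate
    theta = theta - lr * m / (np.sqrt(v_hat) + eps)
    return theta, m, v, v_hat

# Q-AMSGradR (sketch): additionally reset the momentum `m` to zero every
# R steps ("momentum restart") to curb stale momentum directions.
```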
- SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method that is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients, both sketched after this entry: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
arXiv Detail & Related papers (2020-07-09T17:08:44Z)
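A PyTorch-style sketch of SUNRISE's two ingredients above. The sigmoid-of-ensemble-std weighting and the temperature/exploration coefficients follow the common formulation but should be read as illustrative assumptions, and the weighting is applied to a simplified mean-of-ensemble target.

```python
import torch

def weighted_bellman_loss(q_pred, target_ensemble, r, gamma, done, T=10.0):
    # target_ensemble: [K, B] next-state target values from K ensemble members.
    target = r + gamma * (1 - done) * target_ensemble.mean(dim=0)
    std = target_ensemble.std(dim=0)
    w = torch.sigmoid(-std * T) + 0.5   # in (0.5, 1.0]: trust agreed-on targets more
    return (w * (q_pred - target.detach()) ** 2).mean()

def ucb_action(q_ensemble, lam=1.0):
    # q_ensemble: [K, A] Q-values for one state from K ensemble members.
    mean, std = q_ensemble.mean(dim=0), q_ensemble.std(dim=0)
    return int((mean + lam * std).argmax())   # optimistic (UCB) exploration
```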
- Harvesting and Refining Question-Answer Pairs for Unsupervised QA [95.9105154311491]
We introduce two approaches to improve unsupervised Question Answering (QA).
First, we harvest lexically and syntactically divergent questions from Wikipedia to automatically construct a corpus of question-answer pairs (named RefQA).
Second, we take advantage of the QA model to extract more appropriate answers, iteratively refining the data in RefQA.
arXiv Detail & Related papers (2020-05-06T15:56:06Z)