Elastic Step DQN: A novel multi-step algorithm to alleviate
overestimation in Deep Q-Networks
- URL: http://arxiv.org/abs/2210.03325v1
- Date: Fri, 7 Oct 2022 04:56:04 GMT
- Title: Elastic Step DQN: A novel multi-step algorithm to alleviate
overestimation in Deep Q-Networks
- Authors: Adrian Ly, Richard Dazeley, Peter Vamplew, Francisco Cruz and Sunil
Aryal
- Abstract summary: The Deep Q-Network (DQN) algorithm was the first reinforcement learning algorithm to use a deep neural network to surpass human-level performance in a number of Atari learning environments.
The unstable behaviour is often characterised by overestimation in the $Q$-values, commonly referred to as the overestimation bias.
This paper proposes a new algorithm that dynamically varies the step size horizon in multi-step updates based on the similarity of states visited.
- Score: 2.781147009075454
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Deep Q-Network (DQN) algorithm was the first reinforcement learning
algorithm to use a deep neural network to successfully surpass human-level
performance in a number of Atari learning environments. However, divergent and
unstable behaviour have been long-standing issues in DQN. The unstable
behaviour is often characterised by overestimation in the $Q$-values, commonly
referred to as the overestimation bias. To address the overestimation bias and
the divergent behaviour, a number of heuristic extensions have been proposed.
Notably, multi-step updates have been shown to drastically reduce unstable
behaviour while improving the agent's training performance. However, agents are
often highly sensitive to the selection of the multi-step update horizon ($n$),
and our empirical experiments show that a poorly chosen static value for $n$
can in many cases lead to worse performance than single-step DQN. Inspired by
the success of $n$-step DQN and the effects that multi-step updates have on
overestimation bias, this paper proposes a new algorithm that we call `Elastic
Step DQN' (ES-DQN). It dynamically varies the step size horizon in multi-step
updates based on the similarity of states visited. Our empirical evaluation
shows that ES-DQN outperforms $n$-step DQN with fixed $n$ updates, Double DQN and
Average DQN in several OpenAI Gym environments while at the same time
alleviating the overestimation bias.
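To make the multi-step mechanism concrete, the sketch below shows how an $n$-step TD target is assembled from a trajectory and how the horizon could be cut short once successive states stop looking similar, in the spirit of ES-DQN. The cosine-similarity test, the threshold, and the function names are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def elastic_step_target(transitions, q_net, gamma=0.99, n_max=8, sim_threshold=0.9):
    """Multi-step TD target whose horizon adapts to state similarity (illustrative).

    transitions: time-ordered list of (state, action, reward, next_state, done),
                 starting at the transition being updated.
    q_net:       callable mapping a state to a vector of Q-values.
    The similarity rule below is a stand-in for the state-similarity criterion
    described in the ES-DQN paper, not a reproduction of it.
    """
    def cosine_sim(a, b):
        a = np.asarray(a, dtype=float).ravel()
        b = np.asarray(b, dtype=float).ravel()
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

    g, discount, n = 0.0, 1.0, 0
    for (s, _a, r, s_next, done) in transitions[:n_max]:
        g += discount * r          # accumulate the discounted n-step return
        discount *= gamma
        n += 1
        if done:
            return g               # episode ended: no bootstrap term
        if cosine_sim(s, s_next) < sim_threshold:
            break                  # states stopped being similar: stop extending n
    s_boot = transitions[n - 1][3]  # state reached after n steps
    return g + discount * float(np.max(q_net(s_boot)))
```

Setting `sim_threshold` low enough that no state pair ever falls below it recovers ordinary $n$-step DQN with $n$ equal to `n_max`, while a threshold of 1.0 collapses the update towards single-step DQN, which is the sensitivity to $n$ the abstract describes.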
Related papers
- Provably Efficient and Agile Randomized Q-Learning [35.14581235983678]
We propose a novel variant of the Q-learning algorithm, referred to as RandomizedQ, which integrates sampling-based exploration with agile, step-wise policy updates.
Empirically, RandomizedQ exhibits outstanding performance compared to existing Q-learning variants with both bonus-based and Bayesian-based exploration on standard benchmarks.
arXiv Detail & Related papers (2025-06-30T16:08:29Z)
- Ensemble Elastic DQN: A novel multi-step ensemble approach to address overestimation in deep value-based reinforcement learning [1.8008841825105586]
We introduce a novel algorithm called Ensemble Elastic Step DQN (EEDQN), which unifies ensembles with elastic step updates to stabilise algorithmic performance.
EEDQN is designed to address two major challenges in deep reinforcement learning: overestimation bias and sample efficiency.
Our results show that EEDQN achieves consistently robust performance across all tested environments.
arXiv Detail & Related papers (2025-06-06T03:36:19Z)
- From Continual Learning to SGD and Back: Better Rates for Continual Linear Models [50.11453013647086]
We analyze the forgetting, i.e., loss on previously seen tasks, after $k$ iterations.
We develop novel last-iterate upper bounds in the realizable least squares setup.
We prove for the first time that randomization alone, with no task repetition, can prevent catastrophic forgetting in sufficiently long task sequences.
arXiv Detail & Related papers (2025-04-06T18:39:45Z)
- A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning [54.48409201256968]
Offline-to-online Reinforcement Learning (O2O RL) aims to improve the performance of an offline pretrained policy using only a few online samples.
Most O2O methods focus on the balance between RL objective and pessimism, or the utilization of offline and online samples.
arXiv Detail & Related papers (2023-12-12T19:24:35Z)
- Rethinking PGD Attack: Is Sign Function Necessary? [131.6894310945647]
We present a theoretical analysis of how such a sign-based update algorithm influences step-wise attack performance.
We propose a new raw gradient descent (RGD) algorithm that eliminates the use of sign.
The effectiveness of the proposed RGD algorithm has been demonstrated extensively in experiments.
arXiv Detail & Related papers (2023-12-03T02:26:58Z) - Weakly Coupled Deep Q-Networks [5.76924666595801]
We propose a novel deep reinforcement learning algorithm that enhances performance in weakly coupled Markov decision processes (WCMDPs).
WCDQN employs a single network to train multiple DQN "subagents", one for each subproblem, and then combines their solutions to establish an upper bound on the optimal action value.
arXiv Detail & Related papers (2023-10-28T20:07:57Z)
- On the Convergence and Sample Complexity Analysis of Deep Q-Networks with $\epsilon$-Greedy Exploration [86.71396285956044]
This paper provides a theoretical understanding of Deep Q-Network (DQN) with $\varepsilon$-greedy exploration in deep reinforcement learning.
arXiv Detail & Related papers (2023-10-24T20:37:02Z)
- Understanding, Predicting and Better Resolving Q-Value Divergence in Offline-RL [86.0987896274354]
We first identify a fundamental pattern, self-excitation, as the primary cause of Q-value estimation divergence in offline RL.
We then propose a novel Self-Excite Eigenvalue Measure (SEEM) metric to measure the evolving property of the Q-network during training.
For the first time, our theory can reliably decide whether the training will diverge at an early stage.
arXiv Detail & Related papers (2023-10-06T17:57:44Z)
- Does DQN Learn? [16.035744751431114]
We numerically show that Deep Q-Network (DQN) often returns a policy that performs worse than the initial one.
We offer a theoretical explanation for this phenomenon in linear DQN.
arXiv Detail & Related papers (2022-05-26T20:46:01Z)
- Temporal-Difference Value Estimation via Uncertainty-Guided Soft Updates [110.92598350897192]
Q-Learning has proven effective at learning a policy to perform control tasks.
However, estimation noise becomes a bias after the max operator in the policy improvement step (a short numeric sketch of this effect appears after this list).
We present Unbiased Soft Q-Learning (UQL), which extends the work of EQL from two-action, finite-state spaces to multi-action, infinite-state Markov Decision Processes.
arXiv Detail & Related papers (2021-10-28T00:07:19Z)
- Modified Double DQN: addressing stability [0.2867517731896504]
The Double-DQN (DDQN) algorithm was originally proposed to address the overestimation issue in the original DQN algorithm.
Three modifications to the DDQN algorithm are proposed with the hope of maintaining performance in terms of both stability and overestimation.
arXiv Detail & Related papers (2021-08-09T15:27:22Z)
- A Convergent and Efficient Deep Q Network Algorithm [3.553493344868414]
We show that the deep Q network (DQN) reinforcement learning algorithm can diverge and cease to operate in realistic settings.
We propose a convergent DQN algorithm (C-DQN) by carefully modifying DQN.
It learns robustly in difficult settings and can learn several difficult games in the Atari 2600 benchmark.
arXiv Detail & Related papers (2021-06-29T13:38:59Z)
- Self-correcting Q-Learning [14.178899938667161]
We introduce a new way to address the bias in the form of a "self-correcting algorithm".
Applying this strategy to Q-learning results in Self-correcting Q-learning.
We show theoretically that this new algorithm enjoys the same convergence guarantees as Q-learning while being more accurate.
arXiv Detail & Related papers (2020-12-02T11:36:24Z)
- Variance Reduction for Deep Q-Learning using Stochastic Recursive Gradient [51.880464915253924]
Deep Q-learning algorithms often suffer from poor gradient estimations with excessive variance.
This paper introduces a framework for updating gradient estimates in deep Q-learning, yielding a novel algorithm called SRG-DQN.
arXiv Detail & Related papers (2020-07-25T00:54:20Z)
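Since nearly every paper above revolves around the overestimation bias, here is a minimal, self-contained numeric sketch (not drawn from any of the listed papers) of why the max operator in the Q-learning target turns zero-mean estimation noise into a positive bias, as noted in the UQL entry:

```python
import numpy as np

# Minimal illustration: every true action value is zero, yet maximising over
# noisy estimates produces a target that is biased upwards.
rng = np.random.default_rng(0)
n_actions, n_trials = 10, 100_000

noisy_q = rng.normal(loc=0.0, scale=1.0, size=(n_trials, n_actions))  # zero-mean noise

print("mean of one noisy estimate :", noisy_q[:, 0].mean())        # ~0.0, unbiased
print("mean of max over estimates :", noisy_q.max(axis=1).mean())  # ~1.5, biased upwards
```

The gap grows with the number of actions and the noise scale, which is one reason the single-step DQN bootstrap target overestimates and why the multi-step, double, and ensemble corrections surveyed above help.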