Elastic Step DQN: A novel multi-step algorithm to alleviate
overestimation in Deep Q-Networks
- URL: http://arxiv.org/abs/2210.03325v1
- Date: Fri, 7 Oct 2022 04:56:04 GMT
- Title: Elastic Step DQN: A novel multi-step algorithm to alleviate
overestimation in Deep Q-Networks
- Authors: Adrian Ly, Richard Dazeley, Peter Vamplew, Francisco Cruz and Sunil
Aryal
- Abstract summary: Deep Q-Networks (DQN) was the first reinforcement learning algorithm to use a deep neural network to surpass human-level performance in a number of Atari learning environments.
The unstable behaviour is often characterised by overestimation in the $Q$-values, commonly referred to as the overestimation bias.
This paper proposes a new algorithm that dynamically varies the step size horizon in multi-step updates based on the similarity of states visited.
- Score: 2.781147009075454
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Q-Networks (DQN) was the first reinforcement learning
algorithm to use a deep neural network to successfully surpass human-level
performance in a number of Atari learning environments. However, divergent and
unstable behaviour has been a long-standing issue in DQNs. The unstable
behaviour is often characterised by overestimation in the $Q$-values, commonly
referred to as the overestimation bias. To address the overestimation bias and
the divergent behaviour, a number of heuristic extensions have been proposed.
Notably, multi-step updates have been shown to drastically reduce unstable
behaviour while improving an agent's training performance. However, agents are
often highly sensitive to the selection of the multi-step update horizon ($n$),
and our empirical experiments show that a poorly chosen static value for $n$
can in many cases lead to worse performance than single-step DQN. Inspired by
the success of $n$-step DQN and the effects that multi-step updates have on
overestimation bias, this paper proposes a new algorithm that we call `Elastic
Step DQN' (ES-DQN). It dynamically varies the step size horizon in multi-step
updates based on the similarity of states visited. Our empirical evaluation
shows that ES-DQN outperforms $n$-step DQN with a fixed $n$, Double DQN and
Average DQN in several OpenAI Gym environments while at the same time
alleviating the overestimation bias.
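The multi-step target that $n$-step DQN bootstraps from (and that ES-DQN computes with a dynamically chosen horizon) can be sketched as follows. This is an illustrative sketch; the function name and interface are not from the paper, and the similarity-based choice of $n$ is left abstract:

```python
def n_step_target(rewards, bootstrap_q, gamma=0.99):
    """n-step TD target: r_t + gamma*r_{t+1} + ... + gamma^n * max_a Q(s_{t+n}, a).

    rewards: the n rewards r_t, ..., r_{t+n-1} along the trajectory
    bootstrap_q: max_a Q(s_{t+n}, a) from the target network
    """
    target = bootstrap_q
    for r in reversed(rewards):  # fold rewards back from the bootstrap state
        target = r + gamma * target
    return target

# 3-step example: 1 + 0.99*(0 + 0.99*(1 + 0.99*2.0))
print(n_step_target([1.0, 0.0, 1.0], bootstrap_q=2.0))
```

In ES-DQN, per the abstract, the number of rewards folded in is not fixed but varies per update with the similarity of the states visited.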
Related papers
- A Perspective of Q-value Estimation on Offline-to-Online Reinforcement
Learning [54.48409201256968]
Offline-to-online Reinforcement Learning (O2O RL) aims to improve the performance of an offline-pretrained policy using only a few online samples.
Most O2O methods focus on the balance between the RL objective and pessimism, or on the utilization of offline and online samples.
arXiv Detail & Related papers (2023-12-12T19:24:35Z) - Rethinking PGD Attack: Is Sign Function Necessary? [131.6894310945647]
We present a theoretical analysis of how such sign-based update algorithm influences step-wise attack performance.
We propose a new raw gradient descent (RGD) algorithm that eliminates the use of sign.
The effectiveness of the proposed RGD algorithm has been demonstrated extensively in experiments.
arXiv Detail & Related papers (2023-12-03T02:26:58Z) - Weakly Coupled Deep Q-Networks [5.76924666595801]
We propose a novel deep reinforcement learning algorithm that enhances performance in weakly coupled Markov decision processes (WCMDPs).
The proposed algorithm, WCDQN, employs a single network to train multiple DQN "subagents", one for each subproblem, and then combines their solutions to establish an upper bound on the optimal action value.
arXiv Detail & Related papers (2023-10-28T20:07:57Z) - On the Convergence and Sample Complexity Analysis of Deep Q-Networks
with $\epsilon$-Greedy Exploration [86.71396285956044]
This paper provides a theoretical understanding of Deep Q-Network (DQN) with $\epsilon$-greedy exploration in deep reinforcement learning.
arXiv Detail & Related papers (2023-10-24T20:37:02Z) - Understanding, Predicting and Better Resolving Q-Value Divergence in
Offline-RL [86.0987896274354]
We first identify a fundamental pattern, self-excitation, as the primary cause of Q-value estimation divergence in offline RL.
We then propose a novel Self-Excite Eigenvalue Measure (SEEM) metric to measure the evolving property of Q-network at training.
For the first time, our theory can reliably decide whether the training will diverge at an early stage.
arXiv Detail & Related papers (2023-10-06T17:57:44Z) - Sampling Efficient Deep Reinforcement Learning through Preference-Guided
Stochastic Exploration [8.612437964299414]
We propose a preference-guided $\epsilon$-greedy exploration algorithm for Deep Q-networks (DQN).
We show that preference-guided exploration motivates the DQN agent to take diverse actions: actions with larger Q-values are sampled more frequently, whereas actions with smaller Q-values still have a chance to be explored, thus encouraging exploration.
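The behaviour described (high-Q actions sampled more often, low-Q actions still reachable) matches a generic softmax/Boltzmann policy; the following is a sketch of that common scheme, not the paper's exact preference distribution:

```python
import math
import random

def softmax_sample(q_values, temperature=1.0, rng=random):
    """Sample an action index with probability increasing in its Q-value,
    while keeping every action reachable (a softmax policy sketch)."""
    m = max(q_values)  # subtract max for numerical stability
    weights = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(weights)
    r = rng.random()
    acc = 0.0
    for action, w in enumerate(weights):
        acc += w / total
        if r < acc:
            return action
    return len(q_values) - 1  # guard against floating-point round-off

random.seed(1)
counts = [0, 0, 0]
for _ in range(10_000):
    counts[softmax_sample([2.0, 1.0, 0.0])] += 1
print(counts)  # the largest Q-value dominates, but all actions appear
```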
arXiv Detail & Related papers (2022-06-20T08:23:49Z) - Temporal-Difference Value Estimation via Uncertainty-Guided Soft Updates [110.92598350897192]
Q-Learning has proven effective at learning a policy to perform control tasks.
Estimation noise becomes a bias after the max operator in the policy improvement step.
We present Unbiased Soft Q-Learning (UQL), which extends the work of EQL from two-action, finite-state spaces to multi-action, infinite-state Markov Decision Processes.
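The claim that zero-mean estimation noise turns into a positive bias after the max operator (the same mechanism behind the overestimation bias discussed in the main paper) can be checked with a small simulation; the setup below is illustrative, not from the paper:

```python
import random

# True action values are all equal, so max over the TRUE values is 0.
# Estimates carry zero-mean Gaussian noise; taking the max over the
# NOISY estimates yields a strictly positive value on average.
random.seed(0)
true_q = [0.0, 0.0, 0.0, 0.0]
trials = 100_000
avg_max = sum(
    max(q + random.gauss(0.0, 1.0) for q in true_q) for _ in range(trials)
) / trials
print(avg_max)  # noticeably above max(true_q) == 0
```

The gap grows with the number of actions and the noise scale, which is why Double DQN and related estimators decouple action selection from evaluation.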
arXiv Detail & Related papers (2021-10-28T00:07:19Z) - A Convergent and Efficient Deep Q Network Algorithm [3.553493344868414]
We show that the deep Q network (DQN) reinforcement learning algorithm can diverge and cease to operate in realistic settings.
We propose a convergent DQN algorithm (C-DQN) by carefully modifying DQN.
It learns robustly in difficult settings and can learn several difficult games in the Atari 2600 benchmark.
arXiv Detail & Related papers (2021-06-29T13:38:59Z) - Self-correcting Q-Learning [14.178899938667161]
We introduce a new way to address the bias in the form of a "self-correcting algorithm".
Applying this strategy to Q-learning results in Self-correcting Q-learning.
We show theoretically that this new algorithm enjoys the same convergence guarantees as Q-learning while being more accurate.
arXiv Detail & Related papers (2020-12-02T11:36:24Z) - Variance Reduction for Deep Q-Learning using Stochastic Recursive
Gradient [51.880464915253924]
Deep Q-learning algorithms often suffer from poor gradient estimations with an excessive variance.
This paper introduces a stochastic recursive gradient framework for updating the gradient estimates in deep Q-learning, yielding a novel algorithm called SRG-DQN.
arXiv Detail & Related papers (2020-07-25T00:54:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.