On the Reduction of Variance and Overestimation of Deep Q-Learning
- URL: http://arxiv.org/abs/1910.05983v2
- Date: Sun, 14 Apr 2024 14:30:12 GMT
- Title: On the Reduction of Variance and Overestimation of Deep Q-Learning
- Authors: Mohammed Sabry, Amr M. A. Khalifa
- Abstract summary: We propose using Dropout techniques on the deep Q-Learning algorithm as a way to reduce variance and overestimation.
We also present experiments conducted on benchmark environments, demonstrating the effectiveness of our methodology in enhancing stability and reducing both variance and overestimation in model performance.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The breakthrough of deep Q-Learning on different types of environments revolutionized the algorithmic design of Reinforcement Learning, introducing more stable and robust algorithms; to that end, many extensions to the deep Q-Learning algorithm have been proposed to reduce the variance of the target values and the overestimation phenomenon. In this paper, we examine a new methodology to solve these issues: we propose using Dropout techniques on the deep Q-Learning algorithm as a way to reduce variance and overestimation. We also present experiments conducted on benchmark environments, demonstrating the effectiveness of our methodology in enhancing stability and reducing both variance and overestimation in model performance.
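The listing gives no implementation, but a minimal sketch of one way the dropout idea could be instantiated is shown below, assuming a PyTorch MLP Q-network and a Monte Carlo style target that averages K stochastic forward passes. The architecture, the value of K, and the averaging scheme are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class DropoutQNet(nn.Module):
    """Q-network with dropout layers (hypothetical architecture)."""
    def __init__(self, obs_dim: int, n_actions: int, p: float = 0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(), nn.Dropout(p),
            nn.Linear(128, 128), nn.ReLU(), nn.Dropout(p),
            nn.Linear(128, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

@torch.no_grad()
def mc_dropout_target(q_net: DropoutQNet, next_obs: torch.Tensor,
                      reward: torch.Tensor, done: torch.Tensor,
                      gamma: float = 0.99, k: int = 10) -> torch.Tensor:
    """Bootstrap target from the average of k stochastic forward passes.
    Averaging over dropout masks damps the variance feeding the max
    operator, which is one plausible route to less overestimation (an
    assumption about the mechanism, not a quote from the paper)."""
    q_net.train()  # keep dropout active while sampling
    q_mean = torch.stack([q_net(next_obs) for _ in range(k)]).mean(dim=0)
    return reward + gamma * (1.0 - done) * q_mean.max(dim=1).values
```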
Related papers
- An Empirical Investigation of Value-Based Multi-objective Reinforcement Learning for Stochastic Environments [1.26404863283601]
This paper examines the factors influencing the frequency with which value-based MORL Q-learning algorithms learn the SER-optimal (scalarised expected return) policy.
We highlight the critical impact of the noisy Q-value estimates issue on the stability and convergence of these algorithms.
arXiv Detail & Related papers (2024-01-06T08:43:08Z)
- Uncertainty Estimation by Fisher Information-based Evidential Deep Learning [61.94125052118442]
Uncertainty estimation is a key factor that makes deep learning reliable in practical applications.
We propose a novel method, Fisher Information-based Evidential Deep Learning ($\mathcal{I}$-EDL).
In particular, we introduce the Fisher Information Matrix (FIM) to measure the informativeness of the evidence carried by each sample, according to which we can dynamically reweight the objective loss terms to make the network focus more on the representation learning of uncertain classes.
arXiv Detail & Related papers (2023-03-03T16:12:59Z)
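As a rough illustration of the reweighting mechanism described in the entry above, here is a hedged sketch for an evidential classifier. It assumes a network that outputs non-negative evidence, uses the diagonal of the Dirichlet Fisher Information Matrix (trigamma terms) as the informativeness score, and attaches it to a simple evidential MSE loss; the actual $\mathcal{I}$-EDL objective is more involved and may weight differently.

```python
import torch
import torch.nn.functional as F

def dirichlet_fim_diagonal(alpha: torch.Tensor) -> torch.Tensor:
    """Diagonal of the Dirichlet Fisher Information Matrix:
    I(alpha)_jj = trigamma(alpha_j) - trigamma(alpha_0)."""
    alpha0 = alpha.sum(dim=-1, keepdim=True)
    return torch.polygamma(1, alpha) - torch.polygamma(1, alpha0)

def fim_weighted_loss(evidence: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Hypothetical FIM-weighted evidential loss (a sketch; the exact
    weighting used by I-EDL differs). evidence >= 0, shape (batch, classes)."""
    alpha = evidence + 1.0                            # Dirichlet parameters
    prob = alpha / alpha.sum(dim=-1, keepdim=True)    # expected class probs
    onehot = F.one_hot(targets, evidence.shape[-1]).float()
    per_sample = ((onehot - prob) ** 2).sum(dim=-1)   # evidential MSE term
    info = dirichlet_fim_diagonal(alpha).sum(dim=-1)  # informativeness score
    weights = (info / info.mean().clamp_min(1e-8)).detach()
    return (weights * per_sample).mean()
```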
- Improving robustness of jet tagging algorithms with adversarial training [56.79800815519762]
We investigate the vulnerability of flavor tagging algorithms via application of adversarial attacks.
We present an adversarial training strategy that mitigates the impact of such simulated attacks.
arXiv Detail & Related papers (2022-03-25T19:57:19Z)
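The entry above gives no implementation details; the following is a generic sketch of adversarial training with FGSM-style perturbations, a common form of the mitigation it describes. The choice of FGSM, the epsilon, and the clean/adversarial mixing weight are assumptions; the paper's jet-tagging setup is domain-specific.

```python
import torch
import torch.nn as nn

def fgsm_perturb(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
                 loss_fn: nn.Module, eps: float = 0.01) -> torch.Tensor:
    """Fast gradient sign method: one-step perturbation of the inputs
    in the direction that increases the loss."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss_fn(model(x_adv), y).backward()
    return (x_adv + eps * x_adv.grad.sign()).detach()

def adversarial_training_step(model: nn.Module, x: torch.Tensor,
                              y: torch.Tensor, opt: torch.optim.Optimizer,
                              loss_fn: nn.Module, adv_weight: float = 0.5) -> float:
    """Mix the clean loss with the loss on adversarially perturbed inputs."""
    x_adv = fgsm_perturb(model, x, y, loss_fn)
    opt.zero_grad()  # also clears gradients accumulated by fgsm_perturb
    loss = ((1 - adv_weight) * loss_fn(model(x), y)
            + adv_weight * loss_fn(model(x_adv), y))
    loss.backward()
    opt.step()
    return loss.item()
```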
- Calibrated and Sharp Uncertainties in Deep Learning via Simple Density Estimation [7.184701179854522]
This paper argues for reasoning about uncertainty in terms of these properties (calibration and sharpness) and proposes simple algorithms for enforcing them in deep learning.
Our methods focus on the strongest notion of calibration, distribution calibration, and enforce it by fitting a low-dimensional density or quantile function with a neural estimator.
Empirically, we find that our methods improve predictive uncertainties on several tasks with minimal computational and implementation overhead.
arXiv Detail & Related papers (2021-12-14T06:19:05Z)
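To make recalibration toward distribution calibration concrete, here is a minimal sketch that swaps the paper's neural density/quantile estimator for plain isotonic regression on probability integral transform (PIT) values. The recalibration target (the empirical CDF of the PIT values) is the standard construction, but everything else, including the `model_cdf` name in the usage note, is a simplifying assumption.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def fit_pit_recalibrator(pit_values: np.ndarray) -> IsotonicRegression:
    """Fit a monotone map R so that R(F(y|x)) is roughly uniform,
    i.e. the recalibrated CDF is distribution calibrated.
    pit_values are F(y_i | x_i) on a held-out calibration set."""
    sorted_pit = np.sort(pit_values)
    empirical_cdf = np.arange(1, len(sorted_pit) + 1) / len(sorted_pit)
    recal = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
    recal.fit(sorted_pit, empirical_cdf)
    return recal

# Usage sketch: recalibrated = recal.predict(model_cdf(y_grid, x)),
# where model_cdf is a hypothetical handle to the model's predictive CDF.
```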
- A Regret Minimization Approach to Iterative Learning Control [61.37088759497583]
We propose a new performance metric, planning regret, which replaces the standard uncertainty assumptions with worst-case regret.
We provide theoretical and empirical evidence that the proposed algorithm outperforms existing methods on several benchmarks.
arXiv Detail & Related papers (2021-02-26T13:48:49Z)
- Cross Learning in Deep Q-Networks [82.20059754270302]
We propose a novel cross Q-learning algorithm aimed at alleviating the well-known overestimation problem in value-based reinforcement learning methods.
Our algorithm builds on double Q-learning, maintaining a set of parallel models and estimating the Q-value based on a randomly selected network.
arXiv Detail & Related papers (2020-09-29T04:58:17Z)
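A minimal sketch of the target computation the entry above describes, under stated assumptions: the online network selects the greedy action (double-Q-style decoupling) and a randomly drawn ensemble member evaluates it. Whether the selector is the online network or another random member in the actual algorithm is not stated in the summary.

```python
import random
import torch
import torch.nn as nn

@torch.no_grad()
def cross_q_target(online: nn.Module, q_nets: list, next_obs: torch.Tensor,
                   reward: torch.Tensor, done: torch.Tensor,
                   gamma: float = 0.99) -> torch.Tensor:
    """Bootstrap target evaluated by a randomly selected network from a
    set of parallel models, decorrelating selection and evaluation errors."""
    evaluator = random.choice(q_nets)                  # random ensemble member
    actions = online(next_obs).argmax(dim=1, keepdim=True)
    next_q = evaluator(next_obs).gather(1, actions).squeeze(1)
    return reward + gamma * (1.0 - done) * next_q
```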
- Variance Reduction for Deep Q-Learning using Stochastic Recursive Gradient [51.880464915253924]
Deep Q-learning algorithms often suffer from poor gradient estimates with excessive variance.
This paper introduces a framework for updating the gradient estimates in deep Q-learning, yielding a novel algorithm called SRG-DQN.
arXiv Detail & Related papers (2020-07-25T00:54:20Z)
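The summary names the algorithm but not its update; below is a generic SARAH-style stochastic recursive gradient loop of the kind the title refers to, written against abstract gradient callables. Treat the anchoring schedule, the step size, and the callables `grad_fn` / `full_grad_fn` as assumptions rather than SRG-DQN's actual design.

```python
import itertools
import torch

def srg_sketch(grad_fn, full_grad_fn, w0: torch.Tensor, batches,
               lr: float = 0.01, inner_steps: int = 50) -> torch.Tensor:
    """SARAH-style recursive gradient descent:
    v_t = grad_f(w_t; b) - grad_f(w_{t-1}; b) + v_{t-1}.
    grad_fn(w, batch) and full_grad_fn(w) return gradients at w."""
    w = w0.clone()
    v = full_grad_fn(w)              # anchor with a full/large-batch gradient
    w_prev, w = w, w - lr * v
    for batch in itertools.islice(batches, inner_steps):
        # Recursive correction: reuse the previous estimate, adjust by
        # the gradient difference on the current minibatch.
        v = grad_fn(w, batch) - grad_fn(w_prev, batch) + v
        w_prev, w = w, w - lr * v
    return w
```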
- Reparameterized Variational Divergence Minimization for Stable Imitation [57.06909373038396]
We study the extent to which variations in the choice of probabilistic divergence may yield more performant ILO (imitation learning from observation) algorithms.
We contribute a reparameterization trick for adversarial imitation learning to alleviate the challenges of the promising $f$-divergence minimization framework.
Empirically, we demonstrate that our design choices allow for ILO algorithms that outperform baseline approaches and more closely match expert performance in low-dimensional continuous-control tasks.
arXiv Detail & Related papers (2020-06-18T19:04:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.