Addressing Maximization Bias in Reinforcement Learning with Two-Sample Testing
- URL: http://arxiv.org/abs/2201.08078v3
- Date: Wed, 18 Oct 2023 11:39:07 GMT
- Title: Addressing Maximization Bias in Reinforcement Learning with Two-Sample Testing
- Authors: Martin Waltz and Ostap Okhrin
- Abstract summary: Overestimation bias is a known threat to value-based reinforcement-learning algorithms.
We propose an estimator that flexibly interpolates between over- and underestimation by adjusting the significance level of the underlying hypothesis tests.
A generalization, termed $K$-Estimator (KE), obeys the same bias and variance bounds as the TE while relying on a nearly arbitrary kernel function.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Value-based reinforcement-learning algorithms have shown strong results in
games, robotics, and other real-world applications. Overestimation bias is a
known threat to those algorithms and can lead to dramatic performance decreases
or even complete algorithmic failure. We frame the bias problem statistically
and consider it an instance of estimating the maximum expected value (MEV) of a
set of random variables. We propose the $T$-Estimator (TE) based on two-sample
testing for the mean, that flexibly interpolates between over- and
underestimation by adjusting the significance level of the underlying
hypothesis tests. A generalization, termed $K$-Estimator (KE), obeys the same
bias and variance bounds as the TE while relying on a nearly arbitrary kernel
function. We introduce modifications of $Q$-Learning and the Bootstrapped Deep
$Q$-Network (BDQN) using the TE and the KE, and prove convergence in the
tabular setting. Furthermore, we propose an adaptive variant of the TE-based
BDQN that dynamically adjusts the significance level to minimize the absolute
estimation bias. All proposed estimators and algorithms are thoroughly tested
and validated on diverse tasks and environments, illustrating the bias control
and performance potential of the TE and KE.
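The MEV framing above can be illustrated with a short Monte Carlo sketch (not from the paper; the setup and names are illustrative): when several random variables share the same mean, plugging sample means into a max systematically overestimates the true MEV, while a double-estimator-style split of the data does not.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: M random variables, all with true mean 0, so the
# maximum expected value (MEV) is exactly 0.
M, n, trials = 10, 20, 5000
true_mev = 0.0

naive = []  # max of sample means -> overestimates the MEV
cross = []  # double-estimator-style split -> no overestimation
for _ in range(trials):
    samples = rng.normal(0.0, 1.0, size=(M, n))
    # Maximum Estimator: plug the sample means into the max.
    naive.append(samples.mean(axis=1).max())
    # Double-estimator idea: pick the argmax on one half of the data,
    # evaluate its mean on the other, independent half.
    a, b = samples[:, : n // 2], samples[:, n // 2 :]
    cross.append(b[a.mean(axis=1).argmax()].mean())

print(f"Maximum Estimator bias: {np.mean(naive) - true_mev:+.3f}")  # positive
print(f"Split-estimator bias:   {np.mean(cross) - true_mev:+.3f}")  # near zero
```

The TE proposed in the abstract sits between these two extremes, with the significance level of the two-sample tests acting as the interpolation knob.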
Related papers
- Regularized Q-learning through Robust Averaging [3.4354636842203026]
We propose a new Q-learning variant, called 2RA Q-learning, that addresses some weaknesses of existing Q-learning methods in a principled manner.
One such weakness is an underlying estimation bias which cannot be controlled and often results in poor performance.
We show that 2RA Q-learning converges to the optimal policy and analyze its theoretical mean-squared error.
arXiv Detail & Related papers (2024-05-03T15:57:26Z)
- Echoes of Socratic Doubt: Embracing Uncertainty in Calibrated Evidential Reinforcement Learning [1.7898305876314982]
The proposed algorithm combines deep evidential learning with quantile calibration based on principles of conformal inference.
It is tested on a suite of miniaturized Atari games (i.e., MinAtar)
arXiv Detail & Related papers (2024-02-11T05:17:56Z)
- Beyond Expectations: Learning with Stochastic Dominance Made Practical [88.06211893690964]
Stochastic dominance models risk-averse preferences for decision making with uncertain outcomes.
Despite being theoretically appealing, applications of stochastic dominance in machine learning have been scarce.
We first generalize the dominance concept to enable feasible comparisons between any arbitrary pair of random variables.
We then develop a simple and efficient approach for finding the optimal solution in terms of dominance.
arXiv Detail & Related papers (2024-02-05T03:21:23Z)
- A powerful rank-based correction to multiple testing under positive dependency [48.098218835606055]
We develop a novel multiple hypothesis testing correction with family-wise error rate (FWER) control.
Our proposed algorithm $\texttt{max-rank}$ is conceptually straightforward, relying on a $\max$-operator in the rank domain of the computed test statistics.
arXiv Detail & Related papers (2023-11-17T22:44:22Z)
- Delving into Identify-Emphasize Paradigm for Combating Unknown Bias [52.76758938921129]
We propose an effective bias-conflicting scoring method (ECS) to boost the identification accuracy.
We also propose gradient alignment (GA) to balance the contributions of the mined bias-aligned and bias-conflicting samples.
Experiments are conducted on multiple datasets in various settings, demonstrating that the proposed solution can mitigate the impact of unknown biases.
arXiv Detail & Related papers (2023-02-22T14:50:24Z)
- Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least squares estimator is the linear minimum variance unbiased estimator (MVUE) in linear models.
In this paper, we take a first step towards extending this result to nonlinear settings via deep learning with bias constraints.
A second motivation for the bias-constrained estimator (BCE) arises in applications where multiple estimates of the same unknown are averaged for improved performance.
arXiv Detail & Related papers (2021-10-24T10:23:51Z)
- Estimation Error Correction in Deep Reinforcement Learning for Deterministic Actor-Critic Methods [0.0]
In value-based deep reinforcement learning methods, approximation of value functions induces overestimation bias and leads to suboptimal policies.
We show that in deep actor-critic methods that aim to overcome the overestimation bias, if the reinforcement signals received by the agent have a high variance, a significant underestimation bias arises.
To minimize the underestimation, we introduce a parameter-free, novel deep Q-learning variant.
arXiv Detail & Related papers (2021-09-22T13:49:35Z)
- Instance-optimality in optimal value estimation: Adaptivity via variance-reduced Q-learning [99.34907092347733]
We analyze the problem of estimating optimal $Q$-value functions for a discounted Markov decision process with discrete states and actions.
Using a local minimax framework, we show that this functional arises in lower bounds on the accuracy of any estimation procedure.
In the other direction, we establish the sharpness of our lower bounds, up to factors logarithmic in the state and action spaces, by analyzing a variance-reduced version of $Q$-learning.
arXiv Detail & Related papers (2021-06-28T00:38:54Z)
- Understanding the Under-Coverage Bias in Uncertainty Estimation [58.03725169462616]
Quantile regression tends to under-cover relative to the desired coverage level in practice.
We prove that quantile regression suffers from an inherent under-coverage bias.
Our theory reveals that this under-coverage bias stems from a certain high-dimensional parameter estimation error.
arXiv Detail & Related papers (2021-06-10T06:11:55Z)
- Maxmin Q-learning: Controlling the Estimation Bias of Q-learning [31.742397178618624]
Overestimation bias affects Q-learning because it approximates the maximum action value using the maximum estimated action value.
We propose a generalization of Q-learning, called Maxmin Q-learning, which provides a parameter to flexibly control bias.
We empirically verify that our algorithm better controls estimation bias in toy environments, and that it achieves superior performance on several benchmark problems.
arXiv Detail & Related papers (2020-02-16T02:02:23Z)
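The maxmin construction summarized in the last entry can be sketched in a few lines (a toy illustration under assumed names, not the authors' implementation): with $N$ independent Q-estimates, the value target takes a max over actions of the element-wise min across estimates, and larger $N$ pushes the estimate down.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy setting: N independent Q-tables over |A| actions for a
# single state, standing in for an ensemble of learned estimates.
N, num_actions = 4, 6
q_tables = rng.normal(0.0, 1.0, size=(N, num_actions))

def maxmin_value(q: np.ndarray) -> float:
    """Maxmin-style value estimate: max over actions of min over estimates."""
    return q.min(axis=0).max()

def single_max_value(q: np.ndarray) -> float:
    """Ordinary Q-learning's estimate from a single table: max over actions."""
    return q[0].max()

# The maxmin value can never exceed any single table's max over actions,
# which is why N acts as a knob for controlling the estimation bias.
print(maxmin_value(q_tables), single_max_value(q_tables))
```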
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.