Temporal Difference Uncertainties as a Signal for Exploration
- URL: http://arxiv.org/abs/2010.02255v2
- Date: Thu, 1 Jul 2021 09:21:25 GMT
- Title: Temporal Difference Uncertainties as a Signal for Exploration
- Authors: Sebastian Flennerhag, Jane X. Wang, Pablo Sprechmann, Francesco Visin,
Alexandre Galashov, Steven Kapturowski, Diana L. Borsa, Nicolas Heess, Andre
Barreto, Razvan Pascanu
- Abstract summary: An effective approach to exploration in reinforcement learning is to rely on an agent's uncertainty over the optimal policy.
In this paper, we highlight that value estimates are easily biased and temporally inconsistent.
We propose a novel method for estimating uncertainty over the value function that relies on inducing a distribution over temporal difference errors.
- Score: 76.6341354269013
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: An effective approach to exploration in reinforcement learning is to rely on
an agent's uncertainty over the optimal policy, which can yield near-optimal
exploration strategies in tabular settings. However, in non-tabular settings
that involve function approximators, obtaining accurate uncertainty estimates
is almost as challenging a problem as learning the value function itself. In this paper, we highlight that value
estimates are easily biased and temporally inconsistent. In light of this, we
propose a novel method for estimating uncertainty over the value function that
relies on inducing a distribution over temporal difference errors. This
exploration signal controls for state-action transitions so as to isolate
uncertainty in value that is due to uncertainty over the agent's parameters.
Because our measure of uncertainty conditions on state-action transitions, we
cannot act on this measure directly. Instead, we incorporate it as an intrinsic
reward and treat exploration as a separate learning problem, induced by the
agent's temporal difference uncertainties. We introduce a distinct exploration
policy that learns to collect data with high estimated uncertainty, which gives
rise to a curriculum that smoothly changes throughout learning and vanishes in
the limit of perfect value estimates. We evaluate our method on hard
exploration tasks, including Deep Sea and Atari 2600 environments, and find that
our proposed form of exploration facilitates both diverse and deep exploration.
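The core mechanism lends itself to a compact sketch: for an observed transition (s, a, r, s'), estimate the spread of temporal difference errors across an ensemble of value heads, and hand that spread to a separate exploration policy as an intrinsic reward. The snippet below is a minimal illustration of that idea under assumptions of our own (a bootstrapped Q-ensemble, the standard deviation of TD errors as the bonus, and the helper names td_errors / td_uncertainty_bonus); it is not the authors' implementation.

```python
import numpy as np

def td_errors(ensemble_q, s, a, r, s_next, gamma=0.99):
    """TD errors of each ensemble member on one observed transition (s, a, r, s').

    ensemble_q: list of callables, each mapping a state to a vector of Q-values
    (one per action). Conditioning on the observed transition isolates
    disagreement that stems from uncertainty over the agent's parameters rather
    than from randomness in the environment.
    """
    errors = []
    for q in ensemble_q:
        target = r + gamma * np.max(q(s_next))  # bootstrapped one-step target
        errors.append(target - q(s)[a])         # this member's TD error
    return np.asarray(errors)

def td_uncertainty_bonus(ensemble_q, s, a, r, s_next, gamma=0.99):
    """Intrinsic reward: spread of the TD-error distribution across the ensemble.

    The bonus shrinks as the ensemble's value estimates converge, so the induced
    exploration curriculum fades out in the limit of perfect value estimates.
    """
    return float(np.std(td_errors(ensemble_q, s, a, r, s_next, gamma)))
```

In a full agent, this bonus would be consumed by a distinct exploration policy trained alongside the main, extrinsic-reward policy, matching the separate-learning-problem framing described in the abstract.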
Related papers
- Uncertainty-boosted Robust Video Activity Anticipation [72.14155465769201]
Video activity anticipation aims to predict what will happen in the future, embracing a broad application prospect ranging from robot vision to autonomous driving.
Despite the recent progress, the data uncertainty issue, reflected in the content evolution process and the dynamic correlation in event labels, has so far been largely overlooked.
We propose an uncertainty-boosted robust video activity anticipation framework, which generates uncertainty values to indicate the credibility of the anticipation results.
arXiv Detail & Related papers (2024-04-29T12:31:38Z)
- Cost-Sensitive Uncertainty-Based Failure Recognition for Object Detection [1.8990839669542954]
We propose a cost-sensitive framework for object detection tailored to user-defined budgets.
We derive minimum thresholding requirements to prevent performance degradation.
We automate and optimize the thresholding process to maximize the failure recognition rate.
arXiv Detail & Related papers (2024-04-26T14:03:55Z)
- One step closer to unbiased aleatoric uncertainty estimation [71.55174353766289]
We propose a new estimation method by actively de-noising the observed data.
By conducting a broad range of experiments, we demonstrate that our proposed approach provides a much closer approximation to the actual data uncertainty than the standard method.
arXiv Detail & Related papers (2023-12-16T14:59:11Z)
- Wasserstein Actor-Critic: Directed Exploration via Optimism for Continuous-Actions Control [41.7453231409493]
Wasserstein Actor-Critic (WAC) is an actor-critic architecture inspired by Wasserstein Q-Learning (WQL).
WAC enforces exploration in a principled way by guiding the policy learning process with the optimization of an upper bound of the Q-value estimates.
arXiv Detail & Related papers (2023-03-04T10:52:20Z)
- Model-Based Uncertainty in Value Functions [89.31922008981735]
We focus on characterizing the variance over values induced by a distribution over MDPs.
Previous work upper bounds the posterior variance over values by solving a so-called uncertainty Bellman equation.
We propose a new uncertainty Bellman equation whose solution converges to the true posterior variance over values.
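(A tabular sketch of this style of uncertainty backup appears after the related-papers list below.)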
arXiv Detail & Related papers (2023-02-24T09:18:27Z)
- CertainNet: Sampling-free Uncertainty Estimation for Object Detection [65.28989536741658]
Estimating the uncertainty of a neural network plays a fundamental role in safety-critical settings.
In this work, we propose a novel sampling-free uncertainty estimation method for object detection.
We call it CertainNet, and it is the first to provide separate uncertainties for each output signal: objectness, class, location and size.
arXiv Detail & Related papers (2021-10-04T17:59:31Z)
- ADER: Adapting between Exploration and Robustness for Actor-Critic Methods [8.750251598581102]
We show that TD3's performance lags behind vanilla actor-critic methods in some simple environments.
We propose a novel algorithm for this problem that ADapts between Exploration and Robustness, namely ADER.
Experiments in several challenging environments demonstrate the superiority of the proposed method on continuous control tasks.
arXiv Detail & Related papers (2021-09-08T05:48:39Z)
- Learning Uncertainty For Safety-Oriented Semantic Segmentation In Autonomous Driving [77.39239190539871]
We show how uncertainty estimation can be leveraged to enable safety critical image segmentation in autonomous driving.
We introduce a new uncertainty measure based on disagreeing predictions as measured by a dissimilarity function.
We show experimentally that our proposed approach is much less computationally intensive at inference time than competing methods.
arXiv Detail & Related papers (2021-05-28T09:23:05Z)
- Exploring Uncertainty in Deep Learning for Construction of Prediction Intervals [27.569681578957645]
We explore the uncertainty in deep learning to construct prediction intervals.
We design a special loss function, which enables us to learn uncertainty without uncertainty labels.
Our method correlates the construction of prediction intervals with the uncertainty estimation.
arXiv Detail & Related papers (2021-04-27T02:58:20Z)
- Deep Learning based Uncertainty Decomposition for Real-time Control [9.067368638784355]
We propose a novel method for detecting the absence of training data using deep learning.
We show its advantages over existing approaches on synthetic and real-world datasets.
We further demonstrate the practicality of this uncertainty estimate in deploying online data-efficient control on a simulated quadcopter.
arXiv Detail & Related papers (2020-10-06T10:46:27Z)
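The uncertainty Bellman equation referenced in the "Model-Based Uncertainty in Value Functions" entry above admits a compact tabular illustration: local uncertainty is propagated through the transition model and the policy in the same way rewards are propagated by the ordinary Bellman equation. The sketch below assumes known tabular quantities (a local uncertainty term nu, transition probabilities P, and policy pi) and follows the generic form u(s, a) = nu(s, a) + gamma^2 * E[u(s', a')]; it is a schematic of this family of equations, not the specific equation derived in that paper.

```python
import numpy as np

def uncertainty_bellman_backup(nu, P, pi, gamma=0.99, iters=1000):
    """Fixed-point iteration for a tabular uncertainty Bellman equation.

    nu : (S, A) array of local uncertainty (e.g. variance of one-step targets).
    P  : (S, A, S) transition probabilities.
    pi : (S, A) policy probabilities.
    Returns u satisfying u(s, a) = nu(s, a) + gamma**2 * E_{s', a'}[u(s', a')].
    """
    u = np.zeros_like(nu)
    for _ in range(iters):
        # expected next-step uncertainty under the transition model and policy
        u_next = np.einsum("sap,pb,pb->sa", P, pi, u)
        u = nu + gamma**2 * u_next
    return u
```

Because gamma**2 < 1, the iteration is a contraction and converges to a unique fixed point, mirroring the convergence claim made in that entry.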
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.