Deep Bandits Show-Off: Simple and Efficient Exploration with Deep
Networks
- URL: http://arxiv.org/abs/2105.04683v1
- Date: Mon, 10 May 2021 21:45:01 GMT
- Title: Deep Bandits Show-Off: Simple and Efficient Exploration with Deep
Networks
- Authors: Mattia Rigotti, Rong Zhu
- Abstract summary: We introduce Sample Average Uncertainty (SAU), a simple and efficient uncertainty measure for contextual bandits.
Because of its simplicity SAU can be seamlessly applied to deep contextual bandits as a very scalable drop-in replacement for epsilon-greedy exploration.
- Score: 14.178899938667161
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Designing efficient exploration is central to Reinforcement Learning due to
the fundamental problem posed by the exploration-exploitation dilemma. Bayesian
exploration strategies like Thompson Sampling resolve this trade-off in a
principled way by modeling and updating the distribution of the parameters of
the action-value function, i.e. the outcome model of the environment. However,
this technique becomes infeasible for complex environments due to the
difficulty of representing and updating probability distributions over
parameters of outcome models of corresponding complexity. Moreover, the
approximation techniques introduced to mitigate this issue typically result in
poor exploration-exploitation trade-offs, as observed in the case of deep
neural network models with approximate posterior methods that have been shown
to underperform in the deep bandit scenario.
In this paper we introduce Sample Average Uncertainty (SAU), a simple and
efficient uncertainty measure for contextual bandits. While Bayesian approaches
like Thompson Sampling estimate outcome uncertainty indirectly by first
quantifying the variability over the parameters of the outcome model, SAU is a
frequentist approach that directly estimates the uncertainty of the outcomes
based on the value predictions. Importantly, we show theoretically that the
uncertainty measure estimated by SAU asymptotically matches the uncertainty
provided by Thompson Sampling, as well as its regret bounds. Because of its
simplicity SAU can be seamlessly applied to deep contextual bandits as a very
scalable drop-in replacement for epsilon-greedy exploration. Finally, we
empirically confirm our theory by showing that SAU-based exploration
outperforms current state-of-the-art deep Bayesian bandit methods on several
real-world datasets at modest computation cost.
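The abstract describes SAU as a frequentist measure that estimates outcome uncertainty directly from value-prediction errors, usable as a drop-in replacement for epsilon-greedy. The following is a minimal sketch of that idea for a linear contextual bandit; the class name, the linear per-arm model, the learning rate, and the exact form of the perturbation are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

class SAUSampling:
    """Sketch of SAU-style sampling for a K-armed contextual bandit.

    Each arm keeps a running sum of squared prediction errors. The
    sample-average squared error, scaled down by the pull count, serves
    as the variance of a Gaussian perturbation added to the model's
    value prediction, so exploration shrinks as predictions improve.
    """

    def __init__(self, n_arms, n_features, lr=0.05):
        self.w = np.zeros((n_arms, n_features))  # linear value model per arm
        self.counts = np.ones(n_arms)            # pulls per arm (init 1 to avoid /0)
        self.sq_err = np.ones(n_arms)            # running sum of squared errors
        self.lr = lr

    def select(self, x):
        mu = self.w @ x                          # predicted value per arm
        var = self.sq_err / self.counts**2       # (sum sq err / n) / n
        noisy = mu + np.sqrt(var) * rng.standard_normal(len(mu))
        return int(np.argmax(noisy))

    def update(self, x, arm, reward):
        err = reward - self.w[arm] @ x
        self.sq_err[arm] += err**2               # accumulate squared prediction error
        self.counts[arm] += 1
        self.w[arm] += self.lr * err * x         # SGD step on squared loss
```

A usage loop would call `select(x)` on each context, observe the reward for the chosen arm, and call `update(x, arm, reward)`; unlike epsilon-greedy, no exploration rate needs tuning, since the perturbation scale is driven by the observed prediction errors themselves.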
Related papers
- Model-Based Epistemic Variance of Values for Risk-Aware Policy
Optimization [63.32053223422317]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
In particular, we focus on characterizing the variance over values induced by a distribution over MDPs.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
arXiv Detail & Related papers (2023-12-07T15:55:58Z) - Implicit Variational Inference for High-Dimensional Posteriors [7.924706533725115]
In variational inference, the benefits of Bayesian models rely on accurately capturing the true posterior distribution.
We propose using neural samplers that specify implicit distributions, which are well-suited for approximating complex multimodal and correlated posteriors.
Our approach introduces novel bounds for approximate inference using implicit distributions by locally linearising the neural sampler.
arXiv Detail & Related papers (2023-10-10T14:06:56Z) - Model-Based Uncertainty in Value Functions [89.31922008981735]
We focus on characterizing the variance over values induced by a distribution over MDPs.
Previous work upper bounds the posterior variance over values by solving a so-called uncertainty Bellman equation.
We propose a new uncertainty Bellman equation whose solution converges to the true posterior variance over values.
arXiv Detail & Related papers (2023-02-24T09:18:27Z) - Convergence of uncertainty estimates in Ensemble and Bayesian sparse
model discovery [4.446017969073817]
We show empirical success in terms of accuracy and robustness to noise with a bootstrapping-based sequential-thresholding least-squares estimator.
We show that this bootstrapping-based ensembling technique can perform a provably correct variable selection procedure, with an error rate that converges exponentially.
arXiv Detail & Related papers (2023-01-30T04:07:59Z) - On the Pitfalls of Heteroscedastic Uncertainty Estimation with
Probabilistic Neural Networks [23.502721524477444]
We present a synthetic example illustrating how this approach can lead to very poor but stable estimates.
We identify the culprit to be the log-likelihood loss, along with certain conditions that exacerbate the issue.
We present an alternative formulation, termed $\beta$-NLL, in which each data point's contribution to the loss is weighted by the $\beta$-exponentiated variance estimate.
arXiv Detail & Related papers (2022-03-17T08:46:17Z) - Residual Overfit Method of Exploration [78.07532520582313]
We propose an approximate exploration methodology based on fitting only two point estimates, one tuned and one overfit.
The approach drives exploration towards actions where the overfit model exhibits the most overfitting compared to the tuned model.
We compare ROME against a set of established contextual bandit methods on three datasets and find it to be one of the best performing.
arXiv Detail & Related papers (2021-10-06T17:05:33Z) - A Unifying Theory of Thompson Sampling for Continuous Risk-Averse
Bandits [91.3755431537592]
This paper unifies the analysis of risk-averse Thompson sampling algorithms for the multi-armed bandit problem.
Using the contraction principle in the theory of large deviations, we prove novel concentration bounds for continuous risk functionals.
We show that a wide class of risk functionals as well as "nice" functions of them satisfy the continuity condition.
arXiv Detail & Related papers (2021-08-25T17:09:01Z) - Calibration and Uncertainty Quantification of Bayesian Convolutional
Neural Networks for Geophysical Applications [0.0]
To properly incorporate the uncertainty of predictions, such subsurface models should provide calibrated probabilities and the associated uncertainties in their predictions.
It has been shown that popular Deep Learning-based models are often miscalibrated, and due to their deterministic nature, provide no means to interpret the uncertainty of their predictions.
We compare three different approaches obtaining probabilistic models based on convolutional neural networks in a Bayesian formalism.
arXiv Detail & Related papers (2021-05-25T17:54:23Z) - Entropy-based adaptive design for contour finding and estimating
reliability [0.24466725954625884]
In reliability analysis, methods used to estimate failure probability are often limited by the costs associated with model evaluations.
We introduce an entropy-based GP adaptive design that, when paired with MFIS, provides more accurate failure probability estimates.
Illustrative examples are provided on benchmark data as well as an application to an impact damage simulator for National Aeronautics and Space Administration (NASA) spacesuits.
arXiv Detail & Related papers (2021-05-24T15:41:15Z) - Temporal Difference Uncertainties as a Signal for Exploration [76.6341354269013]
An effective approach to exploration in reinforcement learning is to rely on an agent's uncertainty over the optimal policy.
In this paper, we highlight that value estimates are easily biased and temporally inconsistent.
We propose a novel method for estimating uncertainty over the value function that relies on inducing a distribution over temporal difference errors.
arXiv Detail & Related papers (2020-10-05T18:11:22Z) - Uncertainty Estimation Using a Single Deep Deterministic Neural Network [66.26231423824089]
We propose a method for training a deterministic deep model that can find and reject out of distribution data points at test time with a single forward pass.
We scale training with a novel loss function and centroid updating scheme and match the accuracy of softmax models.
arXiv Detail & Related papers (2020-03-04T12:27:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.