Related papers: Efficient Online Bayesian Inference for Neural Bandits

Efficient Online Bayesian Inference for Neural Bandits

URL: http://arxiv.org/abs/2112.00195v1
Date: Wed, 1 Dec 2021 00:29:51 GMT
Title: Efficient Online Bayesian Inference for Neural Bandits
Authors: Gerardo Duran-Martin and Aleyna Kara and Kevin Murphy
Abstract summary: We present a new algorithm for online (sequential) inference in Bayesian neural networks. The key idea is to combine the extended Kalman filter with a subspace for the parameters. We show good results on the "Deep Bayesian Bandit Showdown" benchmark, as well as MNIST and a recommender system.
Score: 10.353171848879187
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In this paper we present a new algorithm for online (sequential) inference in Bayesian neural networks, and show its suitability for tackling contextual bandit problems. The key idea is to combine the extended Kalman filter (which locally linearizes the likelihood function at each time step) with a (learned or random) low-dimensional affine subspace for the parameters; the use of a subspace enables us to scale our algorithm to models with $\sim 1M$ parameters. While most other neural bandit methods need to store the entire past dataset in order to avoid the problem of "catastrophic forgetting", our approach uses constant memory. This is possible because we represent uncertainty about all the parameters in the model, not just the final linear layer. We show good results on the "Deep Bayesian Bandit Showdown" benchmark, as well as MNIST and a recommender system.

Related papers

Exploration via Feature Perturbation in Contextual Bandits [33.46701416812218]
We propose a simple strategy for contextual bandits that injects randomness directly into feature inputs.<n>Remarkably, this algorithm achieves $tildemathcalO(dsqrtT)$ worst-case regret bound for generalized linear contextual bandits.
arXiv Detail & Related papers (2025-10-20T10:32:48Z)
Sparse Linear Bandits with Blocking Constraints [22.01704171400845]
We investigate the high-dimensional sparse linear bandits problem in a data-poor regime.<n>We show novel offline statistical guarantees of the lasso estimator for the linear model.<n>We propose a meta-algorithm based on corralling that does not need knowledge of optimal sparsity parameter $k$ at minimal cost to regret.
arXiv Detail & Related papers (2024-10-26T01:42:03Z)
Leveraging Offline Data in Linear Latent Contextual Bandits [27.272915631913175]
We design end-to-end latent bandit algorithms capable of handing uncountably many latent states.<n>We present two online algorithms that utilize the output of this offline algorithm to accelerate online learning.
arXiv Detail & Related papers (2024-05-27T16:23:34Z)
Stochastic Bandits with ReLU Neural Networks [40.41457480347015]
We show that a $tildeO(sqrtT)$ regret guarantee is achievable by considering bandits with one-layer ReLU neural networks. We propose an OFU-ReLU algorithm that can achieve this upper bound.
arXiv Detail & Related papers (2024-05-12T16:54:57Z)
Minimax Optimal Quantization of Linear Models: Information-Theoretic Limits and Efficient Algorithms [59.724977092582535]
We consider the problem of quantizing a linear model learned from measurements. We derive an information-theoretic lower bound for the minimax risk under this setting. We show that our method and upper-bounds can be extended for two-layer ReLU neural networks.
arXiv Detail & Related papers (2022-02-23T02:39:04Z)
Learning Contextual Bandits Through Perturbed Rewards [107.6210145983805]
We show that a $tildeO(tildedsqrtT)$ regret upper bound is still achievable under standard regularity conditions. We perturb the rewards when updating the neural network to eliminate the need of explicit exploration.
arXiv Detail & Related papers (2022-01-24T19:10:22Z)
Optimal Stopping via Randomized Neural Networks [6.677219861416146]
This paper presents the benefits of using randomized neural networks instead of standard basis functions or deep neural networks. Our approaches are applicable to high dimensional problems where the existing approaches become increasingly impractical. In all cases, our algorithms outperform the state-of-the-art and other relevant machine learning approaches in terms of time.
arXiv Detail & Related papers (2021-04-28T09:47:21Z)
Online Limited Memory Neural-Linear Bandits with Likelihood Matching [53.18698496031658]
We study neural-linear bandits for solving problems where both exploration and representation learning play an important role. We propose a likelihood matching algorithm that is resilient to catastrophic forgetting and is completely online.
arXiv Detail & Related papers (2021-02-07T14:19:07Z)
Neural Contextual Bandits with Deep Representation and Shallow Exploration [105.8099566651448]
We propose a novel learning algorithm that transforms the raw feature vector using the last hidden layer of a deep ReLU neural network. Compared with existing neural contextual bandit algorithms, our approach is computationally much more efficient since it only needs to explore in the last layer of the deep neural network.
arXiv Detail & Related papers (2020-12-03T09:17:55Z)
Neural Thompson Sampling [94.82847209157494]
We propose a new algorithm, called Neural Thompson Sampling, which adapts deep neural networks for both exploration and exploitation. At the core of our algorithm is a novel posterior distribution of the reward, where its mean is the neural network approximator, and its variance is built upon the neural tangent features of the corresponding neural network.
arXiv Detail & Related papers (2020-10-02T07:44:09Z)
Nearly Dimension-Independent Sparse Linear Bandit over Small Action Spaces via Best Subset Selection [71.9765117768556]
We consider the contextual bandit problem under the high dimensional linear model. This setting finds essential applications such as personalized recommendation, online advertisement, and personalized medicine. We propose doubly growing epochs and estimating the parameter using the best subset selection method.
arXiv Detail & Related papers (2020-09-04T04:10:39Z)

This list is automatically generated from the titles and abstracts of the papers in this site.