Streamlining Prediction in Bayesian Deep Learning
- URL: http://arxiv.org/abs/2411.18425v1
- Date: Wed, 27 Nov 2024 15:07:44 GMT
- Title: Streamlining Prediction in Bayesian Deep Learning
- Authors: Rui Li, Marcus Klasson, Arno Solin, Martin Trapp
- Abstract summary: In this work we examine streamlining prediction in BDL through a single forward pass without sampling.
We analytically compute an approximation to the posterior predictive distribution.
We showcase our approach for both MLPs and transformers, such as ViT and GPT-2, and assess its performance on regression and classification tasks.
- Score: 16.061370232443988
- Abstract: The rising interest in Bayesian deep learning (BDL) has led to a plethora of methods for estimating the posterior distribution. However, efficient computation of inferences, such as predictions, has been largely overlooked, with Monte Carlo integration remaining the standard. In this work we examine streamlining prediction in BDL through a single forward pass without sampling. For this we use local linearisation on activation functions and local Gaussian approximations at linear layers. This allows us to analytically compute an approximation to the posterior predictive distribution. We showcase our approach for both MLPs and transformers, such as ViT and GPT-2, and assess its performance on regression and classification tasks.
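As a concrete illustration of the recipe in the abstract, here is a minimal numpy sketch that pushes a Gaussian through one linear layer (exactly, under an assumed diagonal Gaussian weight posterior) and one ReLU (via local linearisation at the mean). The function names and the two-layer toy network are illustrative, not taken from the authors' code.

```python
import numpy as np

def linear_layer_moments(mz, Cz, M, b, s):
    """Push a Gaussian N(mz, Cz) through y = W z + b, where the rows of W
    are independent Gaussians with mean rows M[i] and diagonal variances
    s[i] (e.g. from a diagonal Laplace or variational posterior).
    Returns the exact output mean and covariance."""
    my = M @ mz + b
    Cy = M @ Cz @ M.T
    # Extra diagonal variance from weight uncertainty:
    # Var[w_i^T z] gains sum_j s[i, j] * E[z_j^2].
    Ez2 = np.diag(Cz) + mz ** 2
    Cy[np.diag_indices_from(Cy)] += s @ Ez2
    return my, Cy

def relu_moments(mz, Cz):
    """Local linearisation of ReLU at the input mean: the activation is
    replaced by its first-order Taylor expansion, so the distribution
    stays Gaussian with a masked mean and covariance."""
    mask = (mz > 0).astype(mz.dtype)
    return np.maximum(mz, 0.0), Cz * np.outer(mask, mask)

# Hypothetical 2-layer network: x -> linear -> ReLU -> linear.
rng = np.random.default_rng(0)
d_in, d_h, d_out = 3, 8, 2
x = rng.normal(size=d_in)

# Posterior means and (diagonal) posterior variances for each layer.
M1, b1, s1 = rng.normal(size=(d_h, d_in)), np.zeros(d_h), 0.01 * np.ones((d_h, d_in))
M2, b2, s2 = rng.normal(size=(d_out, d_h)), np.zeros(d_out), 0.01 * np.ones((d_out, d_h))

m, C = linear_layer_moments(x, np.zeros((d_in, d_in)), M1, b1, s1)
m, C = relu_moments(m, C)
m, C = linear_layer_moments(m, C, M2, b2, s2)
print("predictive mean:", m)
print("predictive variance (diagonal):", np.diag(C))
```

The predictive distribution falls out of a single forward pass over moments, which is the point: no Monte Carlo samples of the weights are drawn.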
Related papers
- Generalized Bayesian deep reinforcement learning [2.469908534801392]
We propose to model the dynamics of the unknown environment through deep generative models assuming Markov dependence.
In the absence of likelihood functions for these models, we train them by learning a generalized predictive-sequential (or prequential) scoring rule (SR) posterior.
For policy learning, we propose expected Thompson sampling (ETS) to learn the optimal policy by maximizing the expected value function with respect to the posterior distribution.
arXiv Detail & Related papers (2024-12-16T13:02:17Z)
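A toy sketch of the expected Thompson sampling (ETS) step from the entry above: act greedily on the value averaged over several posterior samples, rather than on a single sample as in vanilla Thompson sampling. The bandit-style stand-ins here are illustrative; the paper itself works with deep generative dynamics models and a prequential SR posterior.

```python
import numpy as np

def expected_thompson_action(model_samples, value_fn, state, actions):
    """Expected Thompson sampling, sketched: score each action by its value
    averaged over posterior model samples, then act greedily on the average.
    `model_samples`, `value_fn`, `state`, `actions` are all placeholders."""
    scores = [np.mean([value_fn(m, state, a) for m in model_samples])
              for a in actions]
    return actions[int(np.argmax(scores))]

# Toy illustration: 3 posterior samples of a bandit-like "model", where a
# model is just a vector of per-action mean rewards.
rng = np.random.default_rng(1)
true_means = np.array([0.2, 0.5, 0.4])
model_samples = [true_means + 0.1 * rng.normal(size=3) for _ in range(3)]
value_fn = lambda model, state, a: model[a]  # value = sampled mean reward
print(expected_thompson_action(model_samples, value_fn, state=None, actions=[0, 1, 2]))
```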
- Hessian-Free Laplace in Bayesian Deep Learning [44.16006844888796]
The Hessian-free Laplace (HFL) approximation uses curvature of both the log posterior and the network prediction to estimate the predictive variance.
We show that, under standard assumptions of the Laplace approximation (LA) in Bayesian deep learning, HFL targets the same variance as LA and can be efficiently amortized in a pre-trained network.
arXiv Detail & Related papers (2024-03-15T20:47:39Z)
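The variance HFL targets is the standard linearised (delta-method) predictive variance. A hedged torch sketch, assuming a readily available diagonal posterior covariance rather than the paper's Hessian-free amortisation scheme:

```python
import torch

def laplace_predictive_variance(model, x, sigma_diag):
    """Delta-method predictive variance: Var[f(x)] ~= g^T Sigma g with
    g = grad_theta f(x) and Sigma a (here diagonal) posterior covariance
    over the flattened parameters. HFL's contribution is estimating this
    without materialising the Hessian; here `sigma_diag` is simply assumed
    to be given."""
    f = model(x).squeeze()
    grads = torch.autograd.grad(f, list(model.parameters()))
    g = torch.cat([gr.reshape(-1) for gr in grads])
    return (g * sigma_diag * g).sum()

torch.manual_seed(0)
model = torch.nn.Sequential(torch.nn.Linear(3, 16), torch.nn.Tanh(),
                            torch.nn.Linear(16, 1))
n_params = sum(p.numel() for p in model.parameters())
sigma_diag = 0.01 * torch.ones(n_params)  # stand-in posterior variances
x = torch.randn(1, 3)
print("predictive variance:", laplace_predictive_variance(model, x, sigma_diag).item())
```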
- Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo [104.9535542833054]
We present a scalable and effective exploration strategy based on Thompson sampling for reinforcement learning (RL).
We instead directly sample the Q function from its posterior distribution using Langevin Monte Carlo.
Our approach achieves better or similar results compared with state-of-the-art deep RL algorithms on several challenging exploration tasks from the Atari57 suite.
arXiv Detail & Related papers (2023-05-29T17:11:28Z)
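For intuition, here is a minimal unadjusted Langevin sampler and a toy stand-in for "sampling the Q function from its posterior"; the actual method runs such updates on deep Q-network parameters.

```python
import numpy as np

def langevin_sample(grad_U, theta0, step=1e-2, n_steps=500, rng=None):
    """Unadjusted Langevin Monte Carlo: theta <- theta - step * grad_U(theta)
    + sqrt(2 * step) * noise, which leaves exp(-U) approximately invariant
    for small step sizes."""
    rng = rng or np.random.default_rng(0)
    theta = theta0.copy()
    for _ in range(n_steps):
        theta += -step * grad_U(theta) + np.sqrt(2 * step) * rng.normal(size=theta.shape)
    return theta

# Toy posterior over per-action Q-values: U(q) = ||q - q_hat||^2 / (2 * var).
q_hat, var = np.array([1.0, 1.5, 0.5]), 0.2
grad_U = lambda q: (q - q_hat) / var
q_sample = langevin_sample(grad_U, theta0=np.zeros(3))
action = int(np.argmax(q_sample))  # Thompson-style greedy action on the sample
print(q_sample, "-> action", action)
```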
- Distributional Reinforcement Learning with Dual Expectile-Quantile Regression [51.87411935256015]
The quantile regression approach to distributional RL provides a flexible and effective way of learning arbitrary return distributions.
We show that the distributional guarantees vanish, and we empirically observe that the estimated distribution rapidly collapses to its mean estimate.
Motivated by the efficiency of $L^2$-based learning, we propose to jointly learn expectiles and quantiles of the return distribution in a way that allows efficient learning while keeping an estimate of the full distribution of returns.
arXiv Detail & Related papers (2023-05-26T12:30:05Z)
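The two loss functions at play, sketched in torch: the pinball ($L^1$-style) loss recovers quantiles and the asymmetric squared ($L^2$-style) loss recovers expectiles. The joint objective below is a simplified stand-in for the paper's coupling.

```python
import torch

def quantile_loss(pred, target, tau):
    """Asymmetric L1 (pinball) loss whose minimiser is the tau-quantile."""
    u = target - pred
    return torch.mean(torch.where(u >= 0, tau * u, (tau - 1) * u))

def expectile_loss(pred, target, tau):
    """Asymmetric L2 loss whose minimiser is the tau-expectile; the squared
    error gives better-behaved gradients than the pinball loss."""
    u = target - pred
    return torch.mean(torch.where(u >= 0, tau * u ** 2, (1 - tau) * u ** 2))

# Jointly fitting expectiles and quantiles of a toy return distribution.
torch.manual_seed(0)
returns = torch.randn(4096) * 2.0 + 1.0
taus = torch.tensor([0.1, 0.5, 0.9])
q = torch.zeros(3, requires_grad=True)  # quantile estimates
e = torch.zeros(3, requires_grad=True)  # expectile estimates
opt = torch.optim.Adam([q, e], lr=0.05)
for _ in range(1000):
    opt.zero_grad()
    loss = sum(quantile_loss(q[i], returns, t) + expectile_loss(e[i], returns, t)
               for i, t in enumerate(taus))
    loss.backward()
    opt.step()
print("quantiles:", q.detach(), "expectiles:", e.detach())
```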
- Langevin Monte Carlo for Contextual Bandits [72.00524614312002]
Langevin Monte Carlo Thompson Sampling (LMC-TS) is proposed to directly sample from the posterior distribution in contextual bandits.
We prove that the proposed algorithm achieves the same sublinear regret bound as the best Thompson sampling algorithms for a special case of contextual bandits.
arXiv Detail & Related papers (2022-06-22T17:58:23Z)
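A compact numpy sketch of an LMC-TS-style loop for a linear contextual bandit: each round, a few warm-started Langevin steps replace exact posterior sampling, and the agent acts greedily on the resulting sample. Step sizes, priors, and the arm model here are arbitrary choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
theta_true = rng.normal(size=d)
X, y = [], []  # features of pulled arms and observed rewards

def grad_neg_log_post(theta, X, y, noise_var=0.25, prior_var=1.0):
    """Gradient of the negative log posterior for Bayesian linear
    regression, the quantity Langevin sampling needs in the linear case."""
    g = theta / prior_var
    if X:
        A, r = np.array(X), np.array(y)
        g += A.T @ (A @ theta - r) / noise_var
    return g

theta = np.zeros(d)
for t in range(300):
    # A few warm-started Langevin steps per round stand in for exact
    # posterior sampling; the step size shrinks as data accumulate.
    step = 0.1 / (len(X) + 10)
    for _ in range(20):
        theta += -step * grad_neg_log_post(theta, X, y) \
                 + np.sqrt(2 * step) * rng.normal(size=d)
    arms = rng.normal(size=(5, d))      # 5 random candidate arm features
    a = int(np.argmax(arms @ theta))    # act greedily on the sample
    X.append(arms[a])
    y.append(arms[a] @ theta_true + 0.5 * rng.normal())
print("posterior sample vs truth:", np.round(theta, 2), np.round(theta_true, 2))
```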
- Transformers Can Do Bayesian Inference [56.99390658880008]
We present Prior-Data Fitted Networks (PFNs).
PFNs leverage in-context learning in large-scale machine-learning models to approximate a large set of posteriors.
We demonstrate that PFNs can near-perfectly mimic Gaussian processes and also enable efficient Bayesian inference for intractable problems.
arXiv Detail & Related papers (2021-12-20T13:07:39Z)
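The PFN training idea, shrunk to a toy: draw supervised tasks from a prior, and meta-train one network to map a context set plus a query input to the query's output. Real PFNs use a transformer and output a full posterior predictive distribution; this mean-regression sketch only shows the prior-fitting loop, and every name in it is illustrative.

```python
import torch
import torch.nn as nn

class TinyPFN(nn.Module):
    """Tiny stand-in for a Prior-Data Fitted Network: reads a set of (x, y)
    context points plus a query x and predicts the query's y."""
    def __init__(self, d_hidden=64):
        super().__init__()
        self.embed = nn.Sequential(nn.Linear(2, d_hidden), nn.ReLU(),
                                   nn.Linear(d_hidden, d_hidden))
        self.head = nn.Sequential(nn.Linear(d_hidden + 1, d_hidden), nn.ReLU(),
                                  nn.Linear(d_hidden, 1))

    def forward(self, xc, yc, xq):
        # Permutation-invariant summary of the context set (deep-sets style).
        ctx = self.embed(torch.stack([xc, yc], dim=-1)).mean(dim=1)
        return self.head(torch.cat([ctx, xq], dim=-1)).squeeze(-1)

def sample_prior_task(batch, n_ctx):
    """Draw supervised tasks from a prior over noisy linear functions."""
    w, b = torch.randn(batch, 1), torch.randn(batch, 1)
    xc = torch.rand(batch, n_ctx) * 4 - 2
    yc = w * xc + b + 0.1 * torch.randn_like(xc)
    xq = torch.rand(batch, 1) * 4 - 2
    return xc, yc, xq, (w * xq + b).squeeze(-1)

torch.manual_seed(0)
model = TinyPFN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(2000):  # meta-train on fresh tasks drawn from the prior
    xc, yc, xq, yq = sample_prior_task(batch=64, n_ctx=10)
    loss = ((model(xc, yc, xq) - yq) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
print("final meta-training loss:", loss.item())
```

At test time the trained network does "inference" in a single forward pass: conditioning on a new context set is just feeding it in, with no per-task fitting.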
- Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least squares estimator is the linear minimum variance unbiased estimator (MVUE) in linear models.
In this paper, we take a first step towards extending this result to nonlinear settings via deep learning with bias constraints, yielding bias-constrained estimators (BCE).
A second motivation for BCE is in applications where multiple estimates of the same unknown are averaged for improved performance.
arXiv Detail & Related papers (2021-10-24T10:23:51Z)
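A hedged sketch of bias-constrained training: add a squared conditional-bias penalty, with the bias averaged within groups of samples that share the same unknown (different noise realisations). The grouping is the essential trick; the penalty weight and toy estimator below are illustrative, not the paper's.

```python
import torch

def bias_constrained_loss(pred, target, group, n_groups, lam=1.0):
    """MSE plus squared conditional bias, averaged within groups of samples
    that share the same underlying unknown. `lam` is a hypothetical weight."""
    mse = ((pred - target) ** 2).mean()
    err = pred - target
    group_sum = torch.zeros(n_groups).index_add(0, group, err)
    group_cnt = torch.bincount(group, minlength=n_groups).float()
    bias2 = ((group_sum / group_cnt) ** 2).mean()
    return mse + lam * bias2

# Toy usage: 4 unknowns, 8 noisy measurements each, a scalar-gain estimator.
torch.manual_seed(0)
theta = torch.randn(4)                       # the unknowns
group = torch.arange(4).repeat_interleave(8)
x = theta[group] + 0.3 * torch.randn(32)     # noisy measurements
w = torch.tensor(0.5, requires_grad=True)    # estimator: theta_hat = w * x
opt = torch.optim.Adam([w], lr=0.05)
for _ in range(300):
    opt.zero_grad()
    loss = bias_constrained_loss(w * x, theta[group], group, 4)
    loss.backward(); opt.step()
print("learned gain:", w.item())  # pushed towards the unbiased choice w = 1
```

Plain MSE would shrink the gain below 1; the bias penalty counteracts that, which is exactly what makes averaging several such estimates of the same unknown worthwhile.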
- The Bayesian Method of Tensor Networks [1.7894377200944511]
We study the Bayesian framework of tensor networks from two perspectives.
We study the Bayesian properties of the network by visualizing the parameters of the model and the decision boundaries on a two-dimensional synthetic dataset.
arXiv Detail & Related papers (2021-01-01T14:59:15Z)
- Understanding Variational Inference in Function-Space [20.940162027560408]
We highlight some advantages and limitations of employing the Kullback-Leibler divergence in this setting.
We propose (featurized) Bayesian linear regression as a benchmark for 'function-space' inference methods that directly measures approximation quality.
arXiv Detail & Related papers (2020-11-18T17:42:01Z)
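Why featurized Bayesian linear regression works as a benchmark: its posterior predictive is available in closed form, so the quality of an approximate inference method can be measured exactly. A small numpy sketch with an arbitrary random feature map:

```python
import numpy as np

def blr_posterior_predictive(Phi, y, phi_star, noise_var=0.1, prior_var=1.0):
    """Exact posterior predictive of featurised Bayesian linear regression,
    f(x) = w^T phi(x) with w ~ N(0, prior_var * I): closed-form mean and
    variance at a test feature vector phi_star."""
    A = Phi.T @ Phi / noise_var + np.eye(Phi.shape[1]) / prior_var
    cov = np.linalg.inv(A)
    mean_w = cov @ Phi.T @ y / noise_var
    mean = phi_star @ mean_w
    var = phi_star @ cov @ phi_star + noise_var
    return mean, var

# Toy benchmark run with random Fourier-style features (an arbitrary choice).
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=40)
y = np.sin(X) + 0.1 * rng.normal(size=40)
freqs = rng.normal(size=16)  # random frequencies for the feature map
phi = lambda x: np.concatenate([np.sin(np.outer(x, freqs)),
                                np.cos(np.outer(x, freqs))], axis=-1)
mean, var = blr_posterior_predictive(phi(X), y, phi(np.array([0.5]))[0])
print(f"predictive at x=0.5: {mean:.3f} +/- {np.sqrt(var):.3f}")
```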
- Improving predictions of Bayesian neural nets via local linearization [79.21517734364093]
We argue that the Gauss-Newton approximation should be understood as a local linearization of the underlying Bayesian neural network (BNN).
Because we use this linearized model for posterior inference, we should also predict using this modified model instead of the original one.
We refer to this modified predictive as "GLM predictive" and show that it effectively resolves common underfitting problems of the Laplace approximation.
arXiv Detail & Related papers (2020-08-19T12:35:55Z)
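A torch sketch of a GLM-style predictive for binary classification: linearise the network logit at the MAP weights, obtain a Gaussian over the logit, and squash it with the standard probit approximation. The diagonal posterior covariance is an assumed stand-in for a full Laplace covariance.

```python
import torch

def glm_predictive_binary(model, x, sigma_diag):
    """Linearised ('GLM-style') predictive for a binary classifier:
    linearise the logit at the MAP weights, so the logit is Gaussian with
    mean f(x) and variance g^T Sigma g, then apply the probit approximation
    E[sigmoid(z)] ~= sigmoid(m / sqrt(1 + pi/8 * v))."""
    logit = model(x).squeeze()
    grads = torch.autograd.grad(logit, list(model.parameters()))
    g = torch.cat([gr.reshape(-1) for gr in grads])
    v = (g * sigma_diag * g).sum()        # Var[logit] under the linear model
    kappa = 1.0 / torch.sqrt(1.0 + torch.pi / 8.0 * v)
    return torch.sigmoid(kappa * logit)   # moderated probability

torch.manual_seed(0)
model = torch.nn.Sequential(torch.nn.Linear(2, 16), torch.nn.Tanh(),
                            torch.nn.Linear(16, 1))
n_params = sum(p.numel() for p in model.parameters())
p = glm_predictive_binary(model, torch.randn(1, 2), 0.05 * torch.ones(n_params))
print("moderated probability:", p.item())  # pulled towards 0.5 vs. plain sigmoid
```

The moderation by kappa is what counteracts the overconfident (underfit) predictions that the vanilla Laplace predictive can produce.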
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.