Scalable Bayesian Learning with posteriors
- URL: http://arxiv.org/abs/2406.00104v1
- Date: Fri, 31 May 2024 18:00:12 GMT
- Title: Scalable Bayesian Learning with posteriors
- Authors: Samuel Duffield, Kaelan Donatella, Johnathan Chiu, Phoebe Klett, Daniel Simpson
- Abstract summary: We introduce posteriors, an easily extensible PyTorch library hosting general-purpose implementations of Bayesian learning methods.
We demonstrate and compare the utility of Bayesian approximations through experiments including an investigation into the cold posterior effect and applications with large language models.
- Score: 0.856335408411906
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Although theoretically compelling, Bayesian learning with modern machine learning models is computationally challenging since it requires approximating a high dimensional posterior distribution. In this work, we (i) introduce posteriors, an easily extensible PyTorch library hosting general-purpose implementations making Bayesian learning accessible and scalable to large data and parameter regimes; (ii) present a tempered framing of stochastic gradient Markov chain Monte Carlo, as implemented in posteriors, that transitions seamlessly into optimization and unveils a minor modification to deep ensembles to ensure they are asymptotically unbiased for the Bayesian posterior, and (iii) demonstrate and compare the utility of Bayesian approximations through experiments including an investigation into the cold posterior effect and applications with large language models.
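Point (ii) can be made concrete with a single tempered stochastic gradient Langevin dynamics (SGLD) update. The sketch below is a generic illustration of the technique, not the posteriors API; `sgld_step` and its arguments are hypothetical names. With temperature = 1 the update targets the Bayesian posterior, and with temperature = 0 the noise vanishes and the step reduces to plain gradient ascent, which is the seamless transition into optimization that the abstract describes.

```python
import torch

def sgld_step(params, log_posterior, lr=1e-2, temperature=1.0):
    """One tempered SGLD update:
    theta <- theta + lr * grad(log p) + sqrt(2 * lr * T) * N(0, I).

    temperature=1.0 targets the Bayesian posterior; temperature=0.0
    removes the noise and recovers plain gradient ascent (optimization).
    """
    grad, = torch.autograd.grad(log_posterior(params), params)
    with torch.no_grad():
        params += lr * grad
        params += (2.0 * lr * temperature) ** 0.5 * torch.randn_like(params)
    return params

# Toy usage: sample from a standard normal "posterior".
theta = torch.zeros(3, requires_grad=True)
for _ in range(1000):
    theta = sgld_step(theta, lambda p: -0.5 * (p ** 2).sum(), lr=1e-2)
```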
Related papers
- von Mises Quasi-Processes for Bayesian Circular Regression [57.88921637944379]
We explore a family of expressive and interpretable distributions over circle-valued random functions.
The resulting probability model has connections with continuous spin models in statistical physics.
For posterior inference, we introduce a new Stratonovich-like augmentation that lends itself to fast Markov Chain Monte Carlo sampling.
arXiv Detail & Related papers (2024-06-19T01:57:21Z)
- A variational neural Bayes framework for inference on intractable posterior distributions [1.0801976288811024]
Posterior distributions of model parameters are efficiently obtained by feeding observed data into a trained neural network.
We show theoretically that our posteriors converge to the true posteriors in Kullback-Leibler divergence.
arXiv Detail & Related papers (2024-04-16T20:40:15Z)
- Diffusion posterior sampling for simulation-based inference in tall data settings [53.17563688225137]
Simulation-based inference (SBI) is capable of approximating the posterior distribution that relates input parameters to a given observation.
In this work, we consider a tall data extension in which multiple observations are available to better infer the parameters of the model.
We compare our method to recently proposed competing approaches on various numerical experiments and demonstrate its superiority in terms of numerical stability and computational cost.
arXiv Detail & Related papers (2024-04-11T09:23:36Z)
- Towards Improved Variational Inference for Deep Bayesian Models [7.841254447222393]
In this thesis, we explore the use of variational inference (VI) as an approximation.
VI is unique in simultaneously approximating the posterior and providing a lower bound to the marginal likelihood (see the identity after this entry).
We propose a variational posterior that provides a unified view of inference in Bayesian neural networks and deep Gaussian processes.
arXiv Detail & Related papers (2024-01-23T00:40:20Z)
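As background for the lower-bound claim in the variational inference entry above: for any variational distribution q(θ), the log marginal likelihood decomposes as

```latex
\log p(y) = \underbrace{\mathbb{E}_{q(\theta)}\!\left[\log \frac{p(y,\theta)}{q(\theta)}\right]}_{\mathrm{ELBO}(q)} + \mathrm{KL}\big(q(\theta)\,\Vert\,p(\theta \mid y)\big) \;\geq\; \mathrm{ELBO}(q),
```

since the KL term is non-negative. Because the left-hand side does not depend on q, maximizing the ELBO simultaneously tightens the bound on the marginal likelihood and drives q toward the posterior, which is exactly the dual role the entry describes.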
- Machine Learning and the Future of Bayesian Computation [15.863162558281614]
We discuss the potential to improve posterior computation using ideas from machine learning.
Concrete future directions are explored in vignettes on normalizing flows, Bayesian coresets, distributed Bayesian inference, and variational inference.
arXiv Detail & Related papers (2023-04-21T21:03:01Z)
- Quasi Black-Box Variational Inference with Natural Gradients for Bayesian Learning [84.90242084523565]
We develop an optimization algorithm suitable for Bayesian learning in complex models.
Our approach relies on natural gradient updates within a general black-box framework for efficient training with limited model-specific derivations.
arXiv Detail & Related papers (2022-05-23T18:54:27Z)
- Non-Volatile Memory Accelerated Posterior Estimation [3.4256231429537936]
Current machine learning models use only a single learnable parameter combination when making predictions.
We show that, through the use of high-capacity persistent storage, models whose posterior distributions were previously too large to approximate become feasible.
arXiv Detail & Related papers (2022-02-21T20:25:57Z)
- What Are Bayesian Neural Network Posteriors Really Like? [63.950151520585024]
We show that Hamiltonian Monte Carlo can achieve significant performance gains over standard training and deep ensembles.
We also show that deep ensemble predictive distributions are similarly close to HMC as standard SGLD, and closer than standard variational inference.
arXiv Detail & Related papers (2021-04-29T15:38:46Z)
- Multilevel Gibbs Sampling for Bayesian Regression [6.2997667081978825]
The level hierarchy of data matrices is created by clustering the features and/or samples of the data.
Correlated samples are investigated for variance reduction, to improve the convergence of the Markov chain.
Speed-up is achieved on almost all of the tested datasets without significant loss in predictive performance.
arXiv Detail & Related papers (2020-09-25T11:18:17Z)
- Efficiently Sampling Functions from Gaussian Process Posteriors [76.94808614373609]
We propose an easy-to-use and general-purpose approach for fast posterior sampling.
We demonstrate how decoupled sample paths accurately represent Gaussian process posteriors at a fraction of the usual cost (see the sketch after this entry).
arXiv Detail & Related papers (2020-02-21T14:03:16Z)
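The decoupled sample paths in the entry above rest on pathwise conditioning (Matheron's rule): draw a joint prior sample, then add a data-driven correction. Below is a minimal sketch of the rule itself, assuming Gaussian observation noise; all function and variable names are illustrative, and it uses an exact joint prior draw rather than the paper's Fourier-feature prior, so it shows the identity but not the cost savings.

```python
import numpy as np

def matheron_posterior_sample(kernel, X, y, Xs, noise_var, rng):
    """One GP posterior sample at test inputs Xs via pathwise conditioning:
    f_post(Xs) = f_prior(Xs) + K(Xs,X)(K(X,X) + noise_var*I)^{-1}(y - f_prior(X) - eps).
    """
    n, m = len(X), len(Xs)
    Z = np.concatenate([X, Xs])
    K = kernel(Z, Z)
    # Joint prior draw over training and test inputs (jitter for stability).
    L = np.linalg.cholesky(K + 1e-6 * np.eye(n + m))
    f = L @ rng.standard_normal(n + m)
    f_train, f_test = f[:n], f[n:]
    eps = np.sqrt(noise_var) * rng.standard_normal(n)  # simulated observation noise
    # Data-driven correction (Matheron's rule).
    update = K[n:, :n] @ np.linalg.solve(K[:n, :n] + noise_var * np.eye(n), y - f_train - eps)
    return f_test + update

# Toy usage: squared-exponential kernel on 1-D inputs, lengthscale 0.2.
rbf = lambda A, B: np.exp(-0.5 * ((A[:, None] - B[None, :]) / 0.2) ** 2)
rng = np.random.default_rng(0)
X = np.linspace(0.0, 1.0, 5)
sample = matheron_posterior_sample(rbf, X, np.sin(3 * X), np.linspace(0.01, 0.99, 20), 0.1, rng)
```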
- Bayesian Deep Learning and a Probabilistic Perspective of Generalization [56.69671152009899]
We show that deep ensembles provide an effective mechanism for approximate Bayesian marginalization.
We also propose a related approach that further improves the predictive distribution by marginalizing within basins of attraction.
arXiv Detail & Related papers (2020-02-20T15:13:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.