Transformers Can Do Bayesian Inference
- URL: http://arxiv.org/abs/2112.10510v1
- Date: Mon, 20 Dec 2021 13:07:39 GMT
- Title: Transformers Can Do Bayesian Inference
- Authors: Samuel Müller, Noah Hollmann, Sebastian Pineda Arango, Josif Grabocka and Frank Hutter
- Abstract summary: We present Prior-Data Fitted Networks (PFNs).
PFNs leverage large-scale machine learning techniques to approximate a large set of posteriors.
We demonstrate that PFNs can near-perfectly mimic Gaussian processes.
- Score: 28.936428431504165
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Currently, it is hard to reap the benefits of deep learning for Bayesian
methods, which allow the explicit specification of prior knowledge and
accurately capture model uncertainty. We present Prior-Data Fitted Networks
(PFNs). PFNs leverage large-scale machine learning techniques to approximate a
large set of posteriors. The only requirement for PFNs to work is the ability
to sample from a prior distribution over supervised learning tasks (or
functions). Our method restates the objective of posterior approximation as a
supervised classification problem with a set-valued input: it repeatedly draws
a task (or function) from the prior, draws a set of data points and their
labels from it, masks one of the labels and learns to make probabilistic
predictions for it based on the set-valued input of the rest of the data
points. Presented with a set of samples from a new supervised learning task as
input, PFNs make probabilistic predictions for arbitrary other data points in a
single forward propagation, having learned to approximate Bayesian inference.
We demonstrate that PFNs can near-perfectly mimic Gaussian processes and also
enable efficient Bayesian inference for intractable problems, with over
200-fold speedups in multiple setups compared to current methods. We obtain
strong results in very diverse areas such as Gaussian process regression,
Bayesian neural networks, classification for small tabular data sets, and
few-shot image classification, demonstrating the generality of PFNs. Code and
trained PFNs are released at
https://github.com/automl/TransformersCanDoBayesianInference.
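The abstract's training recipe (draw a task from the prior, draw points, mask one label, classify it from the rest) is compact enough to sketch. Below is a minimal, hypothetical PyTorch rendition under a toy Bayesian linear-regression prior, with a bucketized classification head standing in for the paper's discretized predictive distribution; every module choice, name, and hyperparameter here is an illustrative assumption, not the released implementation.

```python
# Minimal PFN-style training sketch (illustrative assumptions throughout).
import torch
import torch.nn as nn

N_BUCKETS = 50             # discretize y into buckets -> classification target
SEQ_LEN, D_MODEL = 20, 64  # points drawn per sampled task; transformer width

def sample_task(n_points, noise=0.1):
    """Draw one supervised task from the prior: w ~ N(0, 1), y = w*x + eps."""
    w = torch.randn(1)
    x = torch.rand(n_points, 1) * 2 - 1
    y = w * x + noise * torch.randn(n_points, 1)
    return x, y

pair_enc = nn.Linear(2, D_MODEL)   # embeds observed (x, y) pairs
query_enc = nn.Linear(1, D_MODEL)  # embeds the x whose label is masked
transformer = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(D_MODEL, nhead=4, batch_first=True),
    num_layers=2)                  # no positional encoding: the input is a set
head = nn.Linear(D_MODEL, N_BUCKETS)

params = (list(pair_enc.parameters()) + list(query_enc.parameters())
          + list(transformer.parameters()) + list(head.parameters()))
opt = torch.optim.Adam(params, lr=1e-4)
edges = torch.linspace(-3.0, 3.0, N_BUCKETS + 1)  # bucket edges for y

for step in range(1000):
    x, y = sample_task(SEQ_LEN)
    # Mask the last label: the model sees (x_i, y_i) for the context points
    # and only x for the held-out point, then classifies which bucket y is in.
    # (A full PFN would also mask attention so context cannot attend to queries.)
    ctx = pair_enc(torch.cat([x[:-1], y[:-1]], dim=-1))         # (L-1, d)
    qry = query_enc(x[-1:])                                     # (1, d)
    h = transformer(torch.cat([ctx, qry], dim=0).unsqueeze(0))  # (1, L, d)
    logits = head(h[0, -1])                                     # (N_BUCKETS,)
    target = torch.bucketize(y[-1], edges).clamp(1, N_BUCKETS) - 1
    loss = nn.functional.cross_entropy(logits.unsqueeze(0), target)
    opt.zero_grad(); loss.backward(); opt.step()
```

At inference time the same forward pass is reused with no gradient updates: embed the observed (x, y) pairs of a new task as context, append the query x, and read the predictive distribution off the classification head in a single forward propagation.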
Related papers
- Flexible Heteroscedastic Count Regression with Deep Double Poisson Networks [4.58556584533865]
We train a neural network to output the parameters of a Double Poisson distribution.
We show DDPNs vastly outperform existing discrete models.
DDPNs can easily be applied to a variety of count regression datasets.
arXiv Detail & Related papers (2024-06-13T16:02:03Z)
- Probabilistic Contrastive Learning for Long-Tailed Visual Recognition [78.70453964041718]
Long-tailed distributions frequently emerge in real-world data, where a large number of minority categories each contain only a limited number of samples.
Recent investigations have revealed that supervised contrastive learning exhibits promising potential in alleviating the data imbalance.
We propose a novel probabilistic contrastive (ProCo) learning algorithm that estimates the data distribution of the samples from each class in the feature space.
arXiv Detail & Related papers (2024-03-11T13:44:49Z)
- Domain Adaptive Synapse Detection with Weak Point Annotations [63.97144211520869]
We present AdaSyn, a framework for domain adaptive synapse detection with weak point annotations.
In the WASPSYN challenge at ISBI 2023, our method ranked 1st place.
arXiv Detail & Related papers (2023-08-31T05:05:53Z)
- Statistical Foundations of Prior-Data Fitted Networks [0.7614628596146599]
Prior-data fitted networks (PFNs) were recently proposed as a new paradigm for machine learning.
This article establishes a theoretical foundation for PFNs and illuminates the statistical mechanisms governing their behavior.
arXiv Detail & Related papers (2023-05-18T16:34:21Z)
- Improved uncertainty quantification for neural networks with Bayesian last layer [0.0]
Uncertainty quantification is an important task in machine learning.
We present a reformulation of the log-marginal likelihood of a neural network with a Bayesian last layer (BLL), which allows for efficient training using backpropagation.
arXiv Detail & Related papers (2023-02-21T20:23:56Z)
- Generalized Differentiable RANSAC [95.95627475224231]
$\nabla$-RANSAC is a differentiable RANSAC that allows learning the entire randomized robust estimation pipeline.
$\nabla$-RANSAC is superior to the state-of-the-art in terms of accuracy while running at a similar speed to its less accurate alternatives.
arXiv Detail & Related papers (2022-12-26T15:13:13Z)
- An unfolding method based on conditional Invertible Neural Networks (cINN) using iterative training [0.0]
Generative networks like invertible neural networks (INNs) enable a probabilistic unfolding.
We introduce the iterative conditional INN (IcINN) for unfolding that adjusts for deviations between simulated training samples and data.
arXiv Detail & Related papers (2022-12-16T19:00:05Z)
- GFlowOut: Dropout with Generative Flow Networks [76.59535235717631]
Monte Carlo Dropout has been widely used as a relatively cheap way to perform approximate inference.
Recent works show that the dropout mask can be viewed as a latent variable, which can be inferred with variational inference.
GFlowOut leverages the recently proposed probabilistic framework of Generative Flow Networks (GFlowNets) to learn the posterior distribution over dropout masks.
arXiv Detail & Related papers (2022-10-24T03:00:01Z)
- Exploring the Uncertainty Properties of Neural Networks' Implicit Priors in the Infinite-Width Limit [47.324627920761685]
We use recent theoretical advances that characterize the function-space prior of an ensemble of infinitely-wide NNs as a Gaussian process.
This gives us a better understanding of the implicit prior NNs place on function space.
We also examine the calibration of previous approaches to classification with the NNGP.
arXiv Detail & Related papers (2020-10-14T18:41:54Z)
- Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning [61.32992639292889]
Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks.
We introduce a new scoring method that casts a plausibility ranking task in a full-text format.
We show that our method provides a much more stable training phase across random restarts.
arXiv Detail & Related papers (2020-04-29T10:54:40Z)
- Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.