Being a Bit Frequentist Improves Bayesian Neural Networks
- URL: http://arxiv.org/abs/2106.10065v1
- Date: Fri, 18 Jun 2021 11:22:42 GMT
- Title: Being a Bit Frequentist Improves Bayesian Neural Networks
- Authors: Agustinus Kristiadi and Matthias Hein and Philipp Hennig
- Abstract summary: We show that OOD-trained BNNs are competitive with, if not better than, recent frequentist baselines.
This work provides strong baselines for future work in both Bayesian and frequentist UQ.
- Score: 76.73339435080446
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite their compelling theoretical properties, Bayesian neural networks
(BNNs) tend to perform worse than frequentist methods in classification-based
uncertainty quantification (UQ) tasks such as out-of-distribution (OOD)
detection and dataset-shift robustness. In this work, based on empirical
findings in prior works, we hypothesize that this issue is due to the avoidance
of Bayesian methods in the so-called "OOD training" -- a family of techniques
for incorporating OOD data into the training process, which has since become an
integral part of state-of-the-art frequentist UQ methods. To validate this, we
treat OOD data as a first-class citizen in BNN training by exploring four
different ways of incorporating OOD data in Bayesian inference. We show in
extensive experiments that OOD-trained BNNs are competitive with, if not better
than, recent frequentist baselines. This work thus provides strong baselines for
future work in both Bayesian and frequentist UQ.
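The core recipe above, treating OOD data as a first-class citizen during training, can be illustrated with a minimal outlier-exposure-style sketch: the usual in-distribution loss is augmented with a term that pushes the predictive distribution toward uniform on OOD inputs. The model, data batches, and `ood_weight` below are illustrative assumptions; the paper itself explores four Bayesian ways of folding OOD data into inference rather than this exact MAP-style loss.

```python
# Sketch: outlier-exposure-style OOD training for a classifier whose
# posterior is later approximated (e.g., with a Laplace approximation).
# Assumptions (not from the paper): `model`, the batches, and `ood_weight`
# are placeholders for illustration.
import torch
import torch.nn.functional as F

def ood_training_step(model, x_in, y_in, x_ood, ood_weight=0.5):
    """One step of MAP training with an OOD regularizer."""
    logits_in = model(x_in)
    nll = F.cross_entropy(logits_in, y_in)      # in-distribution fit

    # Encourage a maximally uncertain (uniform) prediction on OOD inputs:
    # cross-entropy against the uniform distribution over classes.
    log_probs_ood = F.log_softmax(model(x_ood), dim=-1)
    uniform_ce = -log_probs_ood.mean()

    return nll + ood_weight * uniform_ce
```

A Bayesian treatment would then fit an approximate posterior around (or jointly with) these OOD-regularized weights, which is where the four variants studied in the paper differ.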
Related papers
- Gradient-Regularized Out-of-Distribution Detection [28.542499196417214]
One of the challenges for neural networks in real-life applications is the overconfident errors these models make when the data is not from the original training distribution.
We propose the idea of leveraging the information embedded in the gradient of the loss function during training to enable the network to learn a desired OOD score for each sample.
We also develop a novel energy-based sampling method to allow the network to be exposed to more informative OOD samples during the training phase.
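For reference, energy-based sampling schemes like the one mentioned above typically build on the standard energy score for OOD detection (Liu et al., 2020). Below is a minimal sketch; the threshold is an illustrative assumption, not a value from the paper.

```python
# Sketch of the standard energy-based OOD score: higher energy means
# the input looks more out-of-distribution.
import torch

def energy_score(logits: torch.Tensor) -> torch.Tensor:
    return -torch.logsumexp(logits, dim=-1)

# Usage: flag inputs whose energy exceeds a validation-tuned threshold.
logits = torch.randn(4, 10)             # placeholder logits for 4 inputs
is_ood = energy_score(logits) > 0.0     # 0.0 is a hypothetical threshold
```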
arXiv Detail & Related papers (2024-04-18T17:50:23Z)
- Preventing Arbitrarily High Confidence on Far-Away Data in Point-Estimated Discriminative Neural Networks [28.97655735976179]
ReLU networks have been shown to almost always yield high confidence predictions when the test data are far away from the training set.
We overcome this problem by adding a term to the output of the neural network that corresponds to the logit of an extra class.
This technique provably prevents arbitrarily high confidence on far-away test data while maintaining a simple discriminative point-estimate training.
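A hedged sketch of the extra-class idea described above: wrap a K-class classifier and append a (K+1)-th logit whose value grows as the network's output drifts away from a learned center, so far-away inputs cannot keep arbitrarily high confidence in any original class. The quadratic form and learned parameters are illustrative assumptions, not necessarily the paper's exact construction.

```python
# Sketch: append an extra "reject"/OOD logit to a discriminative classifier.
import torch
import torch.nn as nn

class ExtraClassWrapper(nn.Module):
    """Wraps a K-class classifier and appends a (K+1)-th OOD logit."""
    def __init__(self, backbone: nn.Module, num_classes: int):
        super().__init__()
        self.backbone = backbone
        self.center = nn.Parameter(torch.zeros(num_classes))  # hypothetical
        self.scale = nn.Parameter(torch.ones(()))              # hypothetical

    def forward(self, x):
        logits = self.backbone(x)                               # (B, K)
        # Extra logit grows as the logits drift from a learned center,
        # capping the softmax confidence of the K original classes.
        extra = self.scale * ((logits - self.center) ** 2).sum(-1, keepdim=True)
        return torch.cat([logits, extra], dim=-1)               # (B, K + 1)
```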
arXiv Detail & Related papers (2023-11-07T03:19:16Z)
- Can Pre-trained Networks Detect Familiar Out-of-Distribution Data? [37.36999826208225]
We study the effect of PT-OOD on the OOD detection performance of pre-trained networks.
We find that the low linear separability of PT-OOD in the feature space heavily degrades the PT-OOD detection performance.
We propose a solution tailored to large-scale pre-trained models: leveraging their powerful instance-by-instance discriminative representations.
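One common way to exploit instance-wise discriminative features of a frozen pre-trained model is a non-parametric nearest-neighbor detector in feature space, sketched below; the choice of cosine distance and of k is an assumption for illustration, not necessarily the paper's exact method.

```python
# Sketch of an instance-wise OOD detector on frozen pre-trained features:
# score a test input by its distance to the k-th nearest in-distribution
# feature (higher score means more OOD).
import numpy as np

def knn_ood_score(train_feats, test_feats, k=10):
    tr = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    te = test_feats / np.linalg.norm(test_feats, axis=1, keepdims=True)
    d = 1.0 - te @ tr.T                  # cosine distances to all train feats
    d_sorted = np.sort(d, axis=1)
    return d_sorted[:, k - 1]            # distance to the k-th neighbor
```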
arXiv Detail & Related papers (2023-10-02T02:01:00Z)
- Pseudo-OOD training for robust language models [78.15712542481859]
OOD detection is a key component of a reliable machine-learning model for any industry-scale application.
We propose POORE (POsthoc pseudo-Ood REgularization), which generates pseudo-OOD samples using in-distribution (IND) data.
We extensively evaluate our framework on three real-world dialogue systems, achieving new state-of-the-art in OOD detection.
arXiv Detail & Related papers (2022-10-17T14:32:02Z)
- Out-of-Distribution Detection with Hilbert-Schmidt Independence Optimization [114.43504951058796]
Outlier detection tasks play a critical role in AI safety.
Deep neural network classifiers tend to incorrectly classify out-of-distribution (OOD) inputs into in-distribution classes with high confidence.
We propose an alternative probabilistic paradigm that is both practically useful and theoretically viable for OOD detection tasks.
arXiv Detail & Related papers (2022-09-26T15:59:55Z)
- On the Practicality of Deterministic Epistemic Uncertainty [106.06571981780591]
Deterministic uncertainty methods (DUMs) achieve strong performance in detecting out-of-distribution data.
It remains unclear whether DUMs are well calibrated and can seamlessly scale to real-world applications.
arXiv Detail & Related papers (2021-07-01T17:59:07Z)
- Provably Robust Detection of Out-of-distribution Data (almost) for free [124.14121487542613]
Deep neural networks are known to produce highly overconfident predictions on out-of-distribution (OOD) data.
In this paper, we propose a novel method that combines, from first principles, a certifiable OOD detector with a standard classifier into an OOD-aware classifier.
In this way we get the best of both worlds: certifiably adversarially robust OOD detection, even for OOD samples close to the in-distribution, without loss in prediction accuracy, and close to state-of-the-art OOD detection performance on non-manipulated OOD data.
arXiv Detail & Related papers (2021-06-08T11:40:49Z)
- Statistical Testing for Efficient Out of Distribution Detection in Deep Neural Networks [26.0303701309125]
This paper frames the Out Of Distribution (OOD) detection problem in Deep Neural Networks as a statistical hypothesis testing problem.
We build on this framework to suggest a novel OOD procedure based on low-order statistics.
Our method achieves results comparable to or better than the state of the art on well-accepted OOD benchmarks without retraining the network parameters.
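As a generic illustration of the hypothesis-testing framing, the sketch below turns a scalar activation statistic into an empirical p-value test computed without any retraining; the particular statistic and two-sided test are assumptions for illustration, not the exact low-order statistics proposed in the paper.

```python
# Sketch: OOD detection as a statistical test on a low-order statistic.
import numpy as np

def fit_null(in_dist_stats):
    """Store the sorted in-distribution (null) sample of the statistic."""
    return np.sort(in_dist_stats)

def p_value(null_sorted, test_stat):
    """Two-sided empirical p-value of the statistic under the null."""
    n = len(null_sorted)
    rank = np.searchsorted(null_sorted, test_stat) / n
    return 2.0 * min(rank, 1.0 - rank)

# Usage: flag inputs whose p-value falls below a chosen significance level.
null = fit_null(np.random.randn(1000))      # placeholder ID statistics
print(p_value(null, test_stat=2.5) < 0.05)  # True -> reject, treat as OOD
```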
arXiv Detail & Related papers (2021-02-25T16:14:47Z)
- Probing Predictions on OOD Images via Nearest Categories [97.055916832257]
We study out-of-distribution (OOD) prediction behavior of neural networks when they classify images from unseen classes or corrupted images.
We introduce a new measure, nearest category generalization (NCG), where we compute the fraction of OOD inputs that are classified with the same label as their nearest neighbor in the training set.
We find that robustly trained networks have consistently higher NCG accuracy than naturally trained ones, even when the OOD data is much farther away than the robustness radius.
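The NCG measure itself is straightforward to compute; below is a minimal sketch, with distances taken in raw input space as an illustrative choice (a feature space could be used instead).

```python
# Sketch of nearest category generalization (NCG): the fraction of OOD
# inputs whose predicted label matches the label of their nearest
# training example.
import numpy as np

def ncg_accuracy(x_train, y_train, x_ood, ood_preds):
    """x_*: (N, D) arrays; y_train, ood_preds: integer label arrays."""
    # Pairwise squared Euclidean distances between OOD inputs and train set.
    d2 = ((x_ood[:, None, :] - x_train[None, :, :]) ** 2).sum(-1)
    nearest = d2.argmin(axis=1)          # index of nearest training point
    return float((ood_preds == y_train[nearest]).mean())
```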
arXiv Detail & Related papers (2020-11-17T07:42:27Z)