Training independent subnetworks for robust prediction
- URL: http://arxiv.org/abs/2010.06610v2
- Date: Wed, 4 Aug 2021 23:30:03 GMT
- Title: Training independent subnetworks for robust prediction
- Authors: Marton Havasi, Rodolphe Jenatton, Stanislav Fort, Jeremiah Zhe Liu,
Jasper Snoek, Balaji Lakshminarayanan, Andrew M. Dai, Dustin Tran
- Abstract summary: We show that the benefits of using multiple predictions can be achieved `for free' under a single model's forward pass.
We observe a significant improvement in negative log-likelihood, accuracy, and calibration error on CIFAR10, CIFAR100, ImageNet, and their out-of-distribution variants.
- Score: 47.81111607870936
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent approaches to efficiently ensemble neural networks have shown that
strong robustness and uncertainty performance can be achieved with a negligible
gain in parameters over the original network. However, these methods still
require multiple forward passes for prediction, leading to a significant
computational cost. In this work, we show a surprising result: the benefits of
using multiple predictions can be achieved `for free' under a single model's
forward pass. In particular, we show that, using a multi-input multi-output
(MIMO) configuration, one can utilize a single model's capacity to train
multiple subnetworks that independently learn the task at hand. By ensembling
the predictions made by the subnetworks, we improve model robustness without
increasing compute. We observe a significant improvement in negative
log-likelihood, accuracy, and calibration error on CIFAR10, CIFAR100, ImageNet,
and their out-of-distribution variants compared to previous methods.
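As a concrete illustration of the MIMO configuration, below is a minimal, hypothetical PyTorch sketch (the backbone, shapes, and the `MIMOClassifier` name are assumptions, not the authors' code): the M inputs are stacked along the channel axis, each output head is trained against an independently shuffled example, and at test time the same image is repeated M times so one forward pass yields an M-member ensemble.

```python
import torch
import torch.nn as nn

class MIMOClassifier(nn.Module):
    """Minimal MIMO sketch: M subnetworks share one backbone's capacity.

    Hypothetical illustration of the multi-input multi-output idea;
    not the authors' reference implementation.
    """

    def __init__(self, num_classes: int, num_members: int = 3):
        super().__init__()
        self.num_members = num_members
        # The M inputs are concatenated along the channel axis.
        self.backbone = nn.Sequential(
            nn.Conv2d(3 * num_members, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # One output head per ensemble member.
        self.head = nn.Linear(64, num_classes * num_members)

    def forward(self, x):  # x: (batch, M, 3, H, W)
        b, m, c, h, w = x.shape
        logits = self.head(self.backbone(x.reshape(b, m * c, h, w)))
        return logits.reshape(b, m, -1)  # (batch, M, num_classes)

model = MIMOClassifier(num_classes=10, num_members=3)
# Training: each member sees an independently shuffled example.
images = torch.randn(8, 3, 32, 32)
labels = torch.randint(0, 10, (8,))
perms = [torch.randperm(8) for _ in range(3)]
x = torch.stack([images[p] for p in perms], dim=1)
y = torch.stack([labels[p] for p in perms], dim=1)   # (8, 3)
logits = model(x)                                    # (8, 3, 10)
loss = nn.functional.cross_entropy(logits.reshape(-1, 10), y.reshape(-1))
# Test: repeat the same input M times and average member predictions.
x_test = images.unsqueeze(1).repeat(1, 3, 1, 1, 1)
probs = model(x_test).softmax(-1).mean(dim=1)        # one forward pass
```

Because the subnetworks are trained on independent examples, their errors decorrelate, which is what makes averaging their test-time predictions behave like an ensemble at no extra compute.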
Related papers
- Faster Language Models with Better Multi-Token Prediction Using Tensor Decomposition [5.575078692353885]
We propose a new model for multi-token prediction in transformers, aiming to enhance sampling efficiency without compromising accuracy.
By generalizing it to a rank-$r$ canonical probability decomposition, we develop an improved model that predicts multiple tokens simultaneously (a toy sketch of the rank-$r$ idea follows below).
arXiv Detail & Related papers (2024-10-23T11:06:36Z)
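To make the rank-$r$ decomposition concrete, here is a toy NumPy sketch (the vocabulary size, rank, and all distributions are invented for illustration): the joint distribution over two future tokens is written as a mixture of r rank-1 products, so both tokens can be sampled from a single set of head outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
V, r = 50, 4  # vocabulary size, decomposition rank (assumed values)

# Hypothetical head outputs for one context: mixture weights plus
# r per-position categorical distributions for the two next tokens.
w = rng.dirichlet(np.ones(r))              # (r,)   mixture weights
p1 = rng.dirichlet(np.ones(V), size=r)     # (r, V) distributions, token 1
p2 = rng.dirichlet(np.ones(V), size=r)     # (r, V) distributions, token 2

# Rank-r canonical decomposition of the joint over (token1, token2):
# P[a, b] = sum_k w[k] * p1[k, a] * p2[k, b]
joint = np.einsum('k,ka,kb->ab', w, p1, p2)
assert np.isclose(joint.sum(), 1.0)        # still a valid distribution

# Sampling both tokens at once: pick a mixture component, then draw
# each token independently within that component.
k = rng.choice(r, p=w)
t1 = rng.choice(V, p=p1[k])
t2 = rng.choice(V, p=p2[k])
```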
- Structured Radial Basis Function Network: Modelling Diversity for Multiple Hypotheses Prediction [51.82628081279621]
Multi-modal regression is important when forecasting nonstationary processes or when the target follows a complex mixture of distributions.
A Structured Radial Basis Function Network is presented as an ensemble of multiple hypotheses predictors for regression problems.
It is proved that this structured model can efficiently interpolate the underlying tessellation and approximate the multiple-hypotheses target distribution.
arXiv Detail & Related papers (2023-09-02T01:27:53Z)
- Probabilistic MIMO U-Net: Efficient and Accurate Uncertainty Estimation
for Pixel-wise Regression [1.4528189330418977]
Uncertainty estimation in machine learning is paramount for enhancing the reliability and interpretability of predictive models.
We present an adaptation of the Multiple-Input Multiple-Output (MIMO) framework for pixel-wise regression tasks.
arXiv Detail & Related papers (2023-08-14T22:08:28Z)
- Censored Quantile Regression Neural Networks [24.118509578363593]
This paper considers quantile regression on censored data using neural networks (NNs).
We show how an algorithm popular in linear models can be applied to NNs.
Our major contribution is a novel algorithm that simultaneously optimises a grid of quantiles output by a single NN (a sketch of the grid-of-quantiles loss follows below).
arXiv Detail & Related papers (2022-05-26T17:10:28Z)
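As a sketch of optimising a grid of quantiles with a single NN, here is a minimal, hypothetical PyTorch example using the standard pinball (quantile) loss; the architecture and the quantile grid are assumptions, and the paper's censoring mechanism is omitted for brevity.

```python
import torch
import torch.nn as nn

taus = torch.tensor([0.1, 0.25, 0.5, 0.75, 0.9])  # assumed quantile grid

# One network outputs the whole grid of quantiles at once.
net = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, len(taus)))

def pinball_loss(pred, y, taus):
    """Average pinball loss over the quantile grid.

    pred: (batch, Q) predicted quantiles, y: (batch, 1) targets.
    """
    err = y - pred                      # broadcasts to (batch, Q)
    return torch.maximum(taus * err, (taus - 1) * err).mean()

x = torch.randn(256, 1)
y = 2 * x + 0.5 * torch.randn(256, 1)
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    loss = pinball_loss(net(x), y, taus)
    loss.backward()
    opt.step()
# net(x_new) now returns the 10th/25th/50th/75th/90th percentile estimates.
```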
- NUQ: Nonparametric Uncertainty Quantification for Deterministic Neural Networks [151.03112356092575]
We show a principled way to measure the uncertainty of a classifier's predictions based on the Nadaraya-Watson nonparametric estimate of the conditional label distribution (sketched below).
We demonstrate the strong performance of the method in uncertainty estimation tasks on a variety of real-world image datasets.
arXiv Detail & Related papers (2022-02-07T12:30:45Z)
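For intuition, a small, hypothetical NumPy sketch of a Nadaraya-Watson estimate of the conditional label distribution (the kernel choice, bandwidth, and data are assumptions): class probabilities at a query point are kernel-weighted label frequencies over the training set, and a flat estimate signals high uncertainty.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                  # training features
y = (X[:, 0] + X[:, 1] > 0).astype(int)       # training labels (2 classes)

def nw_label_distribution(x, X, y, num_classes=2, bandwidth=0.5):
    """Nadaraya-Watson estimate of p(y = c | x) with an RBF kernel."""
    sq_dists = ((X - x) ** 2).sum(axis=1)
    k = np.exp(-sq_dists / (2 * bandwidth ** 2))   # kernel weights
    probs = np.array([k[y == c].sum() for c in range(num_classes)])
    return probs / probs.sum()

p = nw_label_distribution(np.array([0.1, -0.2]), X, y)
uncertainty = 1.0 - p.max()   # one simple uncertainty score
```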
- End-to-End Weak Supervision [15.125993628007972]
We propose an end-to-end approach for directly learning the downstream model.
We show improved end-model performance over prior work on downstream test sets.
arXiv Detail & Related papers (2021-07-05T19:10:11Z)
- Ex uno plures: Splitting One Model into an Ensemble of Subnetworks [18.814965334083425]
We propose a strategy to compute an ensemble of subnetworks, each corresponding to a non-overlapping dropout mask computed via a pruning strategy and trained independently (a toy sketch of non-overlapping masks follows below).
We show that the proposed subnetwork ensembling method can perform as well as standard deep ensembles in both accuracy and uncertainty estimates.
We experimentally demonstrate that subnetwork ensembling also consistently outperforms recently proposed approaches that efficiently ensemble neural networks.
arXiv Detail & Related papers (2021-06-09T01:49:49Z)
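A toy, hypothetical sketch of the non-overlapping-mask idea (the random partitioning here stands in for the paper's pruning-based masks): every weight of a shared layer is assigned to exactly one of K binary masks, so the K subnetworks share no parameters and their outputs can be averaged as an ensemble.

```python
import torch

K = 3                                  # number of subnetworks (assumed)
weight = torch.randn(64, 128)          # one shared weight matrix

# Assign every weight entry to exactly one subnetwork at random,
# giving K binary masks that are non-overlapping by construction.
owner = torch.randint(0, K, weight.shape)
masks = [(owner == k).float() for k in range(K)]

assert torch.all(sum(masks) == 1.0)    # the masks partition the weights

def subnetwork_forward(x, weight, mask):
    """Forward pass through one masked subnetwork."""
    return x @ (weight * mask).T

x = torch.randn(8, 128)
# Ensemble prediction: average the K subnetwork outputs.
outputs = torch.stack([subnetwork_forward(x, weight, m) for m in masks])
ensemble = outputs.mean(dim=0)
```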
- Sampling-free Variational Inference for Neural Networks with Multiplicative Activation Noise [51.080620762639434]
We propose a more efficient parameterization of the posterior approximation for sampling-free variational inference.
Our approach yields competitive results for standard regression problems and scales well to large-scale image classification tasks.
arXiv Detail & Related papers (2021-03-15T16:16:18Z)
- Ambiguity in Sequential Data: Predicting Uncertain Futures with
Recurrent Models [110.82452096672182]
We propose an extension of the Multiple Hypothesis Prediction (MHP) model to handle ambiguous predictions with sequential data (the base MHP objective is sketched below).
We also introduce a novel metric for ambiguous problems, which is better suited to account for uncertainties.
arXiv Detail & Related papers (2020-03-10T09:15:42Z)
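For background, a minimal, hypothetical PyTorch sketch of the base Multiple Hypothesis Prediction objective that this entry extends (the network, data, and relaxation constant are assumptions): the model emits M hypotheses and a relaxed winner-takes-all loss concentrates the gradient on the hypothesis closest to the target, letting heads specialize on different modes.

```python
import torch
import torch.nn as nn

M = 4                                   # number of hypotheses (assumed)
net = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, M))

def relaxed_wta_loss(preds, y, eps=0.05):
    """Relaxed winner-takes-all: the best hypothesis gets weight 1 - eps,
    the rest share eps, so every head keeps a small gradient signal.

    preds: (batch, M) hypotheses, y: (batch, 1) scalar targets.
    """
    errs = (preds - y) ** 2                          # (batch, M)
    weights = torch.full_like(errs, eps / (M - 1))
    best = errs.argmin(dim=1, keepdim=True)
    weights.scatter_(1, best, 1.0 - eps)
    return (weights * errs).sum(dim=1).mean()

# Ambiguous data: each input maps to one of two plausible futures.
x = torch.randn(256, 2)
signs = torch.randint(0, 2, (256, 1)) * 2.0 - 1.0
y = x[:, :1] * signs
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    loss = relaxed_wta_loss(net(x), y)
    loss.backward()
    opt.step()
```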
- Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence of each query sample to assign optimal weights to unlabeled queries (a toy confidence-weighted update is sketched below).
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
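A toy, hypothetical sketch of a confidence-weighted transductive prototype update in the spirit of the entry above (the embedding dimensions, distance-softmax confidence, and update rule are all assumptions): query embeddings are folded into each class prototype in proportion to a confidence weight.

```python
import torch

torch.manual_seed(0)
C, S, Q, D = 5, 5, 15, 64   # classes, shots, queries/class, embed dim (assumed)
support = torch.randn(C, S, D)
queries = torch.randn(C * Q, D)

# Initial prototypes: mean of each class's support embeddings.
protos = support.mean(dim=1)                         # (C, D)

# Confidence of each query for each class: softmax over negative
# squared distances to the prototypes.
dists = torch.cdist(queries, protos) ** 2            # (C*Q, C)
conf = torch.softmax(-dists, dim=1)                  # (C*Q, C)

# Transductive update: fold queries into the prototypes, weighted by
# confidence (support points carry weight 1).
num = support.sum(dim=1) + conf.T @ queries          # (C, D)
den = S + conf.sum(dim=0).unsqueeze(1)               # (C, 1)
protos_updated = num / den
```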
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.