Masksembles for Uncertainty Estimation
- URL: http://arxiv.org/abs/2012.08334v1
- Date: Tue, 15 Dec 2020 14:39:57 GMT
- Title: Masksembles for Uncertainty Estimation
- Authors: Nikita Durasov, Timur Bagautdinov, Pierre Baque, Pascal Fua
- Abstract summary: Deep neural networks have amply demonstrated their prowess but estimating the reliability of their predictions remains challenging.
Deep Ensembles are widely considered as being one of the best methods for generating uncertainty estimates but are very expensive to train and evaluate.
MC-Dropout is another popular alternative, which is less expensive, but also less reliable.
- Score: 60.400102501013784
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep neural networks have amply demonstrated their prowess but estimating the
reliability of their predictions remains challenging. Deep Ensembles are widely
considered as being one of the best methods for generating uncertainty
estimates but are very expensive to train and evaluate. MC-Dropout is another
popular alternative, which is less expensive, but also less reliable. Our
central intuition is that there is a continuous spectrum of ensemble-like
models of which MC-Dropout and Deep Ensembles are extreme examples. The first
uses an effectively infinite number of highly correlated models while the
second relies on a finite number of independent models.
To combine the benefits of both, we introduce Masksembles. Instead of
randomly dropping parts of the network as in MC-Dropout, Masksembles relies on
a fixed number of binary masks, which are parameterized in a way that allows
one to change the correlations between the individual models. Namely, by
controlling the overlap between the masks and their density, one can choose the
optimal configuration for the task at hand. This leads to a simple and
easy-to-implement method whose performance is on par with Deep Ensembles at a
fraction of the cost. We experimentally validate Masksembles on two widely used
datasets, CIFAR10 and ImageNet.
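To make the layer described in the abstract concrete, below is a minimal sketch of a Masksembles-style module in PyTorch. This is not the authors' reference implementation: the mask generator (random channel subsets whose size is set by a `density` parameter) is a simplified stand-in for the paper's structured construction that explicitly controls mask overlap, and the names `make_masks` and `Masksembles1D` are illustrative only.

```python
# Minimal sketch of the Masksembles idea; assumes PyTorch and a per-member,
# channel-wise binary mask. The mask generator here is a simplification: the
# paper controls mask overlap and density explicitly, whereas this sketch just
# draws a random channel subset per member once and keeps it fixed.
import torch
import torch.nn as nn


def make_masks(num_masks: int, num_channels: int, density: float) -> torch.Tensor:
    """Generate `num_masks` fixed binary masks over `num_channels` channels.

    Each mask keeps roughly `density * num_channels` channels; denser masks
    overlap more (closer to MC-Dropout behavior), sparser masks overlap less
    (closer to independent ensemble members).
    """
    keep = max(1, int(density * num_channels))
    masks = torch.zeros(num_masks, num_channels)
    for i in range(num_masks):
        idx = torch.randperm(num_channels)[:keep]  # drawn once, reused forever
        masks[i, idx] = 1.0
    return masks


class Masksembles1D(nn.Module):
    """Routes each chunk of the batch through one of the fixed binary masks."""

    def __init__(self, num_channels: int, num_masks: int = 4, density: float = 0.5):
        super().__init__()
        self.num_masks = num_masks
        self.register_buffer("masks", make_masks(num_masks, num_channels, density))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Expects the batch size to be divisible by `num_masks`: the i-th chunk
        # of the batch is multiplied by the i-th fixed mask.
        chunks = x.chunk(self.num_masks, dim=0)
        out = [c * m for c, m in zip(chunks, self.masks)]
        return torch.cat(out, dim=0)


# Usage sketch: replicate one input across the members, run a single forward
# pass, and read the uncertainty off the spread of the member predictions.
if __name__ == "__main__":
    num_masks = 4
    net = nn.Sequential(
        nn.Linear(16, 64), Masksembles1D(64, num_masks), nn.ReLU(), nn.Linear(64, 10)
    )
    x = torch.randn(1, 16).repeat(num_masks, 1)           # one input, all members
    probs = net(x).softmax(dim=-1)                         # shape: (num_masks, 10)
    mean_pred, uncertainty = probs.mean(0), probs.var(0)   # ensemble mean and spread
```

In this sketch the mean of the member predictions serves as the final prediction and their variance as the uncertainty estimate, mirroring how ensemble-style methods are typically evaluated; the `density` knob is what moves the layer along the spectrum between MC-Dropout-like and Deep-Ensemble-like behavior.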
Related papers
- Pluralistic Salient Object Detection [108.74650817891984]
We introduce pluralistic salient object detection (PSOD), a novel task aimed at generating multiple plausible salient segmentation results for a given input image.
We present two new SOD datasets "DUTS-MM" and "DUS-MQ", along with newly designed evaluation metrics.
arXiv Detail & Related papers (2024-09-04T01:38:37Z) - Breaking through Deterministic Barriers: Randomized Pruning Mask Generation and Selection [29.375460634415806]
We train a large model and then remove its redundant neurons or weights by pruning.
This approach achieves state-of-the-art performance across eight datasets from GLUE.
arXiv Detail & Related papers (2023-10-19T22:32:51Z) - Faithfulness Measurable Masked Language Models [35.40666730867487]
A common approach to explaining NLP models is to use importance measures that express which tokens are important for a prediction.
One such measure is based on the premise that if tokens are truly important, then masking them should result in worse model performance.
This work proposes an inherently faithfulness measurable model that addresses these challenges.
arXiv Detail & Related papers (2023-10-11T19:00:40Z) - Effective Neural Network $L_0$ Regularization With BinMask [15.639601066641099]
We show that a straightforward formulation, BinMask, is an effective $L_0$ regularizer.
We evaluate BinMask on three tasks: feature selection, network sparsification, and model regularization.
arXiv Detail & Related papers (2023-04-21T20:08:57Z) - GFlowOut: Dropout with Generative Flow Networks [76.59535235717631]
Monte Carlo Dropout has been widely used as a relatively cheap way to perform approximate inference.
Recent works show that the dropout mask can be viewed as a latent variable, which can be inferred with variational inference.
GFlowOut leverages the recently proposed probabilistic framework of Generative Flow Networks (GFlowNets) to learn the posterior distribution over dropout masks.
arXiv Detail & Related papers (2022-10-24T03:00:01Z) - Parameter-Efficient Masking Networks [61.43995077575439]
Advanced network designs often contain a large number of repetitive structures (e.g., Transformer blocks).
In this study, we are the first to investigate the representative potential of fixed random weights with limited unique values by learning masks.
It leads to a new paradigm for model compression to diminish the model size.
arXiv Detail & Related papers (2022-10-13T03:39:03Z) - Sparse MoEs meet Efficient Ensembles [49.313497379189315]
We study the interplay of two popular classes of such models: ensembles of neural networks and sparse mixtures of experts (sparse MoEs).
We present Efficient Ensemble of Experts (E$3$), a scalable and simple ensemble of sparse MoEs that takes the best of both classes of models, while using up to 45% fewer FLOPs than a deep ensemble.
arXiv Detail & Related papers (2021-10-07T11:58:35Z) - Improving Self-supervised Pre-training via a Fully-Explored Masked Language Model [57.77981008219654]
Masked Language Model (MLM) framework has been widely adopted for self-supervised language pre-training.
We propose a fully-explored masking strategy, where a text sequence is divided into a certain number of non-overlapping segments.
arXiv Detail & Related papers (2020-10-12T21:28:14Z) - Why have a Unified Predictive Uncertainty? Disentangling it using Deep Split Ensembles [39.29536042476913]
Understanding and quantifying uncertainty in black box Neural Networks (NNs) is critical when deployed in real-world settings such as healthcare.
We propose a conceptually simple non-Bayesian approach, deep split ensemble, to disentangle the predictive uncertainties.
arXiv Detail & Related papers (2020-09-25T19:15:26Z)