Layer Ensembles
- URL: http://arxiv.org/abs/2210.04882v3
- Date: Fri, 7 Jul 2023 09:46:39 GMT
- Title: Layer Ensembles
- Authors: Illia Oleksiienko and Alexandros Iosifidis
- Abstract summary: We introduce a method for uncertainty estimation that considers a set of independent categorical distributions for each layer of the network.
We show that the method can be further improved by ranking samples, resulting in models that require less memory and time to run.
- Score: 95.42181254494287
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep Ensembles, a type of Bayesian Neural Network, can be used to
estimate uncertainty on a prediction by collecting votes from multiple neural
networks and computing the disagreement among those predictions. In this
paper, we introduce a method for uncertainty estimation
that considers a set of independent categorical distributions for each layer of
the network, giving many more possible samples with overlapped layers than in
the regular Deep Ensembles. We further introduce an optimized inference
procedure that reuses common layer outputs, achieving up to a 19x speed-up and
reducing memory usage quadratically. We also show that the method can be
further improved by ranking samples, resulting in models that require less
memory and time to run while achieving higher uncertainty quality than Deep
Ensembles.
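A minimal sketch of the core idea described in the abstract, assuming a toy stack of fully connected layers with K candidate members per layer (layer count, shapes, and the caching scheme are illustrative assumptions, not the paper's implementation; the sample-ranking step is not reproduced): each layer carries an independent categorical choice over K members, a network sample is one choice per layer, and samples that share a prefix of choices reuse the cached intermediate outputs.

```python
# Sketch of Layer Ensembles: K independent candidates per layer, a network
# sample = one categorical draw per layer, and samples that share a prefix
# of choices reuse cached intermediate outputs.
import numpy as np

rng = np.random.default_rng(0)
L, K = 3, 4                      # layers, ensemble members per layer (toy sizes)
dims = [8, 16, 16, 2]            # input -> hidden -> hidden -> output

# K candidate weight matrices per layer (stand-ins for trained members).
weights = [[rng.normal(size=(dims[i], dims[i + 1])) * 0.1 for _ in range(K)]
           for i in range(L)]

def forward(x, choice, cache):
    """Run one layer-ensemble sample; reuse outputs shared with earlier samples."""
    h = x
    for i, k in enumerate(choice):
        key = tuple(choice[:i + 1])          # prefix of layer choices so far
        if key not in cache:                 # compute only unseen prefixes
            cache[key] = np.tanh(h @ weights[i][k])
        h = cache[key]
    return h

x = rng.normal(size=(1, dims[0]))
cache = {}
samples = [tuple(rng.integers(0, K, size=L)) for _ in range(10)]
preds = np.stack([forward(x, c, cache) for c in samples])

# Disagreement across samples is a simple uncertainty signal.
print("prediction variance:", preds.var(axis=0).mean())
print("layer evaluations run:", len(cache), "instead of", 10 * L)
```

The cache illustrates why reusing common layer outputs saves compute: every shared prefix of layer choices is evaluated once, no matter how many samples contain it.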
Related papers
- Optimization Guarantees of Unfolded ISTA and ADMM Networks With Smooth Soft-Thresholding [57.71603937699949]
We study optimization guarantees, i.e., achieving near-zero training loss as the number of learning epochs increases.
We show that the threshold on the number of training samples increases with the network width.
arXiv Detail & Related papers (2023-09-12T13:03:47Z)
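The entry above concerns unfolded ISTA/ADMM networks with a smooth soft-thresholding nonlinearity; the sketch below shows one unfolded ISTA iteration, using a softplus-based smoothing as an illustrative choice (the paper's exact smooth threshold and network parameterization may differ).

```python
# Sketch of one unfolded ISTA layer with a smooth soft-thresholding
# nonlinearity (softplus-based smoothing is an illustrative assumption).
import numpy as np

def smooth_soft_threshold(x, lam, beta=10.0):
    # Smooth approximation of sign(x) * max(|x| - lam, 0).
    return np.sign(x) * np.log1p(np.exp(beta * (np.abs(x) - lam))) / beta

def ista_layer(x, y, A, lam, step):
    # Gradient step on 0.5 * ||A x - y||^2 followed by the (smooth) shrinkage.
    return smooth_soft_threshold(x - step * A.T @ (A @ x - y), lam * step)

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 50)) / np.sqrt(20)
x_true = np.zeros(50); x_true[:5] = rng.normal(size=5)   # sparse signal
y = A @ x_true

x = np.zeros(50)
step = 1.0 / np.linalg.norm(A, 2) ** 2
for _ in range(50):                      # 50 unfolded layers / iterations
    x = ista_layer(x, y, A, lam=0.05, step=step)
print("recovery error:", np.linalg.norm(x - x_true))
```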
- Likelihood-Free Inference with Generative Neural Networks via Scoring Rule Minimization [0.0]
Likelihood-free inference methods yield posterior approximations for simulator models with intractable likelihoods.
Many works have trained neural networks to approximate either the intractable likelihood or the posterior directly.
Here, we propose to approximate the posterior with generative networks trained by Scoring Rule minimization.
arXiv Detail & Related papers (2022-05-31T13:32:55Z)
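A minimal sketch of the scoring-rule idea from the entry above: score samples from a generative model against an observation with the energy score. The energy score is one standard proper scoring rule and the Gaussian "generator" is a toy stand-in; both are assumptions for illustration, not the paper's setup.

```python
# Energy-score estimate from samples: a lower score means the generative
# model's samples are better calibrated around the observation.
import numpy as np

def energy_score(samples, y):
    # Estimate of E||X - y|| - 0.5 * E||X - X'|| from m samples.
    m = len(samples)
    term1 = np.mean(np.linalg.norm(samples - y, axis=1))
    diffs = samples[:, None, :] - samples[None, :, :]
    term2 = np.sum(np.linalg.norm(diffs, axis=-1)) / (m * (m - 1))
    return term1 - 0.5 * term2

rng = np.random.default_rng(0)
y_obs = np.array([1.0, -0.5])

# A generator centred closer to the observation receives a lower (better) score.
for shift in [0.0, 2.0]:
    samples = rng.normal(loc=y_obs + shift, scale=1.0, size=(200, 2))
    print(f"shift={shift}: energy score = {energy_score(samples, y_obs):.3f}")
```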
- Arbitrary Bit-width Network: A Joint Layer-Wise Quantization and Adaptive Inference Approach [38.03309300383544]
We propose to feed different data samples with varying quantization schemes to achieve a data-dependent dynamic inference, at a fine-grained layer level.
We present the Arbitrary Bit-width Network (ABN), where the bit-widths of a single deep network can change at runtime for different data samples, with a layer-wise granularity.
On ImageNet classification, we achieve a 1.1% top-1 accuracy improvement while saving 36.2% of BitOps.
arXiv Detail & Related papers (2022-04-21T09:36:43Z)
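A minimal sketch of the layer-wise idea from the entry above: the same weights are re-quantized to a per-layer bit-width chosen at inference time. The symmetric uniform quantizer and the hard-coded bit-width lists are placeholders, not the paper's data-dependent policy.

```python
# Layer-wise runtime quantization: pick a bit-width per layer per sample.
import numpy as np

def quantize(w, bits):
    # Symmetric uniform quantization to 2^bits levels.
    scale = np.max(np.abs(w)) / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale

rng = np.random.default_rng(0)
weights = [rng.normal(size=(8, 8)) for _ in range(3)]
x = rng.normal(size=8)

def run(x, bit_widths):
    h = x
    for w, b in zip(weights, bit_widths):
        h = np.maximum(quantize(w, b) @ h, 0.0)   # quantized layer + ReLU
    return h

# An "easy" sample could use low-bit layers, a "hard" one higher precision.
print(run(x, bit_widths=[2, 2, 4]))
print(run(x, bit_widths=[8, 8, 8]))
```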
- Mixtures of Laplace Approximations for Improved Post-Hoc Uncertainty in Deep Learning [24.3370326359959]
We propose to predict with a Gaussian mixture model posterior that consists of a weighted sum of Laplace approximations of independently trained deep neural networks.
We theoretically validate that our approach mitigates overconfidence "far away" from the training data and empirically compare against state-of-the-art baselines on standard uncertainty quantification benchmarks.
arXiv Detail & Related papers (2021-11-05T15:52:48Z)
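A minimal sketch of the mixture-of-Laplace predictive from the entry above, on a toy logistic-regression model: average predictions over weight samples drawn around each independently trained solution. Equal mixture weights, random stand-in MAP estimates, and isotropic posterior covariances are simplifying assumptions.

```python
# Predictive of a Gaussian mixture posterior: one Laplace (Gaussian)
# component per independently trained member, predictions averaged.
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Stand-ins for MAP weights of four trained members and the covariances
# of their Laplace approximations (inverse-Hessian in the real method).
maps = [rng.normal(size=3) for _ in range(4)]
covs = [0.1 * np.eye(3) for _ in range(4)]

def predictive(x, n_samples=100):
    probs = []
    for w_map, cov in zip(maps, covs):           # one Gaussian per member
        ws = rng.multivariate_normal(w_map, cov, size=n_samples)
        probs.append(sigmoid(ws @ x))            # samples from that component
    probs = np.concatenate(probs)                # equal-weight mixture
    return probs.mean(), probs.std()             # prediction and uncertainty

x_near = np.array([0.1, 0.2, -0.1])
x_far = 50 * x_near                              # "far away" from the data
for x in (x_near, x_far):
    p, s = predictive(x)
    print(f"mean prob {p:.2f}, spread {s:.2f}")
```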
- Cluster-Promoting Quantization with Bit-Drop for Minimizing Network Quantization Loss [61.26793005355441]
Cluster-Promoting Quantization (CPQ) finds the optimal quantization grids for neural networks.
DropBits is a new bit-drop technique that revises the standard dropout regularization to randomly drop bits instead of neurons.
We experimentally validate our method on various benchmark datasets and network architectures.
arXiv Detail & Related papers (2021-09-05T15:15:07Z)
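A minimal sketch of the bit-drop analogy from the entry above: instead of dropping neurons, randomly re-quantize individual weights at a coarser precision during training, which effectively drops their low-order bits. The specific masking scheme below is an illustrative guess, not the paper's DropBits formulation.

```python
# DropBits-style regularization sketch: random per-weight precision drop.
import numpy as np

rng = np.random.default_rng(0)

def drop_bits(w, total_bits=8, drop_prob=0.3):
    # For each weight, keep either the full bit-width or a reduced one,
    # and re-quantize at the kept precision (coarser = low bits "dropped").
    kept = np.where(rng.random(w.shape) < drop_prob, total_bits - 2, total_bits)
    levels = 2 ** (kept - 1) - 1
    scale = np.max(np.abs(w)) / levels
    return np.round(w / scale) * scale

w = rng.normal(size=(4, 4))
print(np.round(drop_bits(w), 3))
```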
- Sparse Uncertainty Representation in Deep Learning with Inducing Weights [22.912675044223302]
We extend Matheron's conditional Gaussian sampling rule to enable fast weight sampling, which allows our inference method to maintain a reasonable run-time compared with ensembles.
Our approach achieves performance competitive with the state of the art in prediction and uncertainty estimation tasks with fully connected neural networks and ResNets.
arXiv Detail & Related papers (2021-05-30T18:17:47Z)
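A minimal sketch of Matheron's rule itself, the conditional Gaussian sampling device the entry above extends for fast weight sampling: draw from the joint prior, then correct the draw towards the conditioning value. The small random covariance and toy dimensions are stand-ins, not the paper's model.

```python
# Matheron's rule: a sample of x | y = y* is xb + Sxy Syy^-1 (y* - yb),
# where (xb, yb) is a draw from the joint Gaussian prior.
import numpy as np

rng = np.random.default_rng(0)
dx, dy = 3, 2
A = rng.normal(size=(dx + dy, dx + dy))
cov = A @ A.T + 0.1 * np.eye(dx + dy)             # joint covariance of (x, y)
Sxy = cov[:dx, dx:]
Syx, Syy = cov[dx:, :dx], cov[dx:, dx:]
y_star = np.array([1.0, -2.0])                    # conditioning value

def sample_x_given_y(n):
    joint = rng.multivariate_normal(np.zeros(dx + dy), cov, size=n)
    xb, yb = joint[:, :dx], joint[:, dx:]
    return xb + (y_star - yb) @ np.linalg.solve(Syy, Syx)

samples = sample_x_given_y(20000)
exact_mean = Sxy @ np.linalg.solve(Syy, y_star)   # analytic conditional mean
print(samples.mean(axis=0), "vs", exact_mean)
```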
- ESPN: Extremely Sparse Pruned Networks [50.436905934791035]
We show that a simple iterative mask discovery method can achieve state-of-the-art compression of very deep networks.
Our algorithm represents a hybrid approach between single-shot network pruning methods and Lottery-Ticket-type approaches.
arXiv Detail & Related papers (2020-06-28T23:09:27Z)
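A minimal sketch of iterative mask discovery in the spirit of the entry above, using plain magnitude pruning as a stand-in for ESPN's procedure (the retraining between pruning rounds is omitted, and the schedule is an assumption).

```python
# Iterative magnitude pruning: repeatedly zero the smallest surviving
# weights until a target sparsity is reached.
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))
mask = np.ones_like(w, dtype=bool)

target_sparsity, rounds = 0.99, 5
for r in range(1, rounds + 1):
    keep_frac = (1 - target_sparsity) ** (r / rounds)   # prune gradually
    k = int(keep_frac * w.size)
    threshold = np.sort(np.abs(w[mask]))[::-1][k - 1]   # k-th largest magnitude
    mask &= np.abs(w) >= threshold
    # (In the real method, the surviving weights would be retrained here.)

print("sparsity:", 1 - mask.mean())
```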
- Fitting the Search Space of Weight-sharing NAS with Graph Convolutional Networks [100.14670789581811]
We train a graph convolutional network to fit the performance of sampled sub-networks.
With this strategy, we achieve a higher rank correlation coefficient in the selected set of candidates.
arXiv Detail & Related papers (2020-04-17T19:12:39Z)
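A minimal sketch of the predictor idea from the entry above: embed each sampled sub-network's operation graph with a tiny graph convolution and regress onto its accuracy. The graph encoding, sizes, random "accuracies", and the linear read-out are toy assumptions, not the paper's setup.

```python
# One-layer GCN embedding of an architecture graph + linear accuracy fit.
import numpy as np

rng = np.random.default_rng(0)
n_nodes, n_feats = 5, 4

def gcn_embed(adj, feats, W):
    # One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 X W), then mean-pool nodes.
    a_hat = adj + np.eye(len(adj))
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    h = np.maximum(d_inv_sqrt @ a_hat @ d_inv_sqrt @ feats @ W, 0.0)
    return h.mean(axis=0)

W = rng.normal(size=(n_feats, 8)) * 0.5
# Toy "dataset" of sampled sub-networks: random graphs + fake accuracies.
graphs = [(rng.integers(0, 2, size=(n_nodes, n_nodes)),
           rng.normal(size=(n_nodes, n_feats))) for _ in range(50)]
accs = rng.uniform(0.6, 0.95, size=50)

X = np.stack([gcn_embed(a, f, W) for a, f in graphs])
coef, *_ = np.linalg.lstsq(X, accs, rcond=None)       # linear read-out
pred = X @ coef
print("fit correlation:", np.corrcoef(pred, accs)[0, 1])
```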
- Anytime Inference with Distilled Hierarchical Neural Ensembles [32.003196185519]
Inference in deep neural networks can be computationally expensive, and networks capable of anytime inference are important in scenarios where the amount of compute or the quantity of input data varies over time.
We propose Hierarchical Neural Ensembles (HNE), a novel framework to embed an ensemble of multiple networks in a hierarchical tree structure, sharing intermediate layers.
Our experiments show that, compared to previous anytime inference models, HNE provides state-of-the-art accuracy-compute trade-offs on the CIFAR-10/100 and ImageNet datasets.
arXiv Detail & Related papers (2020-03-03T12:13:38Z)
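A minimal sketch of the hierarchical-ensemble idea from the entry above: members are leaves of a binary tree of layer blocks, so siblings share all ancestor computation, and an anytime prediction averages however many leaves the compute budget allows. Depth, widths, and the tree layout are illustrative assumptions.

```python
# Tree-structured ensemble with shared prefixes and anytime averaging.
import numpy as np

rng = np.random.default_rng(0)
depth, width = 3, 16                      # 2^3 = 8 ensemble members (leaves)
blocks = {}                               # one weight matrix per tree node

def leaf_output(x, leaf_bits):
    h = x
    for d in range(1, depth + 1):
        node = leaf_bits[:d]              # path prefix identifies the block
        if node not in blocks:
            blocks[node] = rng.normal(size=(width, width)) * 0.2
        h = np.tanh(h @ blocks[node])     # shared with all leaves under `node`
    return h

x = rng.normal(size=(1, width))
leaves = [format(i, "03b") for i in range(2 ** depth)]

# Anytime inference: the running average is usable after any number of leaves.
outputs = []
for budget, leaf in enumerate(leaves, start=1):
    outputs.append(leaf_output(x, leaf))
    print(f"after {budget} member(s): mean output {np.mean(outputs):+.3f}")
```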
- MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks.
The use of gradient-based optimization combined with nonconvexity renders learning susceptible to novel problems.
We propose fusing neighboring layers of deeper networks that are trained with random initializations.
arXiv Detail & Related papers (2020-01-28T18:25:15Z)
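A minimal sketch of the layer-fusion idea from the entry above: two neighbouring linear layers collapse into a single layer by multiplying their weight matrices. This is exact only when there is no nonlinearity between them; the paper's MSE-optimal treatment of the general case is not reproduced here, and the shapes are illustrative.

```python
# Fusing two adjacent linear layers into one equivalent layer.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 8)), rng.normal(size=16)
W2, b2 = rng.normal(size=(4, 16)), rng.normal(size=4)

W_fused = W2 @ W1                 # fused weight
b_fused = W2 @ b1 + b2            # fused bias

x = rng.normal(size=8)
original = W2 @ (W1 @ x + b1) + b2
fused = W_fused @ x + b_fused
print("max difference:", np.max(np.abs(original - fused)))   # ~1e-15
```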