Efficient and Scalable Bayesian Neural Nets with Rank-1 Factors
- URL: http://arxiv.org/abs/2005.07186v2
- Date: Fri, 14 Aug 2020 20:39:11 GMT
- Title: Efficient and Scalable Bayesian Neural Nets with Rank-1 Factors
- Authors: Michael W. Dusenberry, Ghassen Jerfel, Yeming Wen, Yi-An Ma, Jasper
Snoek, Katherine Heller, Balaji Lakshminarayanan, Dustin Tran
- Abstract summary: We propose a rank-1 parameterization of BNNs, where each weight matrix involves only a distribution on a rank-1 subspace.
We also revisit the use of mixture approximate posteriors to capture multiple modes, where unlike typical mixtures, this approach admits a significantly smaller memory increase.
For ResNet-50 on ImageNet, Wide ResNet 28-10 on CIFAR-10/100, and an RNN on MIMIC-III, rank-1 BNNs achieve state-of-the-art performance across log-likelihood, accuracy, and calibration.
- Score: 36.56528603807598
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bayesian neural networks (BNNs) demonstrate promising success in improving
the robustness and uncertainty quantification of modern deep learning. However,
they generally struggle with underfitting at scale and parameter efficiency. On
the other hand, deep ensembles have emerged as alternatives for uncertainty
quantification that, while outperforming BNNs on certain problems, also suffer
from efficiency issues. It remains unclear how to combine the strengths of
these two approaches and remediate their common issues. To tackle this
challenge, we propose a rank-1 parameterization of BNNs, where each weight
matrix involves only a distribution on a rank-1 subspace. We also revisit the
use of mixture approximate posteriors to capture multiple modes, where unlike
typical mixtures, this approach admits a significantly smaller memory increase
(e.g., only a 0.4% increase for a ResNet-50 mixture of size 10). We perform a
systematic empirical study on the choices of prior, variational posterior, and
methods to improve training. For ResNet-50 on ImageNet, Wide ResNet 28-10 on
CIFAR-10/100, and an RNN on MIMIC-III, rank-1 BNNs achieve state-of-the-art
performance across log-likelihood, accuracy, and calibration on the test sets
and out-of-distribution variants.
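The rank-1 construction can be summarized concretely: each layer keeps one shared, deterministic weight matrix and places variational distributions only on two vectors that rescale its input and output dimensions, so the effective weight is the shared matrix multiplied element-wise by a rank-1 outer product. The sketch below is a minimal NumPy illustration of a forward pass through such a dense layer with Gaussian factors; it is not the authors' implementation, and all names are illustrative. A mixture posterior would simply keep K sets of (r, s) parameters and sample one component per forward pass, which is why the memory overhead stays small: the extra parameters scale with fan-in + fan-out rather than fan-in x fan-out.
```python
import numpy as np

rng = np.random.default_rng(0)

def rank1_dense(x, W_bar, b, r_mean, r_logstd, s_mean, s_logstd, sample=True):
    """Rank-1 Bayesian dense layer, forward pass (illustrative sketch only).

    Effective weight: W_bar * outer(s, r), i.e. a shared deterministic matrix
    W_bar modulated by an input factor s and an output factor r, each carrying
    a Gaussian variational posterior. The modulation is applied directly to
    activations, so no full-size weight sample is ever materialized.
    """
    if sample:
        # Reparameterized samples from the factor posteriors.
        s = s_mean + np.exp(s_logstd) * rng.standard_normal(s_mean.shape)
        r = r_mean + np.exp(r_logstd) * rng.standard_normal(r_mean.shape)
    else:
        # Posterior means for a deterministic prediction.
        s, r = s_mean, r_mean
    return ((x * s) @ W_bar) * r + b

# Toy usage: a batch of 4 inputs through an 8 -> 3 layer.
d_in, d_out = 8, 3
x = rng.standard_normal((4, d_in))
W_bar = 0.1 * rng.standard_normal((d_in, d_out))  # shared point-estimate weights
b = np.zeros(d_out)
# Variational parameters for the rank-1 factors: O(d_in + d_out) numbers, so a
# mixture of K components adds only K * (d_in + d_out) extras per weight matrix.
r_mean, r_logstd = np.ones(d_out), np.full(d_out, -3.0)
s_mean, s_logstd = np.ones(d_in), np.full(d_in, -3.0)
print(rank1_dense(x, W_bar, b, r_mean, r_logstd, s_mean, s_logstd).shape)  # (4, 3)
```
Training such a layer would add the usual variational terms (a KL penalty on r and s against their priors); the specific prior, posterior, and training choices are exactly what the paper's empirical study examines.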
Related papers
- Binary domain generalization for sparsifying binary neural networks [3.2462411268263964]
Binary neural networks (BNNs) are an attractive solution for developing and deploying deep neural network (DNN)-based applications on resource-constrained devices.
Weight pruning of BNNs leads to performance degradation, which suggests that the standard binarization domain of BNNs is not well adapted for the task.
This work proposes a novel, more general binary domain that extends the standard binary one and is more robust to pruning techniques.
arXiv Detail & Related papers (2023-06-23T14:32:16Z)
- DIFUSCO: Graph-based Diffusion Solvers for Combinatorial Optimization [51.517956081644186]
We introduce a new graph-based diffusion framework, namely DIFUSCO.
Our framework casts NP-complete (NPC) problems as discrete {0, 1}-vector optimization problems.
For the MIS problem, DIFUSCO outperforms the previous state-of-the-art neural solver on the challenging SATLIB benchmark.
arXiv Detail & Related papers (2023-02-16T11:13:36Z)
- Partial Binarization of Neural Networks for Budget-Aware Efficient Learning [10.613066533991292]
Binarization is a powerful compression technique for neural networks.
We propose a controlled approach to partial binarization, creating a budgeted binary neural network (B2NN) with our MixBin strategy.
arXiv Detail & Related papers (2022-11-12T20:30:38Z)
- Elastic-Link for Binarized Neural Network [9.83865304744923]
The "Elastic-Link" (EL) module enriches information flow within a BNN by adaptively adding real-valued input features to the subsequent convolutional output features.
EL produces a significant improvement on the challenging large-scale ImageNet dataset.
With the integration of ReActNet, it yields a new state-of-the-art result of 71.9% top-1 accuracy.
arXiv Detail & Related papers (2021-12-19T13:49:29Z)
- Spatial-Temporal-Fusion BNN: Variational Bayesian Feature Layer [77.78479877473899]
We design a spatial-temporal-fusion BNN for efficiently scaling BNNs to large models.
Compared to vanilla BNNs, our approach greatly reduces the training time and the number of parameters, which helps scale BNNs efficiently.
arXiv Detail & Related papers (2021-12-12T17:13:14Z)
- Boost Neural Networks by Checkpoints [9.411567653599358]
We propose a novel method to ensemble the checkpoints of deep neural networks (DNNs).
With the same training budget, our method achieves 4.16% lower error on CIFAR-100 and 6.96% lower error on Tiny-ImageNet with the ResNet-110 architecture.
arXiv Detail & Related papers (2021-10-03T09:14:15Z)
- Compact representations of convolutional neural networks via weight pruning and quantization [63.417651529192014]
We propose a novel storage format for convolutional neural networks (CNNs) based on source coding and leveraging both weight pruning and quantization.
We achieve a reduction of space occupancy up to 0.6% on fully connected layers and 5.44% on the whole network, while performing at least as competitively as the baseline.
arXiv Detail & Related papers (2021-08-28T20:39:54Z)
- S2-BNN: Bridging the Gap Between Self-Supervised Real and 1-bit Neural Networks via Guided Distribution Calibration [74.5509794733707]
We present a novel guided learning paradigm that distills binary networks from real-valued networks on the final prediction distribution.
Our proposed method can boost the simple contrastive learning baseline by an absolute gain of 5.515% on BNNs.
Our method achieves substantial improvement over the simple contrastive learning baseline, and is even comparable to many mainstream supervised BNN methods.
arXiv Detail & Related papers (2021-02-17T18:59:28Z)
- Widening and Squeezing: Towards Accurate and Efficient QNNs [125.172220129257]
Quantized neural networks (QNNs) are very attractive to industry because of their extremely cheap computation and storage overhead, but their performance is still worse than that of networks with full-precision parameters.
Most existing methods aim to enhance the performance of QNNs, especially binary neural networks, by exploiting more effective training techniques.
We address this problem by projecting features in original full-precision networks to high-dimensional quantization features.
arXiv Detail & Related papers (2020-02-03T04:11:13Z)