Training Bayesian Neural Networks with Sparse Subspace Variational
Inference
- URL: http://arxiv.org/abs/2402.11025v1
- Date: Fri, 16 Feb 2024 19:15:49 GMT
- Title: Training Bayesian Neural Networks with Sparse Subspace Variational
Inference
- Authors: Junbo Li, Zichen Miao, Qiang Qiu, Ruqi Zhang
- Abstract summary: Sparse Subspace Variational Inference (SSVI) is the first fully sparse BNN framework that maintains a consistently highly sparse model throughout the training and inference phases.
Our experiments show that SSVI sets new benchmarks in crafting sparse BNNs, achieving, for instance, a 10-20x compression in model size with under 3% performance drop, and up to 20x FLOPs reduction during training compared with dense VI training.
- Score: 35.241207717307645
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bayesian neural networks (BNNs) offer uncertainty quantification but come
with the downside of substantially increased training and inference costs.
Sparse BNNs have been investigated for efficient inference, typically by either
slowly introducing sparsity throughout the training or by post-training
compression of dense BNNs. The dilemma of how to cut down massive training
costs remains, particularly given the requirement to learn about the
uncertainty. To solve this challenge, we introduce Sparse Subspace Variational
Inference (SSVI), the first fully sparse BNN framework that maintains a
consistently highly sparse Bayesian model throughout the training and inference
phases. Starting from a randomly initialized low-dimensional sparse subspace,
our approach alternately optimizes the sparse subspace basis selection and its
associated parameters. While basis selection is characterized as a
non-differentiable problem, we approximate the optimal solution with a
removal-and-addition strategy, guided by novel criteria based on weight
distribution statistics. Our extensive experiments show that SSVI sets new
benchmarks in crafting sparse BNNs, achieving, for instance, a 10-20x
compression in model size with under 3% performance drop, and up to 20x FLOPs
reduction during training compared with dense VI training. Remarkably, SSVI
also demonstrates enhanced robustness to hyperparameters, reducing the need for
intricate tuning in VI and occasionally even surpassing VI-trained dense BNNs
on both accuracy and uncertainty metrics.
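To make the alternating procedure concrete, below is a minimal, illustrative sketch of a fully sparse variational training loop in the spirit of SSVI: a fixed budget of active weights (the sparse subspace) is trained with reparameterized mean-field VI, and the basis is periodically updated by a removal-and-addition step. The toy linear model, the |mu|/sigma removal criterion, the uniform re-addition rule, and all hyperparameters are assumptions for illustration only, not the paper's exact criteria.

```python
# Illustrative sketch: sparse-subspace variational training with a
# removal-and-addition update on the active index set. All specifics
# (model, criteria, hyperparameters) are assumptions, not the SSVI paper's.
import torch
import torch.nn.functional as F

torch.manual_seed(0)

D = 256              # number of candidate weights in the full (dense) model
K = 32               # sparsity budget: only K weights are ever active
UPDATE_EVERY = 200   # steps between removal-and-addition updates
DROP_FRAC = 0.25     # fraction of the active set swapped per update

# Toy sparse regression data: only the first 8 features matter.
X = torch.randn(1024, D)
w_true = torch.zeros(D)
w_true[:8] = 1.0
y = X @ w_true + 0.1 * torch.randn(1024)

# Sparse subspace = active indices + variational parameters (mean, pre-softplus scale).
active = torch.randperm(D)[:K]
mu = torch.zeros(K, requires_grad=True)
rho = torch.full((K,), -3.0, requires_grad=True)
opt = torch.optim.Adam([mu, rho], lr=1e-2)

for step in range(2000):
    # --- optimize the variational parameters of the current subspace (ELBO) ---
    sigma = F.softplus(rho)
    w_active = mu + sigma * torch.randn(K)           # reparameterized weight sample
    pred = X[:, active] @ w_active                   # inactive weights are exactly zero
    nll = 0.5 * ((y - pred) ** 2).sum() / 0.1 ** 2   # Gaussian likelihood, known noise
    kl = (-torch.log(sigma) + 0.5 * (sigma ** 2 + mu ** 2) - 0.5).sum()  # KL to N(0, 1)
    loss = nll + kl
    opt.zero_grad()
    loss.backward()
    opt.step()

    # --- removal-and-addition on the subspace basis ---
    if (step + 1) % UPDATE_EVERY == 0:
        with torch.no_grad():
            n_swap = int(DROP_FRAC * K)
            # Remove the active weights with the lowest signal-to-noise ratio
            # (a simple weight-distribution statistic used here for illustration).
            snr = mu.abs() / F.softplus(rho)
            drop = torch.topk(snr, n_swap, largest=False).indices
            # Add an equal number of currently inactive candidates; chosen uniformly
            # here, whereas the paper uses statistics-guided addition criteria.
            current = set(active.tolist())
            inactive = torch.tensor([i for i in range(D) if i not in current])
            new = inactive[torch.randperm(len(inactive))[:n_swap]]
            active[drop] = new
            mu.data[drop] = 0.0      # re-initialize the swapped coordinates;
            rho.data[drop] = -3.0    # Adam state for these slots is reused for simplicity
```

The property this sketch illustrates is that the number of trainable variational parameters never exceeds the sparsity budget K, so both training and inference operate on a sparse model throughout.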
Related papers
- Towards Continual Learning Desiderata via HSIC-Bottleneck
Orthogonalization and Equiangular Embedding [55.107555305760954]
We propose a conceptually simple yet effective method that attributes forgetting to layer-wise parameter overwriting and the resulting decision boundary distortion.
Our method achieves competitive accuracy while requiring no exemplar buffer and only 1.02x the size of the base model.
arXiv Detail & Related papers (2024-01-17T09:01:29Z)
- Overcoming Recency Bias of Normalization Statistics in Continual Learning: Balance and Adaptation [67.77048565738728]
Continual learning involves learning a sequence of tasks and balancing their knowledge appropriately.
We propose Adaptive Balance of BN (AdaB²N), which appropriately incorporates a Bayesian-based strategy to adapt task-wise contributions.
Our approach achieves significant performance gains across a wide range of benchmarks.
arXiv Detail & Related papers (2023-10-13T04:50:40Z)
- Accurate Neural Network Pruning Requires Rethinking Sparse Optimization [87.90654868505518]
We show the impact of high sparsity on model training using the standard computer vision and natural language processing sparsity benchmarks.
We provide new approaches for mitigating this issue for both sparse pre-training of vision models and sparse fine-tuning of language models.
arXiv Detail & Related papers (2023-08-03T21:49:14Z)
- Comparative Analysis of Interval Reachability for Robust Implicit and Feedforward Neural Networks [64.23331120621118]
We use interval reachability analysis to obtain robustness guarantees for implicit neural networks (INNs).
INNs are a class of implicit learning models that use implicit equations as layers.
We show that our approach performs at least as well as, and generally better than, applying state-of-the-art interval bound propagation methods to INNs.
arXiv Detail & Related papers (2022-04-01T03:31:27Z)
- Spatial-Temporal-Fusion BNN: Variational Bayesian Feature Layer [77.78479877473899]
We design a spatial-temporal-fusion BNN for efficiently scaling BNNs to large models.
Compared to vanilla BNNs, our approach greatly reduces the training time and the number of parameters, which helps scale BNNs efficiently.
arXiv Detail & Related papers (2021-12-12T17:13:14Z)
- Towards Low-Latency Energy-Efficient Deep SNNs via Attention-Guided Compression [12.37129078618206]
Deep spiking neural networks (SNNs) have emerged as a potential alternative to traditional deep learning frameworks.
Most SNN training frameworks yield large inference latency which translates to increased spike activity and reduced energy efficiency.
This paper presents a non-iterative SNN training technique that achieves ultra-high compression with reduced spiking activity.
arXiv Detail & Related papers (2021-07-16T18:23:36Z)
- S2-BNN: Bridging the Gap Between Self-Supervised Real and 1-bit Neural Networks via Guided Distribution Calibration [74.5509794733707]
We present a novel guided learning paradigm that distills binary networks from real-valued models by calibrating the final prediction distribution.
Our proposed method boosts a simple contrastive learning baseline by an absolute gain of 5.515% on binary networks, and is even comparable to many mainstream supervised BNN methods.
arXiv Detail & Related papers (2021-02-17T18:59:28Z)
- Selfish Sparse RNN Training [13.165729746380816]
We propose an approach to train sparse RNNs with a fixed parameter count in one single run, without compromising performance.
We achieve state-of-the-art sparse training results on the Penn TreeBank and Wikitext-2 datasets.
arXiv Detail & Related papers (2021-01-22T10:45:40Z)
- Encoding the latent posterior of Bayesian Neural Networks for uncertainty quantification [10.727102755903616]
We aim for efficient deep BNNs amenable to complex computer vision architectures.
We achieve this by leveraging variational autoencoders (VAEs) to learn the interaction and the latent distribution of the parameters at each network layer.
Our approach, Latent-Posterior BNN (LP-BNN), is compatible with the recent BatchEnsemble method, leading to highly efficient (in terms of computation and memory during both training and testing) ensembles.
arXiv Detail & Related papers (2020-12-04T19:50:09Z)
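As a companion illustration for the LP-BNN entry above, the sketch below shows one way a small VAE can model the distribution of a layer's weight vectors, so that samples from the latent space yield an ensemble of layer weights. The row-wise encoding, layer sizes, and training setup are assumptions made for illustration and are not the LP-BNN paper's architecture.

```python
# Illustrative sketch: a VAE over the rows of one layer's weight matrix.
# Sampling latents then decoding gives stochastic layer weights for an ensemble.
# All shapes and training details are assumptions, not the LP-BNN design.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

OUT_UNITS, IN_UNITS, LATENT = 64, 32, 8

class WeightVAE(nn.Module):
    """Encode each row of a (OUT_UNITS x IN_UNITS) weight matrix as a latent Gaussian."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Linear(IN_UNITS, 2 * LATENT)   # outputs mean and log-variance
        self.dec = nn.Linear(LATENT, IN_UNITS)

    def forward(self, w_rows):
        mu, logvar = self.enc(w_rows).chunk(2, dim=-1)
        z = mu + (0.5 * logvar).exp() * torch.randn_like(mu)   # reparameterization
        return self.dec(z), mu, logvar

# Stand-in for the rows of a trained layer's weight matrix.
w_rows = 0.1 * torch.randn(OUT_UNITS, IN_UNITS)

vae = WeightVAE()
opt = torch.optim.Adam(vae.parameters(), lr=1e-3)
for step in range(1000):
    recon, mu, logvar = vae(w_rows)
    rec_loss = F.mse_loss(recon, w_rows, reduction="sum")
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum()
    loss = rec_loss + kl
    opt.zero_grad()
    loss.backward()
    opt.step()

# Sampling from the latent posterior yields stochastic layer weights for an ensemble.
with torch.no_grad():
    mu, logvar = vae.enc(w_rows).chunk(2, dim=-1)
    for _ in range(3):
        z = mu + (0.5 * logvar).exp() * torch.randn_like(mu)
        w_sample = vae.dec(z)   # one sampled (OUT_UNITS x IN_UNITS) weight matrix
```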
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.