Efficient Model Compression for Bayesian Neural Networks
- URL: http://arxiv.org/abs/2411.00273v1
- Date: Fri, 01 Nov 2024 00:07:59 GMT
- Title: Efficient Model Compression for Bayesian Neural Networks
- Authors: Diptarka Saha, Zihe Liu, Feng Liang
- Abstract summary: We demonstrate a novel strategy to emulate principles of Bayesian model selection in a deep learning setup.
We employ these probabilities for pruning and feature selection on a host of simulated and real-world benchmark data.
- Score: 4.179545514579061
- Abstract: Model Compression has drawn much attention within the deep learning community recently. Compressing a dense neural network offers many advantages, including lower computation cost, deployability to devices with limited storage and memory, and resistance to adversarial attacks. This may be achieved via weight pruning or by fully discarding certain input features. Here we demonstrate a novel strategy to emulate principles of Bayesian model selection in a deep learning setup. Given a fully connected Bayesian neural network with spike-and-slab priors trained via a variational algorithm, we obtain the posterior inclusion probability for every node, a quantity that is typically lost. We employ these probabilities for pruning and feature selection on a host of simulated and real-world benchmark data and find evidence of better generalizability of the pruned model in all our experiments.
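To make the pruning step concrete, here is a minimal NumPy sketch. It assumes the variational spike-and-slab posterior has already produced a posterior inclusion probability for each hidden node (the `inclusion_probs` array and `prune_layer` helper below are illustrative names, not the authors' code); nodes whose probability falls below a threshold are dropped together with their incoming and outgoing weights.

```python
import numpy as np

def prune_layer(W_in, b, W_out, inclusion_probs, threshold=0.5):
    """Prune hidden nodes of one fully connected layer.

    W_in  : (n_hidden, n_in)  incoming weights of the layer
    b     : (n_hidden,)       biases of the layer
    W_out : (n_out, n_hidden) outgoing weights to the next layer
    inclusion_probs : (n_hidden,) posterior inclusion probability per node,
                      assumed to come from the variational spike-and-slab posterior
    """
    keep = inclusion_probs >= threshold            # nodes to retain
    # Drop the rows/columns corresponding to pruned nodes.
    return W_in[keep], b[keep], W_out[:, keep], keep

# Toy example: a layer with 4 hidden nodes and hypothetical inclusion probabilities.
rng = np.random.default_rng(0)
W_in, b, W_out = rng.normal(size=(4, 3)), rng.normal(size=4), rng.normal(size=(2, 4))
probs = np.array([0.95, 0.10, 0.70, 0.02])
W_in_p, b_p, W_out_p, keep = prune_layer(W_in, b, W_out, probs)
print(keep)   # [ True False  True False] -> 2 of 4 nodes kept
```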
Related papers
- Pruning Neural Networks via Coresets and Convex Geometry: Towards No Assumptions [10.635248457021499]
Pruning is one of the predominant approaches for compressing deep neural networks (DNNs).
We propose a novel and robust framework for computing such coresets under mild assumptions on the model's weights and inputs.
Our method outperforms existing coreset based neural pruning approaches across a wide range of networks and datasets.
arXiv Detail & Related papers (2022-09-18T12:45:26Z)
- Look beyond labels: Incorporating functional summary information in Bayesian neural networks [11.874130244353253]
We present a simple approach to incorporate summary information about the predicted probability.
The available summary information is incorporated as augmented data and modeled with a Dirichlet process.
We show how the method can inform the model about task difficulty or class imbalance.
arXiv Detail & Related papers (2022-07-04T07:06:45Z)
- Split personalities in Bayesian Neural Networks: the case for full marginalisation [0.0]
We show that the true posterior distribution of a Bayesian neural network is massively multimodal.
It is only by fully marginalising over all posterior modes, using appropriate Bayesian sampling tools, that we can capture the split personalities of the network.
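In generic notation (not taken from that paper), full marginalisation means averaging predictions over the entire weight posterior rather than committing to a single mode:

```latex
p(y \mid x, \mathcal{D})
  = \int p(y \mid x, w)\, p(w \mid \mathcal{D})\, \mathrm{d}w
  \;\approx\; \frac{1}{S} \sum_{s=1}^{S} p\bigl(y \mid x, w^{(s)}\bigr),
  \qquad w^{(s)} \sim p(w \mid \mathcal{D}),
```

where the samples w^{(s)} must be drawn from all posterior modes for the approximation to capture the multimodality described above.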
arXiv Detail & Related papers (2022-05-23T09:24:37Z)
- Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by analyzing the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z)
- LCS: Learning Compressible Subspaces for Adaptive Network Compression at Inference Time [57.52251547365967]
We propose a method for training a "compressible subspace" of neural networks that contains a fine-grained spectrum of models.
We present results for achieving arbitrarily fine-grained accuracy-efficiency trade-offs at inference time for structured and unstructured sparsity.
Our algorithm extends to quantization at variable bit widths, achieving accuracy on par with individually trained networks.
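As a rough illustration of the compressible-subspace idea, one can store two endpoint weight configurations and instantiate a model anywhere on the line between them at inference time; the sketch below assumes such a linear subspace with hypothetical endpoint arrays and is not the LCS training procedure itself.

```python
import numpy as np

def sample_from_line_subspace(theta_low, theta_high, alpha):
    """Linearly interpolate between two endpoint weight configurations.

    theta_low / theta_high : lists of weight arrays (the subspace endpoints)
    alpha : float in [0, 1] selecting a point on the accuracy-efficiency spectrum
    """
    return [(1.0 - alpha) * w0 + alpha * w1 for w0, w1 in zip(theta_low, theta_high)]

# Hypothetical endpoints for a two-layer network.
endpoints_low  = [np.zeros((4, 3)), np.zeros((2, 4))]
endpoints_high = [np.ones((4, 3)),  np.ones((2, 4))]
weights = sample_from_line_subspace(endpoints_low, endpoints_high, alpha=0.25)
```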
arXiv Detail & Related papers (2021-10-08T17:03:34Z)
- Layer Adaptive Node Selection in Bayesian Neural Networks: Statistical Guarantees and Implementation Details [0.5156484100374059]
Sparse deep neural networks have proven to be efficient for predictive model building in large-scale studies.
We propose a Bayesian sparse solution using spike-and-slab Gaussian priors to allow for node selection during training.
We establish the fundamental result of variational posterior consistency together with the characterization of prior parameters.
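The node-selection prior referred to here, and used by the main paper above, is the spike-and-slab construction; in generic notation, the weights w_j attached to node j are switched on or off by a Bernoulli indicator z_j:

```latex
z_j \sim \mathrm{Bernoulli}(\lambda), \qquad
w_j \mid z_j \;\sim\; z_j\, \mathcal{N}\!\bigl(0, \sigma^2 I\bigr) + (1 - z_j)\, \delta_0 ,
```

and the posterior inclusion probability used for pruning is Pr(z_j = 1 | data), approximated in practice by its variational estimate.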
arXiv Detail & Related papers (2021-08-25T00:48:07Z)
- Point-Cloud Deep Learning of Porous Media for Permeability Prediction [0.0]
We propose a novel deep learning framework for predicting permeability of porous media from their digital images.
We model the boundary between solid matrix and pore spaces as point clouds and feed them as inputs to a neural network based on the PointNet architecture.
arXiv Detail & Related papers (2021-07-18T22:59:21Z)
- A Bayesian Perspective on Training Speed and Model Selection [51.15664724311443]
We show that a measure of a model's training speed can be used to estimate its marginal likelihood.
We verify our results in model selection tasks for linear models and for the infinite-width limit of deep neural networks.
Our results suggest a promising new direction towards explaining why neural networks trained with gradient descent are biased towards functions that generalize well.
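The link between training speed and model selection can be read off the chain-rule decomposition of the marginal likelihood (generic notation): the log evidence is a sum of posterior predictive log-likelihoods, so a model that predicts each new observation well after seeing only the previous ones, i.e. one that trains fast, accumulates a larger sum.

```latex
\log p(\mathcal{D}) \;=\; \sum_{i=1}^{n} \log p\bigl(d_i \mid d_1, \ldots, d_{i-1}\bigr)
```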
arXiv Detail & Related papers (2020-10-27T17:56:14Z)
- ESPN: Extremely Sparse Pruned Networks [50.436905934791035]
We show that a simple iterative mask discovery method can achieve state-of-the-art compression of very deep networks.
Our algorithm represents a hybrid approach between single shot network pruning methods and Lottery-Ticket type approaches.
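For intuition, a generic iterative mask-discovery loop based on weight magnitudes looks like the sketch below; it is only a schematic stand-in, since the summary does not specify the exact criterion ESPN uses.

```python
import numpy as np

def update_mask(weights, mask, prune_fraction):
    """Zero out an additional fraction of the smallest surviving weights."""
    surviving = np.abs(weights[mask])                 # magnitudes of kept weights
    if surviving.size == 0:
        return mask
    cutoff = np.quantile(surviving, prune_fraction)   # magnitude threshold
    return mask & (np.abs(weights) > cutoff)

# Toy usage: start dense, prune 50% of the surviving weights each round.
w = np.random.default_rng(0).normal(size=(8, 8))
mask = np.ones_like(w, dtype=bool)
for _ in range(3):
    mask = update_mask(w, mask, prune_fraction=0.5)
    # ... fine-tune the network with w * mask between rounds ...
print(mask.mean())                                    # remaining density
```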
arXiv Detail & Related papers (2020-06-28T23:09:27Z)
- Beyond Dropout: Feature Map Distortion to Regularize Deep Neural Networks [107.77595511218429]
In this paper, we investigate the empirical Rademacher complexity related to intermediate layers of deep neural networks.
We propose a feature distortion method (Disout) for addressing the aforementioned problem.
The advantage of the proposed feature map distortion in producing deep neural networks with higher test performance is analyzed and demonstrated.
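Conceptually, feature map distortion replaces dropout's zeroing of activations with a bounded random perturbation of selected activations during training; the toy function below illustrates only that idea and omits the Rademacher-complexity-driven choices that define the actual Disout method.

```python
import numpy as np

def distort_feature_map(features, rate=0.1, strength=0.5, rng=None):
    """Randomly perturb a fraction of intermediate activations during training.

    features : array of intermediate activations
    rate     : fraction of elements to distort
    strength : scale of the additive distortion
    """
    rng = np.random.default_rng() if rng is None else rng
    selected = rng.random(features.shape) < rate              # elements to perturb
    noise = rng.uniform(-strength, strength, features.shape)  # bounded distortion
    return np.where(selected, features + noise, features)
```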
arXiv Detail & Related papers (2020-02-23T13:59:13Z)
- Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
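A simplified view of layer-wise fusion is to align the neurons of one network to the other before averaging; the sketch below uses a hard one-to-one assignment (a special case of an optimal transport coupling) on a single layer's incoming weights, and is only an approximation of the paper's OT-based scheme.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def fuse_layers(W_a, W_b):
    """Align the neurons of layer B to layer A, then average the weights.

    W_a, W_b : (n_neurons, n_in) incoming weight matrices of one layer.
    """
    # Cost of matching neuron i of A with neuron j of B.
    cost = np.linalg.norm(W_a[:, None, :] - W_b[None, :, :], axis=-1)
    _, cols = linear_sum_assignment(cost)    # hard assignment instead of a full OT plan
    W_b_aligned = W_b[cols]                  # permute B's neurons to match A
    return 0.5 * (W_a + W_b_aligned)
```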
arXiv Detail & Related papers (2019-10-12T22:07:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.