Bayesian sparsification for deep neural networks with Bayesian model
reduction
- URL: http://arxiv.org/abs/2309.12095v2
- Date: Fri, 27 Oct 2023 07:00:04 GMT
- Title: Bayesian sparsification for deep neural networks with Bayesian model
reduction
- Authors: Dimitrije Marković, Karl J. Friston, and Stefan J. Kiebel
- Abstract summary: We advocate for the use of Bayesian model reduction (BMR) as a more efficient alternative for pruning of model weights.
BMR allows a post-hoc elimination of redundant model weights based on the posterior estimates under a straightforward (non-hierarchical) generative model.
We illustrate the potential of BMR across various deep learning architectures, from classical networks like LeNet to modern frameworks such as Vision Transformers and MLP-Mixers.
- Score: 0.6144680854063939
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning's immense capabilities are often constrained by the complexity
of its models, leading to an increasing demand for effective sparsification
techniques. Bayesian sparsification for deep learning emerges as a crucial
approach, facilitating the design of models that are both computationally
efficient and competitive in terms of performance across various deep learning
applications. The state of the art in Bayesian sparsification of deep neural
networks combines structural shrinkage priors on model weights with an
approximate inference scheme based on stochastic variational inference.
However, model inversion of the full generative model is exceptionally
computationally demanding, especially when compared to standard deep learning
of point estimates. In this context, we advocate for the use of Bayesian model
reduction (BMR) as a more efficient alternative for pruning of model weights.
As a generalization of the Savage-Dickey ratio, BMR allows a post-hoc
elimination of redundant model weights based on the posterior estimates under a
straightforward (non-hierarchical) generative model. Our comparative study
highlights the advantages of the BMR method relative to established approaches
based on hierarchical horseshoe priors over model weights. We illustrate the
potential of BMR across various deep learning architectures, from classical
networks like LeNet to modern frameworks such as Vision Transformers and
MLP-Mixers.
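For intuition, the per-weight BMR score has a closed form in the Gaussian case: given a factorised variational posterior q(w) = N(mu, var) trained under a zero-mean prior N(0, prior_var), the change in log evidence from swapping that prior for a much tighter zero-mean prior N(0, reduced_prior_var) depends only on those four quantities. The sketch below, in NumPy, is an illustrative reconstruction under those assumptions; the function names, the default reduced-prior variance of 1e-8, and the zero-threshold pruning rule are choices made here for illustration, not code or settings taken from the paper.

```python
import numpy as np

def bmr_delta_f(mu, var, prior_var, reduced_prior_var=1e-8):
    """Per-weight change in log model evidence under Bayesian model reduction.

    Assumes a fully factorised Gaussian posterior q(w) = N(mu, var) obtained by
    stochastic variational inference under a zero-mean Gaussian prior
    N(0, prior_var), and a reduced zero-mean prior N(0, reduced_prior_var)
    that effectively switches the weight off. All arguments broadcast.
    """
    post_prec = 1.0 / var                      # posterior precision of the full model
    red_prior_prec = 1.0 / reduced_prior_var   # precision of the reduced (spike) prior
    prior_prec = 1.0 / prior_var               # precision of the original prior

    # Reduced posterior: swap priors inside the full posterior (precisions add).
    red_post_prec = post_prec + red_prior_prec - prior_prec
    red_mu = (post_prec * mu) / red_post_prec  # prior means are zero, only the data term remains

    # Log-evidence difference: positive values favour the reduced (pruned) model.
    return (0.5 * (np.log(prior_var) - np.log(reduced_prior_var)
                   - np.log(var) - np.log(red_post_prec))
            + 0.5 * (red_mu**2 * red_post_prec - mu**2 * post_prec))


def bmr_prune_mask(mu, var, prior_var, reduced_prior_var=1e-8):
    """Boolean mask of weights whose removal is favoured by the evidence."""
    return bmr_delta_f(mu, var, prior_var, reduced_prior_var) >= 0.0


# Example: post-hoc pruning of one layer's posterior means and variances.
mu = 0.1 * np.random.randn(4096)
var = np.full_like(mu, 1e-2)
mask = bmr_prune_mask(mu, var, prior_var=1.0)
sparse_mu = np.where(mask, 0.0, mu)            # pruned point estimates
```

In the limit of a vanishing reduced-prior variance the score reduces to the log Savage-Dickey density ratio ln q(0) - ln p(0), consistent with the characterisation of BMR above; a positive score means the evidence favours switching the weight off.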
Related papers
- Supervised Score-Based Modeling by Gradient Boosting [49.556736252628745]
We propose a Supervised Score-based Model (SSM), which can be viewed as a gradient boosting algorithm combined with score matching.
We provide a theoretical analysis of learning and sampling for SSM to balance inference time and prediction accuracy.
Our model outperforms existing models in both accuracy and inference time.
arXiv Detail & Related papers (2024-11-02T07:06:53Z) - BEND: Bagging Deep Learning Training Based on Efficient Neural Network Diffusion [56.9358325168226]
We propose a Bagging deep learning training algorithm based on Efficient Neural network Diffusion (BEND).
Our approach is simple but effective: it first uses multiple trained model weights and biases as inputs to train an autoencoder and a latent diffusion model.
Our proposed BEND algorithm can consistently outperform the mean and median accuracies of both the original trained model and the diffused model.
arXiv Detail & Related papers (2024-03-23T08:40:38Z) - Recurrent Reinforcement Learning with Memoroids [11.302674177386383]
We study memory models such as Recurrent Neural Networks (RNNs) and Transformers, which map trajectories to latent Markov states.
Neither model scales particularly well to long sequences, especially compared to an emerging class of memory models called Linear Recurrent Models.
We reformulate existing models using a novel monoid-based framework that we call memoroids.
arXiv Detail & Related papers (2024-02-15T11:56:53Z) - Sparse MoEs meet Efficient Ensembles [49.313497379189315]
We study the interplay of two popular classes of such models: ensembles of neural networks and sparse mixtures of experts (sparse MoEs).
We present Efficient Ensemble of Experts (E$3$), a scalable and simple ensemble of sparse MoEs that takes the best of both classes of models, while using up to 45% fewer FLOPs than a deep ensemble.
arXiv Detail & Related papers (2021-10-07T11:58:35Z) - Distributional Depth-Based Estimation of Object Articulation Models [21.046351215949525]
We propose a method that efficiently learns distributions over articulation model parameters directly from depth images.
Our core contributions include a novel representation for distributions over rigid body transformations.
We introduce a novel deep learning based approach, DUST-net, that performs category-independent articulation model estimation.
arXiv Detail & Related papers (2021-08-12T17:44:51Z) - Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z) - Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [54.94763543386523]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors.
We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method.
Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
arXiv Detail & Related papers (2020-10-25T18:51:15Z) - Be Your Own Best Competitor! Multi-Branched Adversarial Knowledge Transfer [15.499267533387039]
The proposed method is applied to both lightweight image classification and encoder-decoder architectures to boost the performance of small and compact models without incurring extra computational overhead at inference time.
The results show that the proposed model achieves a significant improvement over earlier self-distillation methods.
arXiv Detail & Related papers (2020-10-09T11:57:45Z) - Learning Deep-Latent Hierarchies by Stacking Wasserstein Autoencoders [22.54887526392739]
We propose a novel approach to training models with deep-latent hierarchies based on Optimal Transport.
We show that our method enables the generative model to fully leverage its deep-latent hierarchy, avoiding the well-known "latent variable collapse" issue of VAEs.
arXiv Detail & Related papers (2020-10-07T15:04:20Z) - Belief Propagation Reloaded: Learning BP-Layers for Labeling Problems [83.98774574197613]
We take one of the simplest inference methods, truncated max-product belief propagation, and add what is necessary to make it a proper component of a deep learning model.
This BP-Layer can be used as the final or an intermediate block in convolutional neural networks (CNNs).
The model is applicable to a range of dense prediction problems, is well-trainable and provides parameter-efficient and robust solutions in stereo, optical flow and semantic segmentation.
arXiv Detail & Related papers (2020-03-13T13:11:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided (including all listed content) and is not responsible for any consequences arising from its use.