Efficient Stein Variational Inference for Reliable Distribution-lossless Network Pruning
- URL: http://arxiv.org/abs/2212.03537v1
- Date: Wed, 7 Dec 2022 09:31:47 GMT
- Title: Efficient Stein Variational Inference for Reliable Distribution-lossless Network Pruning
- Authors: Yingchun Wang, Song Guo, Jingcai Guo, Weizhan Zhang, Yida Xu, Jie Zhang, Yi Liu
- Abstract summary: We propose a novel distribution-lossless pruning method, named DLLP, to theoretically find the pruned lottery ticket within a Bayesian treatment.
Our method can obtain sparser networks with great performance while providing quantified reliability for the pruned model.
- Score: 23.22021752821507
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Network pruning is a promising way to generate light but accurate models and enable their deployment on resource-limited edge devices. However, the current state of the art assumes that the effective sub-network and the other superfluous parameters in a given network share the same distribution, so pruning inevitably involves a distribution-truncation operation that typically eliminates values near zero. While simple, this may not be the most appropriate criterion, as effective models may naturally contain many small values. Removing near-zero values already embedded in the model space may significantly reduce model accuracy. Another line of work assigns a discrete prior over all possible sub-structures, but it still relies on human-crafted prior hypotheses. Worse still, existing methods use regularized point estimates, namely Hard Pruning, which cannot provide error estimates and therefore fail to justify the reliability of the pruned networks. In this paper, we propose a novel distribution-lossless pruning method, named DLLP, to theoretically find the pruned lottery ticket within a Bayesian treatment. Specifically, DLLP remodels the vanilla network as discrete priors for the latent pruned model and the remaining redundancy. More importantly, DLLP uses Stein Variational Inference to approach the latent prior while effectively bypassing the computation of a KL divergence against an unknown distribution. Extensive experiments on small-scale CIFAR-10 and large-scale ImageNet demonstrate that our method obtains sparser networks with strong generalization performance while providing quantified reliability for the pruned model.
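The Stein Variational Inference mentioned in the abstract is commonly instantiated as Stein Variational Gradient Descent (SVGD), which transports a set of particles toward a target distribution using only its score function, so no KL divergence against an unknown distribution has to be evaluated explicitly. The following is a minimal, generic SVGD sketch in NumPy with an RBF kernel and the median bandwidth heuristic; it is not the authors' DLLP implementation, and the target score function and particle setup in the usage example are illustrative assumptions.

```python
import numpy as np

def svgd_step(X, grad_log_p, step_size=1e-2):
    """One SVGD update on particles X of shape (n, d).

    grad_log_p(X) -> (n, d): score of the target distribution at each particle
    (an assumed, user-supplied callable).
    """
    n = X.shape[0]
    diffs = X[:, None, :] - X[None, :, :]           # (n, n, d): x_i - x_j
    sq_dists = np.sum(diffs ** 2, axis=-1)          # (n, n)
    h = np.median(sq_dists) / np.log(n + 1) + 1e-8  # median bandwidth heuristic
    K = np.exp(-sq_dists / h)                       # RBF kernel matrix
    score = grad_log_p(X)                           # (n, d)
    # Attractive term: kernel-smoothed scores pull particles toward high density.
    attract = K @ score
    # Repulsive term: kernel gradients push particles apart, preserving diversity.
    repulse = (2.0 / h) * np.sum(K[:, :, None] * diffs, axis=1)
    return X + step_size * (attract + repulse) / n

# Usage example (illustrative): transport particles toward a standard Gaussian,
# whose score function is simply -x.
particles = np.random.randn(200, 2) * 3.0 + 5.0
for _ in range(500):
    particles = svgd_step(particles, grad_log_p=lambda x: -x, step_size=0.1)
```

The repulsive term keeps the particle set spread out, which is what lets a Stein-based approach represent uncertainty over parameters rather than collapsing to a single point estimate, in line with the reliability quantification the abstract refers to.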
Related papers
- Flexible Heteroscedastic Count Regression with Deep Double Poisson Networks [4.58556584533865]
We propose the Deep Double Poisson Network (DDPN) to produce accurate, input-conditional uncertainty representations.
DDPN vastly outperforms existing discrete models.
It can be applied to a variety of count regression datasets.
arXiv Detail & Related papers (2024-06-13T16:02:03Z)
- Rejection via Learning Density Ratios [50.91522897152437]
Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions.
We propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance.
Our framework is tested empirically over clean and noisy datasets.
arXiv Detail & Related papers (2024-05-29T01:32:17Z)
- Bayesian Flow Networks [4.585102332532472]
This paper introduces Bayesian Flow Networks (BFNs), a new class of generative model in which the parameters of a set of independent distributions are modified with Bayesian inference.
Starting from a simple prior and iteratively updating the two distributions yields a generative procedure similar to the reverse process of diffusion models.
BFNs achieve competitive log-likelihoods for image modelling on dynamically binarized MNIST and CIFAR-10, and outperform all known discrete diffusion models on the text8 character-level language modelling task.
arXiv Detail & Related papers (2023-08-14T09:56:35Z)
- Quantifying lottery tickets under label noise: accuracy, calibration, and complexity [6.232071870655069]
Pruning deep neural networks is a widely used strategy to alleviate the computational burden in machine learning.
We use the sparse double descent approach to identify univocally and characterise pruned models associated with classification tasks.
arXiv Detail & Related papers (2023-06-21T11:35:59Z)
- Just One Byte (per gradient): A Note on Low-Bandwidth Decentralized Language Model Finetuning Using Shared Randomness [86.61582747039053]
Language model training in distributed settings is limited by the communication cost of exchanging gradients.
We extend recent work using shared randomness to perform distributed fine-tuning with low bandwidth.
arXiv Detail & Related papers (2023-06-16T17:59:51Z)
- Interpretations Steered Network Pruning via Amortized Inferred Saliency Maps [85.49020931411825]
Compressing Convolutional Neural Networks (CNNs) is crucial to deploying these models on edge devices with limited resources.
We propose to address the channel pruning problem from a novel perspective by leveraging the interpretations of a model to steer the pruning process.
We tackle this challenge by introducing a selector model that predicts real-time smooth saliency masks for pruned models.
arXiv Detail & Related papers (2022-09-07T01:12:11Z)
- Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
- KL Guided Domain Adaptation [88.19298405363452]
Domain adaptation is an important problem and often needed for real-world applications.
A common approach in the domain adaptation literature is to learn a representation of the input that has the same distributions over the source and the target domain.
We show that with a probabilistic representation network, the KL term can be estimated efficiently via minibatch samples; a generic sketch of such a minibatch estimator is given after this list.
arXiv Detail & Related papers (2021-06-14T22:24:23Z)
- Robust Implicit Networks via Non-Euclidean Contractions [63.91638306025768]
Implicit neural networks show improved accuracy and significant reduction in memory consumption.
However, they can suffer from ill-posedness and convergence instability.
This paper provides a new framework to design well-posed and robust implicit neural networks.
arXiv Detail & Related papers (2021-06-06T18:05:02Z)
- Lost in Pruning: The Effects of Pruning Neural Networks beyond Test Accuracy [42.15969584135412]
Neural network pruning is a popular technique used to reduce the inference costs of modern networks.
We evaluate whether the use of test accuracy alone in the terminating condition is sufficient to ensure that the resulting model performs well.
We find that pruned networks effectively approximate the unpruned model; however, the prune ratio at which pruned networks achieve commensurate performance varies significantly across tasks.
arXiv Detail & Related papers (2021-03-04T13:22:16Z)
- Achieving Efficiency in Black Box Simulation of Distribution Tails with Self-structuring Importance Samplers [1.6114012813668934]
The paper presents a novel Importance Sampling (IS) scheme for estimating the distribution of performance measures modeled with a rich set of tools, such as linear programs, integer linear programs, piecewise linear/quadratic objectives, and feature maps specified with deep neural networks.
arXiv Detail & Related papers (2021-02-14T03:37:22Z)
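As noted in the KL Guided Domain Adaptation entry above, here is a generic sketch of a minibatch Monte Carlo estimate of a KL term between source and target representation distributions, assuming a probabilistic representation network that outputs a diagonal Gaussian per example and approximating each minibatch marginal by the corresponding equal-weight mixture. The function names, encoder interface, and mixture approximation are illustrative assumptions, not necessarily the exact estimator of the cited paper.

```python
import numpy as np

def mixture_logpdf(z, mus, sigmas):
    """log[(1/m) * sum_j N(z; mu_j, diag(sigma_j^2))] for each row of z."""
    diff = z[:, None, :] - mus[None, :, :]                    # (n, m, d)
    log_comp = -0.5 * np.sum((diff / sigmas[None]) ** 2
                             + 2.0 * np.log(sigmas[None])
                             + np.log(2.0 * np.pi), axis=-1)  # (n, m)
    max_l = log_comp.max(axis=1, keepdims=True)               # log-sum-exp
    return (max_l[:, 0] + np.log(np.exp(log_comp - max_l).sum(axis=1))
            - np.log(mus.shape[0]))

def minibatch_kl(mu_s, sigma_s, mu_t, sigma_t):
    """Monte Carlo estimate of KL(p_source(z) || p_target(z)) from one minibatch.

    mu_*, sigma_*: per-example Gaussian parameters of shape (batch, dim) produced
    by a probabilistic representation network for a source and a target minibatch
    (assumed interface).
    """
    # One reparameterized sample per source example, drawn from its own Gaussian.
    z = mu_s + sigma_s * np.random.randn(*mu_s.shape)
    return float(np.mean(mixture_logpdf(z, mu_s, sigma_s)
                         - mixture_logpdf(z, mu_t, sigma_t)))
```

Because both log-densities are evaluated on the same minibatch samples, only per-batch quantities are needed, which is the efficiency property that entry highlights.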