Fire Together Wire Together: A Dynamic Pruning Approach with
Self-Supervised Mask Prediction
- URL: http://arxiv.org/abs/2110.08232v1
- Date: Fri, 15 Oct 2021 17:39:53 GMT
- Title: Fire Together Wire Together: A Dynamic Pruning Approach with
Self-Supervised Mask Prediction
- Authors: Sara Elkerdawy, Mostafa Elhoushi, Hong Zhang, Nilanjan Ray
- Abstract summary: Dynamic model pruning is a recent direction that allows for the inference of a different sub-network for each input sample during deployment.
Current dynamic methods rely on learning a continuous channel gating through regularization by inducing a sparsity loss.
We show experiments on several neural architectures, such as VGG, ResNet, and MobileNet on CIFAR and ImageNet.
- Score: 12.86325214182021
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Dynamic model pruning is a recent direction that allows for the inference of
a different sub-network for each input sample during deployment. However,
current dynamic methods rely on learning a continuous channel gating through
regularization by inducing a sparsity loss. This formulation introduces
complexity in balancing different losses (e.g., task loss, regularization
loss). In addition, regularization-based methods lack a transparent way to
select the tradeoff hyperparameter needed to realize a target computational
budget. Our contribution is twofold: 1) decoupled task and pruning training,
and 2) simple hyperparameter selection that enables estimating the FLOPs
reduction before training. We propose
to predict a mask to process k filters in a layer based on the activation of
its previous layer. We pose the problem as a self-supervised binary
classification problem. Each mask predictor module is trained to predict, via
a log-likelihood, whether each filter in the current layer belongs to the
top-k activated filters. The value of k is dynamically estimated for each input based on
a novel criterion using the mass of heatmaps. We show experiments on several
neural architectures, such as VGG, ResNet, and MobileNet on CIFAR and ImageNet
datasets. On CIFAR, we reach similar accuracy to SOTA methods with 15% and 24%
higher FLOPs reduction. Similarly, on ImageNet, we achieve a lower drop in
accuracy with up to 13% improvement in FLOPs reduction.
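
To make the decoupled formulation above concrete, here is a minimal PyTorch sketch of the two pieces described in the abstract: a per-layer mask predictor trained as a self-supervised binary classifier of top-k membership, and a per-input estimate of k from the mass of the activation heatmaps. All names, the global-average pooling of the previous layer's activations, the hidden width, and the cumulative-mass threshold are illustrative assumptions; the paper's exact criterion and predictor architecture may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MaskPredictor(nn.Module):
    """Predicts, per filter of the *current* conv layer, a logit for
    'belongs to the top-k activated filters', using only the previous
    layer's activations (self-supervised binary classification)."""

    def __init__(self, in_channels: int, out_channels: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_channels, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, out_channels),
        )

    def forward(self, prev_act: torch.Tensor) -> torch.Tensor:
        # Global-average-pool the previous layer's feature maps -> (B, C_in)
        pooled = F.adaptive_avg_pool2d(prev_act, 1).flatten(1)
        return self.net(pooled)  # (B, C_out) logits


def estimate_k(curr_act: torch.Tensor, mass_ratio: float = 0.9) -> torch.Tensor:
    """Per-input k from the 'mass' of the activation heatmaps: the smallest k
    whose filters account for `mass_ratio` of the total activation mass
    (illustrative criterion, not necessarily the paper's exact one)."""
    heat = curr_act.abs().mean(dim=(2, 3))              # (B, C_out) per-filter mass
    sorted_mass, _ = heat.sort(dim=1, descending=True)
    cum = sorted_mass.cumsum(dim=1) / sorted_mass.sum(dim=1, keepdim=True)
    return (cum < mass_ratio).sum(dim=1) + 1            # (B,) values of k


def topk_targets(curr_act: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """Self-supervised labels: 1 if a filter is among the top-k activated."""
    heat = curr_act.abs().mean(dim=(2, 3))              # (B, C_out)
    ranks = heat.argsort(dim=1, descending=True).argsort(dim=1)
    return (ranks < k.unsqueeze(1)).float()             # (B, C_out) binary mask


# Training the predictor is decoupled from the task loss:
# logits = predictor(prev_act)
# with torch.no_grad():
#     targets = topk_targets(curr_act, estimate_k(curr_act))
# mask_loss = F.binary_cross_entropy_with_logits(logits, targets)
```

Because the mask predictor is trained with its own binary cross-entropy objective against self-generated top-k labels, no sparsity regularizer has to be balanced against the task loss, which is the decoupling the abstract refers to.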
Related papers
- BALI: Learning Neural Networks via Bayesian Layerwise Inference [6.7819070167076045]
We introduce a new method for learning Bayesian neural networks, treating them as a stack of multivariate Bayesian linear regression models.
The main idea is to infer the layerwise posterior exactly if we know the target outputs of each layer.
We define these pseudo-targets as the layer outputs from the forward pass, updated by the backpropagated gradient of the objective function.
arXiv Detail & Related papers (2024-11-18T22:18:34Z) - ConvLoRA and AdaBN based Domain Adaptation via Self-Training [4.006331916849688]
We propose Convolutional Low-Rank Adaptation (ConvLoRA) for multi-target domain adaptation.
ConvLoRA freezes pre-trained model weights, adds trainable low-rank decomposition matrices to convolutional layers, and backpropagates the gradient.
Our method has fewer trainable parameters and performs better than or on par with large, independently fine-tuned networks.
arXiv Detail & Related papers (2024-02-07T15:43:50Z) - Filter Pruning for Efficient CNNs via Knowledge-driven Differential
Filter Sampler [103.97487121678276]
Filter pruning simultaneously accelerates the computation and reduces the memory overhead of CNNs.
We propose a novel Knowledge-driven Differential Filter Sampler (KDFS) with Masked Filter Modeling (MFM) framework for filter pruning.
arXiv Detail & Related papers (2023-07-01T02:28:41Z) - Learning a Consensus Sub-Network with Polarization Regularization and
One Pass Training [3.2214522506924093]
Pruning schemes create extra overhead, either through iterative training and fine-tuning for static pruning or through repeated computation of a dynamic pruning graph.
We propose a new parameter pruning strategy for learning a lighter-weight sub-network that minimizes the energy cost while maintaining comparable performance to the fully parameterised network on given downstream tasks.
Our results on CIFAR-10 and CIFAR-100 suggest that our scheme can remove 50% of connections in deep networks with less than 1% reduction in classification accuracy.
arXiv Detail & Related papers (2023-02-17T09:37:17Z) - Co-training $2^L$ Submodels for Visual Recognition [67.02999567435626]
Submodel co-training is a regularization method related to co-training, self-distillation and stochastic depth.
We show that submodel co-training is effective to train backbones for recognition tasks such as image classification and semantic segmentation.
arXiv Detail & Related papers (2022-12-09T14:38:09Z) - Transformers meet Stochastic Block Models: Attention with Data-Adaptive
Sparsity and Cost [53.746169882193456]
Recent works have proposed various sparse attention modules to overcome the quadratic cost of self-attention.
We propose a model that resolves both problems by endowing each attention head with a mixed-membership Stochastic Block Model.
Our model outperforms previous efficient variants as well as the original Transformer with full attention.
arXiv Detail & Related papers (2022-10-27T15:30:52Z) - Federated Learning Using Variance Reduced Stochastic Gradient for
Probabilistically Activated Agents [0.0]
This paper proposes an algorithm for Federated Learning (FL) with a two-layer structure that achieves both variance reduction and a faster convergence rate to an optimal solution in the setting where each agent has an arbitrary probability of selection in each iteration.
arXiv Detail & Related papers (2022-10-25T22:04:49Z) - Distribution Mismatch Correction for Improved Robustness in Deep Neural
Networks [86.42889611784855]
Normalization methods increase the vulnerability of deep neural networks to noise and input corruptions.
We propose an unsupervised non-parametric distribution correction method that adapts the activation distribution of each layer.
In our experiments, we empirically show that the proposed method effectively reduces the impact of intense image corruptions.
arXiv Detail & Related papers (2021-10-05T11:36:25Z) - Preprint: Norm Loss: An efficient yet effective regularization method
for deep neural networks [7.214681039134488]
We propose a weight soft-regularization method based on the oblique manifold.
We evaluate our method on the popular CIFAR-10, CIFAR-100 and ImageNet 2012 datasets.
arXiv Detail & Related papers (2021-03-11T10:24:49Z) - Manifold Regularized Dynamic Network Pruning [102.24146031250034]
This paper proposes a new paradigm that dynamically removes redundant filters by embedding the manifold information of all instances into the space of pruned networks.
The effectiveness of the proposed method is verified on several benchmarks, which shows better performance in terms of both accuracy and computational cost.
arXiv Detail & Related papers (2021-03-10T03:59:03Z) - On the Reproducibility of Neural Network Predictions [52.47827424679645]
We study the problem of churn, identify factors that cause it, and propose two simple means of mitigating it.
We first demonstrate that churn is indeed an issue, even for standard image classification tasks.
We propose using minimum entropy regularizers to increase prediction confidences (see the sketch after this list).
We present empirical results showing the effectiveness of both techniques in reducing churn while improving the accuracy of the underlying model.
arXiv Detail & Related papers (2021-02-05T18:51:01Z)
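
For the last entry above, the following is a minimal sketch of how a minimum-entropy regularizer of the kind it mentions is typically added to a classification objective; the coefficient `beta` and the exact formulation are assumptions, not taken from that paper.

```python
import torch
import torch.nn.functional as F


def min_entropy_loss(logits: torch.Tensor, labels: torch.Tensor,
                     beta: float = 0.1) -> torch.Tensor:
    """Cross-entropy plus an entropy penalty that pushes predictions toward
    higher confidence, which is the intuition behind a minimum-entropy
    regularizer for reducing churn (beta is an illustrative coefficient)."""
    ce = F.cross_entropy(logits, labels)
    probs = F.softmax(logits, dim=1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1).mean()
    return ce + beta * entropy
```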