Variance-Based Pruning for Accelerating and Compressing Trained Networks
- URL: http://arxiv.org/abs/2507.12988v1
- Date: Thu, 17 Jul 2025 10:54:17 GMT
- Title: Variance-Based Pruning for Accelerating and Compressing Trained Networks
- Authors: Uranik Berisha, Jens Mehnert, Alexandru Paul Condurache
- Abstract summary: Variance-Based Pruning is a simple and structured one-shot pruning technique for efficiently compressing networks. On ImageNet-1k recognition tasks, we demonstrate that directly after pruning, DeiT-Base retains over 70% of its original performance.
- Score: 46.498278084317704
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Increasingly expensive training of ever larger models such as Vision Transformers motivates reusing the vast library of already trained state-of-the-art networks. However, their latency, high computational costs and memory demands pose significant challenges for deployment, especially on resource-constrained hardware. While structured pruning methods can reduce these factors, they often require costly retraining, sometimes for up to hundreds of epochs, or even training from scratch to recover the accuracy lost to the structural modifications. Maintaining the performance of trained models after structured pruning, and thereby avoiding extensive retraining, remains a challenge. To solve this, we introduce Variance-Based Pruning, a simple and structured one-shot pruning technique for efficiently compressing networks with minimal fine-tuning. Our approach first gathers activation statistics, which are used to select neurons for pruning. Simultaneously, the mean activations are integrated back into the model to preserve a high degree of performance. On ImageNet-1k recognition tasks, we demonstrate that directly after pruning, DeiT-Base retains over 70% of its original performance and requires only 10 epochs of fine-tuning to regain 99% of the original accuracy, while simultaneously reducing MACs by 35% and model size by 36%, thus speeding up the model by 1.44x.
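The abstract describes a three-step recipe: gather per-neuron activation statistics, prune low-variance neurons, and fold the pruned neurons' mean activations back into the model. The following is a minimal PyTorch sketch of that idea for a single linear-activation-linear block; the function name, the calibration loop, and the keep_ratio parameter are illustrative assumptions, not the authors' implementation.

```python
import torch

@torch.no_grad()
def variance_prune(fc1, fc2, act_fn, calib_batches, keep_ratio=0.65):
    """Sketch: prune low-variance hidden units of an fc1 -> act_fn -> fc2 block."""
    n = fc1.out_features
    s, s2, count = torch.zeros(n), torch.zeros(n), 0

    # 1) Gather activation statistics on a small calibration set.
    for x in calib_batches:              # x: (batch, in_features)
        a = act_fn(fc1(x))               # post-activation values
        s += a.sum(0)
        s2 += a.pow(2).sum(0)
        count += a.shape[0]
    mean = s / count
    var = s2 / count - mean.pow(2)

    # 2) Keep the highest-variance neurons; mark the rest for removal.
    keep = var.argsort(descending=True)[: int(keep_ratio * n)].sort().values
    dropped = torch.ones(n, dtype=torch.bool)
    dropped[keep] = False

    # 3) Fold the mean activation of pruned neurons into fc2's bias, so the
    #    expected contribution of the removed units is preserved (done before
    #    slicing fc2.weight).
    fc2.bias += fc2.weight[:, dropped] @ mean[dropped]

    # 4) Shrink both layers structurally (this is where MACs/memory drop).
    fc1.weight.data = fc1.weight.data[keep].clone()
    fc1.bias.data = fc1.bias.data[keep].clone()
    fc2.weight.data = fc2.weight.data[:, keep].clone()
    fc1.out_features = fc2.in_features = keep.numel()
```

Per the abstract, a few epochs of fine-tuning after this one-shot step would recover the remaining accuracy.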
Related papers
- Auto-Train-Once: Controller Network Guided Automatic Network Pruning from Scratch [72.26822499434446]
Auto-Train-Once (ATO) is an innovative network pruning algorithm designed to automatically reduce the computational and storage costs of DNNs.
We provide a comprehensive convergence analysis as well as extensive experiments, and the results show that our approach achieves state-of-the-art performance across various model architectures.
arXiv Detail & Related papers (2024-03-21T02:33:37Z)
- Accelerating Deep Neural Networks via Semi-Structured Activation Sparsity [0.0]
Exploiting sparsity in the network's feature maps is one of the ways to reduce its inference latency.
We propose a solution to induce semi-structured activation sparsity exploitable through minor runtime modifications.
Our approach yields a speed improvement of $1.25\times$ with a minimal accuracy drop of $1.1\%$ for the ResNet18 model on the ImageNet dataset.
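For concreteness, one common hardware-friendly form of semi-structured activation sparsity is an N:M pattern (keep the N largest magnitudes in every contiguous group of M values). The sketch below applies such a mask to an activation tensor; whether this matches the paper's exact pattern is an assumption.

```python
import torch

def nm_sparsify(a: torch.Tensor, n: int = 2, m: int = 4) -> torch.Tensor:
    """Zero all but the n largest-magnitude values in every group of m."""
    shape = a.shape                  # feature dimension assumed divisible by m
    groups = a.reshape(-1, m)
    idx = groups.abs().topk(n, dim=1).indices
    mask = torch.zeros_like(groups).scatter_(1, idx, 1.0)
    return (groups * mask).reshape(shape)
```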
arXiv Detail & Related papers (2023-09-12T22:28:53Z)
- Getting More Juice Out of Your Data: Hard Pair Refinement Enhances Visual-Language Models Without Extra Data [122.282521548393]
Contrastive Language-Image Pre-training (CLIP) has become the standard for cross-modal image-text representation learning. We introduce HELIP, a cost-effective strategy that improves CLIP models by exploiting challenging text-image pairs within existing datasets in continuous training.
arXiv Detail & Related papers (2023-05-09T07:00:17Z)
- Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method that optimizes the sparse structure of a randomly initialized network at each iteration and tweaks unimportant weights on-the-fly by a small amount proportional to the magnitude scale.
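As described, ISS-P shrinks unimportant weights gradually instead of hard-zeroing them. A minimal sketch of one such soft-shrinkage step follows; prune_frac and shrink are illustrative parameters, not the paper's schedule.

```python
import torch

@torch.no_grad()
def soft_shrink_step(weight: torch.Tensor, prune_frac: float = 0.3,
                     shrink: float = 0.01) -> None:
    """Shrink the smallest-magnitude weights by a small factor, in place."""
    k = max(1, int(prune_frac * weight.numel()))
    thresh = weight.abs().flatten().kthvalue(k).values   # magnitude cutoff
    unimportant = weight.abs() <= thresh
    weight[unimportant] *= 1.0 - shrink   # soft shrinkage, not hard zeroing
```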
arXiv Detail & Related papers (2023-03-16T21:06:13Z)
- Pruning with Compensation: Efficient Channel Pruning for Deep Convolutional Neural Networks [0.9712140341805068]
A highly efficient pruning method is proposed to significantly reduce the cost of pruning DCNNs.
Our method shows competitive pruning performance among the state-of-the-art retraining-based pruning methods.
arXiv Detail & Related papers (2021-08-31T10:17:36Z)
- Fast Certified Robust Training via Better Initialization and Shorter Warmup [95.81628508228623]
We propose a new IBP initialization and principled regularizers during the warmup stage to stabilize certified bounds.
We find that batch normalization (BN) is a crucial architectural element to build best-performing networks for certified training.
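IBP here refers to interval bound propagation, the standard primitive of certified training: sound elementwise bounds are pushed layer by layer through the network. A minimal sketch for an affine layer follows; names and shapes are assumptions for illustration.

```python
import torch

def ibp_linear(lo: torch.Tensor, hi: torch.Tensor,
               weight: torch.Tensor, bias: torch.Tensor):
    """Propagate elementwise bounds [lo, hi] through y = x @ W.T + b."""
    center, radius = (hi + lo) / 2, (hi - lo) / 2
    center_out = center @ weight.t() + bias
    radius_out = radius @ weight.abs().t()   # radius grows with |W|
    return center_out - radius_out, center_out + radius_out
```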
arXiv Detail & Related papers (2021-03-31T17:58:58Z)
- Enabling Retrain-free Deep Neural Network Pruning using Surrogate Lagrangian Relaxation [2.691929135895278]
We develop a systematic weight-pruning optimization approach based on Surrogate Lagrangian Relaxation (SLR).
SLR achieves a higher compression rate than state-of-the-art methods under the same accuracy requirement.
Given a limited budget of retraining epochs, our approach quickly recovers the model accuracy.
arXiv Detail & Related papers (2020-12-18T07:17:30Z)
- Rapid Structural Pruning of Neural Networks with Set-based Task-Adaptive Meta-Pruning [83.59005356327103]
A common limitation of most existing pruning techniques is that they require pre-training of the network at least once before pruning.
We propose STAMP, which task-adaptively prunes a network pretrained on a large reference dataset by generating a pruning mask on it as a function of the target dataset.
We validate STAMP against recent advanced pruning methods on benchmark datasets.
arXiv Detail & Related papers (2020-06-22T10:57:43Z)
- Dynamic Model Pruning with Feedback [64.019079257231]
We propose a novel model compression method that generates a sparse trained model without additional overhead.
We evaluate our method on CIFAR-10 and ImageNet, and show that the obtained sparse models can reach the state-of-the-art performance of dense models.
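The feedback idea, as commonly formulated, is to evaluate gradients on the masked (sparse) weights while applying updates to a dense copy, so that wrongly pruned weights can re-enter the model. A sketch of one step, with an assumed grad_fn closure standing in for the backward pass:

```python
import torch

def dpf_step(w: torch.Tensor, grad_fn, lr: float = 0.1,
             density: float = 0.5) -> torch.Tensor:
    """One step in the spirit of dynamic pruning with feedback."""
    k = w.numel() - int(density * w.numel()) + 1     # rank of the cutoff value
    thresh = w.abs().flatten().kthvalue(k).values    # keep top `density` fraction
    mask = (w.abs() >= thresh).float()
    grad = grad_fn(w * mask)   # loss gradient evaluated at the sparse weights
    w -= lr * grad             # dense update: pruned weights stay trainable
    return w * mask            # the sparse model used for inference
```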
arXiv Detail & Related papers (2020-06-12T15:07:08Z)
- Pruning Filters while Training for Efficiently Optimizing Deep Learning Networks [6.269700080380206]
Pruning techniques have been proposed that remove less significant weights in deep networks.
We propose a dynamic pruning-while-training procedure, wherein we prune filters of a deep network during training itself.
Results indicate that pruning while training yields a compressed network with almost no accuracy loss after pruning 50% of the filters.
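A common criterion for pruning filters during training is the per-filter L1 norm; the sketch below zeroes the weakest filters in place and could be invoked periodically between training steps. The fraction and the criterion are illustrative assumptions, not necessarily this paper's exact procedure.

```python
import torch

@torch.no_grad()
def prune_filters_(conv_weight: torch.Tensor, prune_frac: float = 0.5) -> None:
    """Zero the conv filters with the smallest L1 norms (shape: O x I x kH x kW)."""
    norms = conv_weight.abs().sum(dim=(1, 2, 3))   # one L1 norm per output filter
    k = int(prune_frac * norms.numel())
    drop = norms.argsort()[:k]                     # indices of the weakest filters
    conv_weight[drop] = 0.0
```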
arXiv Detail & Related papers (2020-03-05T18:05:17Z)