C-SWAP: Explainability-Aware Structured Pruning for Efficient Neural Networks Compression
- URL: http://arxiv.org/abs/2510.18636v1
- Date: Tue, 21 Oct 2025 13:40:11 GMT
- Title: C-SWAP: Explainability-Aware Structured Pruning for Efficient Neural Networks Compression
- Authors: Baptiste Bauvin, Loïc Baret, Ola Ahmad
- Abstract summary: Pruning is a widely used technique that promotes sparsity in model structures. We propose a novel one-shot pruning framework that relies on explainable deep learning. Our method consistently achieves substantial reductions in model size, with minimal impact on performance, and without the need for fine-tuning.
- Score: 4.10373648742522
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural network compression has gained increasing attention in recent years, particularly in computer vision applications, where the need for model reduction is crucial for overcoming deployment constraints. Pruning is a widely used technique that promotes sparsity in model structures, e.g. weights, neurons, and layers, reducing size and inference costs. Structured pruning is especially important as it allows for the removal of entire structures, which further accelerates inference and reduces memory overhead. However, it can be computationally expensive, requiring iterative retraining and optimization. To overcome this problem, recent methods consider a one-shot setting, which applies pruning directly post-training. Unfortunately, they often lead to a considerable drop in performance. In this paper, we address this issue by proposing a novel one-shot pruning framework that relies on explainable deep learning. First, we introduce a causal-aware pruning approach that leverages cause-effect relations between model predictions and structures in a progressive pruning process. It allows us to efficiently reduce the size of the network, ensuring that the removed structures do not degrade the performance of the model. Then, through experiments conducted on convolutional neural network and vision transformer baselines, pre-trained on classification tasks, we demonstrate that our method consistently achieves substantial reductions in model size, with minimal impact on performance, and without the need for fine-tuning. Overall, our approach outperforms its counterparts, offering the best trade-off. Our code is available on GitHub.
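The abstract does not spell out the exact causal scoring rule, but attribution-guided structured pruning of the kind it describes can be sketched roughly as follows, using a simple gradient-times-activation proxy on a toy weight matrix (all names and ratios here are illustrative, not the paper's method):

```python
import numpy as np

# Illustrative sketch only: channels are scored with a simple
# gradient-x-activation attribution proxy, then the lowest-scoring
# output channels (whole rows of the weight matrix) are removed.
def attribution_scores(activations, grad_output):
    """Score each output channel by mean |activation * upstream gradient|."""
    return np.mean(np.abs(activations * grad_output), axis=0)

def prune_channels(weights, scores, keep_ratio=0.5):
    """Drop the lowest-scoring output channels (rows of the weight matrix)."""
    k = max(1, int(len(scores) * keep_ratio))
    kept = np.sort(np.argsort(scores)[-k:])
    return weights[kept], kept

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))       # 8 output channels, 4 inputs
acts = rng.normal(size=(16, 8))   # activations for a batch of 16
grads = rng.normal(size=(16, 8))  # upstream gradients for the same batch
scores = attribution_scores(acts, grads)
W_pruned, kept = prune_channels(W, scores)
print(W_pruned.shape)             # (4, 4)
```

Because whole rows are removed, the pruned layer stays dense and needs no sparse kernels, which is the practical appeal of structured over unstructured pruning.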
Related papers
- The Key to State Reduction in Linear Attention: A Rank-based Perspective [8.006873922525275]
Recent empirical results indicate that the hidden state of trained linear attention models often exhibits a low-rank structure. We provide a theoretical analysis of the role of rank in linear attention, revealing that low effective rank can affect retrieval error by amplifying query noise. In addition to these theoretical insights, we conjecture that the low-rank states can be substantially reduced post-training.
arXiv Detail & Related papers (2026-02-04T18:39:38Z) - Gradually Compacting Large Language Models for Reasoning Like a Boiling Frog [72.4168434368873]
Large Language Models (LLMs) have demonstrated impressive reasoning capabilities, but their substantial size often demands significant computational resources. We propose a gradual compacting method that divides the compression process into multiple fine-grained iterations. This iterative approach, reminiscent of the "boiling frog" effect, enables the model to be progressively compressed without abrupt performance loss.
arXiv Detail & Related papers (2026-02-04T06:56:52Z) - Pruning Everything, Everywhere, All at Once [1.7811840395202343]
Pruning structures in deep learning models efficiently reduces model complexity and improves computational efficiency. We propose a new method capable of pruning different structures within a model. Iteratively repeating this process yields highly sparse models that preserve the original predictive ability.
arXiv Detail & Related papers (2025-06-04T23:34:28Z) - An Efficient Sparse Fine-Tuning with Low Quantization Error via Neural Network Pruning [9.208007322096535]
We develop a new SpFT framework based on ideas from neural network pruning. We show our method improves SpFT's memory efficiency by 20-50% while matching the accuracy of state-of-the-art methods such as LoRA variants.
arXiv Detail & Related papers (2025-02-17T04:54:42Z) - Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method that optimizes the sparse structure of a randomly initialized network at each iteration and tweaks unimportant weights by a small amount proportional to the magnitude scale on-the-fly.
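As a rough illustration of the soft-shrinkage idea described above, the smallest-magnitude weights are scaled down rather than hard-zeroed, so they can recover in later iterations (the `sparsity` and `shrink` values below are illustrative, not the paper's schedule):

```python
import numpy as np

def iterative_soft_shrink(w, sparsity=0.5, shrink=0.2):
    """One shrinkage step: scale down the least-important weights instead
    of zeroing them outright, leaving room for them to grow back later."""
    w = w.copy()
    k = int(len(w) * sparsity)
    idx = np.argsort(np.abs(w))[:k]   # least-important weights by magnitude
    w[idx] *= (1.0 - shrink)          # soft shrinkage, proportional to value
    return w

w = np.array([0.05, -1.2, 0.3, 0.01, 2.0, -0.4])
w1 = iterative_soft_shrink(w)
print(w1)  # small weights shrink by 20%; large weights are untouched
```

Repeating this step over training gradually drives the unimportant weights toward zero without the abrupt accuracy drop of one-shot hard pruning.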
arXiv Detail & Related papers (2023-03-16T21:06:13Z) - Neural Network Compression by Joint Sparsity Promotion and Redundancy Reduction [4.9613162734482215]
This paper presents a novel training scheme based on composite constraints that prune redundant filters and minimize their effect on overall network learning via sparsity promotion.
Our tests on several pixel-wise segmentation benchmarks show that the number of neurons and the memory footprint of networks in the test phase are significantly reduced without affecting performance.
arXiv Detail & Related papers (2022-10-14T01:34:49Z) - Slimmable Networks for Contrastive Self-supervised Learning [69.9454691873866]
Self-supervised learning makes significant progress in pre-training large models, but struggles with small models.
We introduce another one-stage solution to obtain pre-trained small models without the need for extra teachers.
A slimmable network consists of a full network and several weight-sharing sub-networks, which can be pre-trained once to obtain various networks.
arXiv Detail & Related papers (2022-09-30T15:15:05Z) - Neural Networks Reduction via Lumping [0.0]
A large number of solutions have been published to reduce both the number of operations and the parameters involved in the models.
Most of these reduction techniques usually require at least one re-training step to recover the accuracy.
We propose a pruning approach that reduces the number of neurons in a network without using any data or fine-tuning, while completely preserving the exact behaviour.
arXiv Detail & Related papers (2022-09-15T17:13:07Z) - Powerpropagation: A sparsity inducing weight reparameterisation [65.85142037667065]
We introduce Powerpropagation, a new weight reparameterisation for neural networks that leads to inherently sparse models.
Models trained in this manner exhibit similar performance, but have a distribution with markedly higher density at zero, allowing more parameters to be pruned safely.
Here, we combine Powerpropagation with a traditional weight-pruning technique as well as recent state-of-the-art sparse-to-sparse algorithms, showing superior performance on the ImageNet benchmark.
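The Powerpropagation reparameterisation itself is compact enough to sketch: the effective weight is w = theta * |theta|**(alpha - 1), so for alpha > 1 the gradient with respect to theta is scaled by the parameter's own magnitude, pushing small parameters toward zero (a minimal sketch of the forward map only):

```python
import numpy as np

def powerprop(theta, alpha=2.0):
    """Powerpropagation reparameterisation: the effective weight is
    w = theta * |theta|**(alpha - 1). For alpha > 1, the gradient w.r.t.
    theta is scaled by the parameter's own magnitude, so small parameters
    receive small updates and drift toward zero, yielding the higher
    density at zero mentioned in the abstract."""
    return theta * np.abs(theta) ** (alpha - 1.0)

theta = np.array([-0.5, 0.1, 2.0])
w = powerprop(theta)  # -> [-0.25, 0.01, 4.0]
```

The sign of theta is preserved while magnitudes below 1 are suppressed and magnitudes above 1 are amplified, which is what makes subsequent magnitude pruning safer.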
arXiv Detail & Related papers (2021-10-01T10:03:57Z) - Efficient Micro-Structured Weight Unification and Pruning for Neural Network Compression [56.83861738731913]
Deep Neural Network (DNN) models are essential for practical applications, especially for resource-limited devices.
Previous unstructured or structured weight pruning methods can hardly truly accelerate inference.
We propose a generalized weight unification framework at a hardware-compatible micro-structured level to achieve a high degree of compression and acceleration.
arXiv Detail & Related papers (2021-06-15T17:22:59Z) - Neural Pruning via Growing Regularization [82.9322109208353]
We extend regularization to tackle two central problems of pruning: pruning schedule and weight importance scoring.
Specifically, we propose an L2 regularization variant with rising penalty factors and show it can bring significant accuracy gains.
The proposed algorithms are easy to implement and scalable to large datasets and networks in both structured and unstructured pruning.
arXiv Detail & Related papers (2020-12-16T20:16:28Z)
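The rising-penalty L2 regularization from the last entry can be illustrated with a minimal sketch; the linear growth rule and all constants below are assumptions for illustration, not the paper's actual schedule:

```python
def reg_step(w, grad_loss, lam, lr=0.1):
    """One gradient-descent step with an L2 penalty of strength `lam`."""
    return w - lr * (grad_loss + lam * w)

# Rising penalty factor: lam grows each interval, so weights the task
# gradient does not defend are driven progressively toward zero.
w, lam = 1.0, 0.0
for step in range(50):
    lam += 0.05                # the penalty grows over time
    w = reg_step(w, grad_loss=0.0, lam=lam)
# With no opposing task gradient, the growing penalty shrinks w toward
# zero, which separates unimportant weights out for pruning.
print(w)
```

Weights that matter to the loss resist the penalty via their task gradient, while unimportant ones collapse, giving the importance separation the abstract describes.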
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it lists and is not responsible for any consequences.