Deep Neural Compression Via Concurrent Pruning and Self-Distillation
- URL: http://arxiv.org/abs/2109.15014v1
- Date: Thu, 30 Sep 2021 11:08:30 GMT
- Title: Deep Neural Compression Via Concurrent Pruning and Self-Distillation
- Authors: James O' Neill, Sourav Dutta, Haytham Assem
- Abstract summary: Pruning aims to reduce the number of parameters while maintaining performance close to the original network.
This work proposes a novel self-distillation based pruning strategy.
We show that the proposed cross-correlation objective for self-distilled pruning implicitly encourages sparse solutions.
- Score: 7.448510589632587
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pruning aims to reduce the number of parameters while maintaining performance
close to the original network. This work proposes a novel
self-distillation based pruning strategy, whereby the representational
similarity between the pruned and unpruned versions of the same network is
maximized. Unlike previous approaches that treat distillation and pruning
separately, we use distillation to inform the pruning criteria, without
requiring a separate student network as in knowledge distillation. We show that
the proposed cross-correlation objective for self-distilled pruning
implicitly encourages sparse solutions, naturally complementing magnitude-based
pruning criteria. Experiments on the GLUE and XGLUE benchmarks show that
self-distilled pruning increases mono- and cross-lingual language model
performance. Self-distilled pruned models also outperform smaller Transformers
with an equal number of parameters and are competitive against (6 times) larger
distilled networks. We also observe that self-distillation (1) maximizes class
separability, (2) increases the signal-to-noise ratio, and (3) converges faster
after pruning steps, providing further insights into why self-distilled pruning
improves generalization.
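The abstract gives no implementation details, so the following is only a rough sketch of how a cross-correlation objective between the pruned and unpruned representations of the same network could be paired with magnitude-based pruning. It assumes a Barlow Twins-style formulation; the function names, the toy encoder, the normalisation, and the off-diagonal weight are illustrative choices, not the authors' method.

```python
# Hypothetical sketch: cross-correlation self-distillation between the hidden
# representations of an unpruned copy (teacher) and its pruned self (student),
# combined with simple global magnitude pruning. Illustrative only.
import torch
import torch.nn as nn


def cross_correlation_loss(z_teacher: torch.Tensor, z_student: torch.Tensor,
                           off_diag_weight: float = 5e-3) -> torch.Tensor:
    """Push the cross-correlation matrix of the two (standardised)
    representations toward the identity (Barlow Twins-style)."""
    n = z_teacher.size(0)
    z1 = (z_teacher - z_teacher.mean(0)) / (z_teacher.std(0) + 1e-6)
    z2 = (z_student - z_student.mean(0)) / (z_student.std(0) + 1e-6)
    c = (z1.T @ z2) / n                                  # d x d cross-correlation
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()       # keep matched features aligned
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()  # decorrelate the rest
    return on_diag + off_diag_weight * off_diag


def magnitude_prune_(model: nn.Module, sparsity: float) -> None:
    """Zero out the globally smallest-magnitude weights in-place."""
    all_weights = torch.cat([p.detach().abs().flatten()
                             for p in model.parameters() if p.dim() > 1])
    threshold = torch.quantile(all_weights, sparsity)
    with torch.no_grad():
        for p in model.parameters():
            if p.dim() > 1:
                p.mul_((p.abs() > threshold).float())


if __name__ == "__main__":
    encoder = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 64))
    x = torch.randn(16, 32)
    z_teacher = encoder(x).detach()          # the unpruned network is its own teacher
    magnitude_prune_(encoder, sparsity=0.5)  # prune, then distil into the pruned weights
    loss = cross_correlation_loss(z_teacher, encoder(x))
    loss.backward()
```

In a full training loop this loss would be added to the task objective and the pruning mask reapplied after each update; those details are omitted here.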
Related papers
- Isomorphic Pruning for Vision Models [56.286064975443026]
Structured pruning reduces the computational overhead of deep neural networks by removing redundant sub-structures.
We present Isomorphic Pruning, a simple approach that demonstrates effectiveness across a range of network architectures.
arXiv Detail & Related papers (2024-07-05T16:14:53Z)
- Theoretical Characterization of How Neural Network Pruning Affects its Generalization [131.1347309639727]
This work makes the first attempt to study how different pruning fractions affect the model's gradient descent dynamics and generalization.
It is shown that as long as the pruning fraction is below a certain threshold, gradient descent can drive the training loss toward zero.
More surprisingly, the generalization bound gets better as the pruning fraction gets larger.
arXiv Detail & Related papers (2023-01-01T03:10:45Z)
- Gradient-based Intra-attention Pruning on Pre-trained Language Models [21.444503777215637]
We propose a structured pruning method, GRAIN (Gradient-based Intra-attention pruning).
GRAIN inspects and prunes intra-attention structures, which greatly expands the structure search space and enables more flexible models.
Experiments on GLUE, SQuAD, and CoNLL 2003 show that GRAIN notably outperforms other methods, especially in the high sparsity regime.
arXiv Detail & Related papers (2022-12-15T06:52:31Z)
- Interpretations Steered Network Pruning via Amortized Inferred Saliency Maps [85.49020931411825]
Compressing Convolutional Neural Networks (CNNs) is crucial for deploying these models on edge devices with limited resources.
We propose to address the channel pruning problem from a novel perspective by leveraging the interpretations of a model to steer the pruning process.
We tackle this challenge by introducing a selector model that predicts real-time smooth saliency masks for pruned models.
arXiv Detail & Related papers (2022-09-07T01:12:11Z)
- Structured Pruning Learns Compact and Accurate Models [28.54826400747667]
We propose a task-specific structured pruning method, CoFi (Coarse- and Fine-grained Pruning).
CoFi delivers highly parallelizable subnetworks and matches distillation methods in both accuracy and latency.
Our experiments on GLUE and SQuAD datasets show that CoFi yields models with over 10x speedups with a small accuracy drop.
arXiv Detail & Related papers (2022-04-01T13:09:56Z)
- Sparse Progressive Distillation: Resolving Overfitting under Pretrain-and-Finetune Paradigm [7.662952656290564]
Various pruning approaches have been proposed to reduce the footprint requirements of Transformer-based language models.
We show for the first time that reducing the risk of overfitting can help the effectiveness of pruning under the pretrain-and-finetune paradigm.
arXiv Detail & Related papers (2021-10-15T16:42:56Z)
- Sparse Training via Boosting Pruning Plasticity with Neuroregeneration [79.78184026678659]
We study the effect of pruning throughout training from the perspective of pruning plasticity.
We design a novel gradual magnitude pruning (GMP) method, named gradual pruning with zero-cost neuroregeneration (GraNet), and its dynamic sparse training (DST) variant, GraNet-ST (a sketch of a typical GMP sparsity schedule appears after this list).
Perhaps most impressively, the latter for the first time boosts the sparse-to-sparse training performance over various dense-to-sparse methods by a large margin with ResNet-50 on ImageNet.
arXiv Detail & Related papers (2021-06-19T02:09:25Z)
- MLPruning: A Multilevel Structured Pruning Framework for Transformer-based Models [78.45898846056303]
Pruning is an effective method to reduce the memory footprint and computational cost associated with large natural language processing models.
We develop a novel MultiLevel structured Pruning framework, which uses three different levels of structured pruning: head pruning, row pruning, and block-wise sparse pruning.
arXiv Detail & Related papers (2021-05-30T22:00:44Z)
- Even your Teacher Needs Guidance: Ground-Truth Targets Dampen Regularization Imposed by Self-Distillation [0.0]
Self-distillation, where the network architectures are identical, has been observed to improve generalization accuracy.
We consider an iterative variant of self-distillation in a kernel regression setting, in which successive steps incorporate both model outputs and the ground-truth targets.
We show that any such function obtained with self-distillation can be calculated directly as a function of the initial fit, and that infinitely many distillation steps yield the same optimization problem as the original with amplified regularization.
arXiv Detail & Related papers (2021-02-25T18:56:09Z)
- Neural Pruning via Growing Regularization [82.9322109208353]
We extend regularization to tackle two central problems of pruning: pruning schedule and weight importance scoring.
Specifically, we propose an L2 regularization variant with rising penalty factors and show it can bring significant accuracy gains (see the sketch after this list).
The proposed algorithms are easy to implement and scalable to large datasets and networks in both structured and unstructured pruning.
arXiv Detail & Related papers (2020-12-16T20:16:28Z)
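The last entry above (Neural Pruning via Growing Regularization) describes an L2 variant with rising penalty factors, as referenced there. A minimal sketch of one plausible reading follows: the L2 coefficient grows linearly over training and is applied only to weights flagged for removal, so they are driven toward zero before being pruned. The schedule, the selection rule, and all names are illustrative assumptions, not the paper's exact algorithm.

```python
# Illustrative sketch of a "growing" L2 penalty for pruning: the coefficient on
# the weights scheduled for removal rises over training. Not the paper's code.
import torch


def growing_l2_penalty(weight: torch.Tensor, prune_mask: torch.Tensor,
                       step: int, total_steps: int,
                       max_lambda: float = 1e-2) -> torch.Tensor:
    """L2 penalty whose coefficient ramps linearly from 0 to max_lambda,
    applied only to the weights flagged for eventual removal."""
    lam = max_lambda * min(step / total_steps, 1.0)
    return lam * (weight * prune_mask).pow(2).sum()


# Toy usage: flag the smallest half of the weights, then train with the penalty
# added to a stand-in task loss that pulls every weight toward 1.
w = torch.randn(64, 64, requires_grad=True)
mask = (w.detach().abs() <= w.detach().abs().median()).float()
for step in range(1, 101):
    task_loss = (w - 1.0).pow(2).sum()
    loss = task_loss + growing_l2_penalty(w, mask, step, total_steps=100)
    loss.backward()
    with torch.no_grad():
        w -= 1e-2 * w.grad    # plain SGD step
        w.grad.zero_()
```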
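The GraNet entry above builds on gradual magnitude pruning (GMP), as referenced there. The sketch below shows the cubic sparsity ramp commonly used to schedule GMP, where sparsity rises quickly at first and flattens toward the target; the function name and hyperparameter values are illustrative, and GraNet's neuroregeneration step is not modelled.

```python
# Sketch of a cubic GMP sparsity schedule: s_t = s_f + (s_i - s_f) * (1 - p)^3,
# where p is the fraction of the pruning window already completed.
def gmp_sparsity(step: int, start_step: int, end_step: int,
                 initial_sparsity: float = 0.0,
                 final_sparsity: float = 0.9) -> float:
    """Target sparsity at a given training step under a cubic ramp."""
    if step <= start_step:
        return initial_sparsity
    if step >= end_step:
        return final_sparsity
    progress = (step - start_step) / (end_step - start_step)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** 3


# Example targets over a 10,000-step pruning window.
print([round(gmp_sparsity(s, 0, 10_000), 3) for s in (0, 2_500, 5_000, 10_000)])
```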