Effective Layer Pruning Through Similarity Metric Perspective
- URL: http://arxiv.org/abs/2405.17081v1
- Date: Mon, 27 May 2024 11:54:51 GMT
- Title: Effective Layer Pruning Through Similarity Metric Perspective
- Authors: Ian Pons, Bruno Yamamoto, Anna H. Reali Costa, Artur Jordao
- Abstract summary: Deep neural networks have been the predominant paradigm in machine learning for solving cognitive tasks.
Pruning structures from these models is a straightforward approach to reducing network complexity.
Layer pruning often hurts the network predictive ability (i.e., accuracy) at high compression rates.
This work introduces an effective layer-pruning strategy that meets all underlying properties pursued by pruning methods.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep neural networks have been the predominant paradigm in machine learning for solving cognitive tasks. Such models, however, are restricted by a high computational overhead, limiting their applicability and hindering advancements in the field. Extensive research has demonstrated that pruning structures from these models is a straightforward approach to reducing network complexity. In this direction, most efforts focus on removing weights or filters. Studies have also been devoted to layer pruning as it promotes superior computational gains. However, layer pruning often hurts the network predictive ability (i.e., accuracy) at high compression rates. This work introduces an effective layer-pruning strategy that meets all underlying properties pursued by pruning methods. Our method estimates the relative importance of a layer using the Centered Kernel Alignment (CKA) metric, employed to measure the similarity between the representations of the unpruned model and a candidate layer for pruning. We confirm the effectiveness of our method on standard architectures and benchmarks, in which it outperforms existing layer-pruning strategies and other state-of-the-art pruning techniques. In particular, we remove more than 75% of computation while improving predictive ability. At higher compression regimes, our method exhibits a negligible accuracy drop, while other methods notably deteriorate model accuracy. Apart from these benefits, our pruned models exhibit robustness to adversarial and out-of-distribution samples.
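The abstract scores a layer by how similar the unpruned model's representations are to those of the model with that layer removed. The sketch below illustrates that idea under stated assumptions and is not the authors' implementation: it uses the linear form of CKA, a hypothetical toy residual network in NumPy (`forward`, `rank_layers_by_cka` and the block scales are invented for illustration), compares only final-layer outputs, and assumes that the layer whose removal keeps CKA closest to 1 is the least important.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between representations x (n, p1) and y (n, p2) of the same n inputs."""
    # Center each feature (column); this is equivalent to centering the Gram matrices.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # Linear-kernel form: ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F), a value in [0, 1].
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


def forward(inputs: np.ndarray, weights: list, skip=None) -> np.ndarray:
    """Hypothetical toy residual net: h <- h + relu(h @ W) per block; block `skip` is bypassed."""
    h = inputs
    for idx, w in enumerate(weights):
        if idx == skip:
            continue  # removing a residual block leaves only its identity path
        h = h + np.maximum(h @ w, 0.0)
    return h


def rank_layers_by_cka(inputs: np.ndarray, weights: list) -> list:
    """Return (block index, CKA vs. unpruned output) pairs, most prunable first."""
    reference = forward(inputs, weights)
    scores = []
    for idx in range(len(weights)):
        candidate = forward(inputs, weights, skip=idx)
        scores.append((idx, linear_cka(reference, candidate)))
    # Assumption: higher similarity to the unpruned model means a less important block.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Illustrative usage on random data; blocks 1 and 3 use deliberately small weights.
rng = np.random.default_rng(0)
x = rng.standard_normal((256, 32))
blocks = [rng.standard_normal((32, 32)) * s for s in (0.5, 0.01, 0.3, 0.02)]
ranking = rank_layers_by_cka(x, blocks)
print(ranking)
```

In this toy setup the small-weight blocks would be expected to rank first, since removing them barely changes the output. Feeding a batch of real samples and comparing intermediate rather than final representations are alternative choices that this sketch does not settle.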
Related papers
- PUMA: margin-based data pruning [51.12154122266251]
We focus on data pruning, where some training samples are removed based on the distance to the model classification boundary (i.e., margin).
We propose PUMA, a new data pruning strategy that computes the margin using DeepFool.
We show that PUMA can be used on top of the current state-of-the-art methodology in robustness, and, unlike existing data pruning strategies, it significantly improves model performance.
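For an affine binary classifier f(x) = w·x + b, the DeepFool perturbation has a closed form, so the margin used by PUMA can be illustrated without running the iterative attack. This is a minimal sketch assuming a linear model; the helper names and the `drop_largest` switch are hypothetical, and because the summary does not say which end of the margin ranking PUMA removes, the direction is left as a parameter. For deep networks DeepFool repeats this linearised step, which is not shown here.

```python
import numpy as np


def linear_deepfool_perturbation(x: np.ndarray, w: np.ndarray, b: float) -> np.ndarray:
    """DeepFool's closed-form step for an affine binary classifier: r = -f(x) w / ||w||^2."""
    scores = x @ w + b
    return -scores[:, None] * w[None, :] / np.dot(w, w)


def linear_deepfool_margin(x: np.ndarray, w: np.ndarray, b: float) -> np.ndarray:
    """Margin = ||r|| = |f(x)| / ||w||, i.e. each sample's distance to the boundary."""
    return np.linalg.norm(linear_deepfool_perturbation(x, w, b), axis=1)


def prune_by_margin(x, y, w, b, fraction, drop_largest=True):
    """Remove `fraction` of the samples ranked by margin; return the kept (x, y).

    drop_largest=True removes the samples farthest from the boundary; the
    direction PUMA uses is an assumption left to the caller.
    """
    margins = linear_deepfool_margin(x, w, b)
    n_drop = int(round(fraction * len(x)))
    order = np.argsort(margins)  # ascending margin
    keep = order[: len(x) - n_drop] if drop_largest else order[n_drop:]
    keep = np.sort(keep)  # preserve the original sample order
    return x[keep], y[keep]


# Illustrative usage on synthetic, linearly separable data.
rng = np.random.default_rng(0)
x = rng.standard_normal((100, 2))
y = (x[:, 0] + x[:, 1] > 0).astype(int)
x_kept, y_kept = prune_by_margin(x, y, np.array([1.0, 1.0]), 0.0, fraction=0.2)
```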
arXiv (2024-05-10T08:02:20Z)
- The Unreasonable Ineffectiveness of the Deeper Layers [5.984361440126354]
We study a simple layer-pruning strategy for popular families of open-weight pretrained LLMs.
We find minimal degradation of performance until after a large fraction of the layers are removed.
From a scientific perspective, the robustness of these LLMs to the deletion of layers implies either that current pretraining methods are not properly leveraging the parameters in the deeper layers of the network or that the shallow layers play a critical role in storing knowledge.
arXiv (2024-03-26T17:20:04Z)
- Layer-wise Linear Mode Connectivity [52.6945036534469]
Averaging neural network parameters is an intuitive method for fusing the knowledge of two independent models.
It is most prominently used in federated learning.
We analyse the performance of the models that result from averaging single layers or groups of layers.
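Averaging a single layer, or a group of layers, while keeping the rest of one model fixed can be sketched with parameters stored as name-to-array dictionaries. This is a minimal sketch with hypothetical layer names (`fc1.weight`, `fc2.weight`) and hypothetical helpers; the barrier function uses the common linear-mode-connectivity definition (loss along the path minus the straight line between the path's endpoint losses), which is an assumption about how performance would be measured rather than a detail from the summary.

```python
import numpy as np


def average_layers(params_a: dict, params_b: dict, layers: set, alpha: float = 0.5) -> dict:
    """Copy of model A in which only the parameters named in `layers` are
    interpolated towards model B as (1 - alpha) * A + alpha * B."""
    merged = {}
    for name, value_a in params_a.items():
        if name in layers:
            merged[name] = (1.0 - alpha) * value_a + alpha * params_b[name]
        else:
            merged[name] = value_a.copy()  # untouched layers come from model A
    return merged


def layerwise_barrier(loss_fn, params_a, params_b, layers, num_points=11) -> float:
    """Largest excess of the loss along the layer-wise path over the straight line
    joining the losses at its endpoints. At alpha = 1 the path ends at model A with
    the chosen layers taken from B, not at model B itself."""
    alphas = np.linspace(0.0, 1.0, num_points)
    losses = [loss_fn(average_layers(params_a, params_b, layers, a)) for a in alphas]
    start, end = losses[0], losses[-1]
    gaps = [loss - ((1.0 - a) * start + a * end) for a, loss in zip(alphas, losses)]
    return float(max(gaps))


# Hypothetical two-layer parameter dictionaries, for illustration only.
rng = np.random.default_rng(0)
model_a = {"fc1.weight": rng.standard_normal((4, 3)), "fc2.weight": rng.standard_normal((2, 4))}
model_b = {"fc1.weight": rng.standard_normal((4, 3)), "fc2.weight": rng.standard_normal((2, 4))}
single = average_layers(model_a, model_b, {"fc1.weight"})                # one layer
group = average_layers(model_a, model_b, {"fc1.weight", "fc2.weight"})  # a group of layers
```

A barrier near zero would indicate that the chosen layers can be averaged without a loss bump along the path; a real `loss_fn` would evaluate the merged network on data.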
arXiv (2023-07-13T09:39:10Z)
- Gradient-based Intra-attention Pruning on Pre-trained Language Models [21.444503777215637]
We propose a structured pruning method GRAIN (Gradient-based Intra-attention pruning).
GRAIN inspects and prunes intra-attention structures, which greatly expands the structure search space and enables more flexible models.
Experiments on GLUE, SQuAD, and CoNLL 2003 show that GRAIN notably outperforms other methods, especially in the high sparsity regime.
arXiv (2022-12-15T06:52:31Z)
- Towards Practical Control of Singular Values of Convolutional Layers [65.25070864775793]
Convolutional neural networks (CNNs) are easy to train, but their essential properties, such as generalization error and adversarial robustness, are hard to control.
Recent research demonstrated that singular values of convolutional layers significantly affect such elusive properties.
We offer a principled approach to alleviating constraints of the prior art at the expense of an insignificant reduction in layer expressivity.
arXiv (2022-11-24T19:09:44Z)
- Automatic Block-wise Pruning with Auxiliary Gating Structures for Deep Convolutional Neural Networks [9.293334856614628]
This paper presents a novel structured network pruning method with auxiliary gating structures.
Our experiments demonstrate that our method can achieve state-of-the-art compression performance on classification tasks.
arXiv (2022-05-07T09:03:32Z)
- A Novel Architecture Slimming Method for Network Pruning and Knowledge Distillation [30.39128740788747]
We propose an architecture slimming method that automates the layer configuration process.
We show that our method achieves a significant performance gain over baselines after pruning and distillation.
Surprisingly, we find that the resulting layer-wise compression rates correspond to the layer sensitivities found by existing works.
arXiv (2022-02-21T12:45:51Z)
- Powerpropagation: A sparsity inducing weight reparameterisation [65.85142037667065]
We introduce Powerpropagation, a new weight-parameterisation for neural networks that leads to inherently sparse models.
Models trained in this manner exhibit similar performance, but have a distribution with markedly higher density at zero, allowing more parameters to be pruned safely.
Here, we combine Powerpropagation with a traditional weight-pruning technique as well as recent state-of-the-art sparse-to-sparse algorithms, showing superior performance on the ImageNet benchmark.
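Powerpropagation, as we understand it, reparameterises each weight as w = φ|φ|^(α-1), so α = 1 recovers ordinary training. The NumPy helpers below are a minimal sketch with hypothetical names, not the authors' code: the chain-rule factor α|φ|^(α-1) shows why small parameters receive small updates, and plain global magnitude pruning stands in for the "traditional weight-pruning technique" mentioned above.

```python
import numpy as np


def powerprop_weight(phi: np.ndarray, alpha: float) -> np.ndarray:
    """Effective weight w = phi * |phi|**(alpha - 1); alpha = 1 gives w = phi."""
    return phi * np.abs(phi) ** (alpha - 1.0)


def powerprop_grad(grad_w: np.ndarray, phi: np.ndarray, alpha: float) -> np.ndarray:
    """Chain rule dL/dphi = dL/dw * alpha * |phi|**(alpha - 1), valid for alpha >= 1.

    For alpha > 1 the factor shrinks as |phi| -> 0, so parameters near zero
    receive small updates and tend to stay there.
    """
    return grad_w * alpha * np.abs(phi) ** (alpha - 1.0)


def magnitude_prune(w: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the `sparsity` fraction of entries with the smallest |w| (global ranking)."""
    k = int(round(sparsity * w.size))
    pruned = w.flatten()  # flatten() returns a copy, so `w` is not modified
    if k > 0:
        order = np.argsort(np.abs(pruned))  # smallest magnitude first
        pruned[order[:k]] = 0.0
    return pruned.reshape(w.shape)


# Illustrative usage: reparameterise, then prune 80% of the effective weights.
rng = np.random.default_rng(0)
phi = rng.standard_normal((4, 5))
w = powerprop_weight(phi, alpha=2.0)
w_pruned = magnitude_prune(w, sparsity=0.8)
```

Because |w| = |φ|^α is monotone in |φ| for α > 0, ranking by |w| or by |φ| selects the same entries to prune.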
arXiv (2021-10-01T10:03:57Z)
- Effective Model Sparsification by Scheduled Grow-and-Prune Methods [73.03533268740605]
We propose a novel scheduled grow-and-prune (GaP) methodology without pre-training the dense models.
Experiments have shown that such models can match or beat the quality of highly optimized dense models at 80% sparsity on a variety of tasks.
arXiv (2021-06-18T01:03:13Z)
- Manifold Regularized Dynamic Network Pruning [102.24146031250034]
This paper proposes a new paradigm that dynamically removes redundant filters by embedding the manifold information of all instances into the space of pruned networks.
The effectiveness of the proposed method is verified on several benchmarks, which shows better performance in terms of both accuracy and computational cost.
arXiv (2021-03-10T03:59:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.