Layer-wise Model Pruning based on Mutual Information
- URL: http://arxiv.org/abs/2108.12594v1
- Date: Sat, 28 Aug 2021 07:51:47 GMT
- Title: Layer-wise Model Pruning based on Mutual Information
- Authors: Chun Fan, Jiwei Li, Xiang Ao, Fei Wu, Yuxian Meng, Xiaofei Sun
- Abstract summary: The proposed strategy avoids irregular memory access since representations and matrices can be squeezed into their smaller but dense counterparts, leading to greater speedup.
The proposed method operates from a more global perspective based on training signals in the top layer, and prunes each layer by propagating the effect of global signals through the layers, leading to better performance at the same sparsity level.
- Score: 27.583869809219244
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The proposed pruning strategy offers merits over weight-based pruning
techniques: (1) it avoids irregular memory access since representations and
matrices can be squeezed into their smaller but dense counterparts, leading to
greater speedup; (2) pruning top-down, the proposed method operates from a more
global perspective based on training signals in the top layer, and prunes each
layer by propagating the effect of global signals through the layers, leading to
better performance at the same sparsity level.
Extensive experiments show that at the same sparsity level, the proposed
strategy offers both greater speedup and higher performance than weight-based
pruning methods (e.g., magnitude pruning, movement pruning).
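A minimal sketch of point (1), assuming a toy 768x768 layer and a placeholder unit score (the paper's actual criterion is mutual-information-based and propagated top-down from the training signal): unstructured magnitude pruning keeps the original shape with scattered zeros, while pruning whole units lets the matrix be squeezed into a smaller but dense one.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(768, 768))   # one layer's weight matrix
sparsity = 0.5

# (1) Weight-based magnitude pruning: zero out the smallest weights in place.
# The matrix keeps its shape, so the zeros are scattered (irregular access).
threshold = np.quantile(np.abs(W), sparsity)
W_sparse = np.where(np.abs(W) >= threshold, W, 0.0)

# (2) Layer-wise structured pruning: score whole output units and keep the top
# half, yielding a smaller but fully dense matrix. The row-sum score below is a
# stand-in for the paper's mutual-information-based signal.
unit_scores = np.abs(W).sum(axis=1)
keep = np.sort(np.argsort(unit_scores)[-int(W.shape[0] * (1 - sparsity)):])
W_dense = W[keep, :]

print(W_sparse.shape, float(np.count_nonzero(W_sparse)) / W_sparse.size)  # (768, 768), ~0.5
print(W_dense.shape)                                                      # (384, 768)
```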
Related papers
- High-Layer Attention Pruning with Rescaling [14.141903038286362]
Pruning is a highly effective approach for compressing large language models (LLMs).
We propose a novel pruning algorithm that strategically prunes attention heads in the model's higher layers.
We conduct comprehensive experiments on a wide range of LLMs, including LLaMA3.1-8B, Mistral-7B-v0.3, Qwen2-7B, and Gemma2-9B.
arXiv Detail & Related papers (2025-07-02T17:15:05Z)
- A Sliding Layer Merging Method for Efficient Depth-Wise Pruning in LLMs [14.514670828712669]
This paper reveals the "Patch-like" feature relationship between layers in large language models by analyzing the correlation of the outputs of different layers in the reproducing kernel Hilbert space.
We propose a sliding layer merging method that dynamically selects and fuses consecutive layers from top to bottom according to a pre-defined similarity threshold.
Our method outperforms existing pruning techniques in both zero-shot inference performance and retraining recovery quality after pruning.
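A hedged sketch of the selection step the summary describes, assuming similarity is measured as cosine between mean layer outputs and leaving the actual fusion of a group unspecified; both choices are assumptions of this sketch, not details from the paper.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def merge_groups(layer_outputs: list[np.ndarray], threshold: float = 0.95) -> list[list[int]]:
    """Scan layers from top to bottom and group consecutive layers whose
    outputs are similar enough to be fused into one."""
    groups, current = [], [len(layer_outputs) - 1]
    for i in range(len(layer_outputs) - 2, -1, -1):
        if cosine(layer_outputs[i], layer_outputs[i + 1]) >= threshold:
            current.append(i)          # extend the current sliding window
        else:
            groups.append(sorted(current))
            current = [i]              # start a new window
    groups.append(sorted(current))
    return groups[::-1]

# Toy usage: six "layers" whose mean hidden states drift slowly.
noise = np.random.default_rng(1).normal(size=16)
outs = [np.ones(16) + 0.01 * i * noise for i in range(6)]
print(merge_groups(outs, threshold=0.999))
```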
arXiv Detail & Related papers (2025-02-26T14:15:24Z)
- Shaping Sparse Rewards in Reinforcement Learning: A Semi-supervised Approach [7.200081267352692]
Experimental results in Atari and robotic manipulation demonstrate that our method outperforms supervised approaches in reward inference.
In more sparse-reward environments, our method achieves up to twice the peak scores compared to supervised baselines.
arXiv Detail & Related papers (2025-01-31T13:35:19Z)
- LayerMatch: Do Pseudo-labels Benefit All Layers? [77.59625180366115]
Semi-supervised learning offers a promising solution to mitigate the dependency on labeled data.
We develop two layer-specific pseudo-label strategies, termed Grad-ReLU and Avg-Clustering.
Our approach consistently demonstrates exceptional performance on standard semi-supervised learning benchmarks.
arXiv Detail & Related papers (2024-06-20T11:25:50Z)
- Accelerating Inference in Large Language Models with a Unified Layer Skipping Strategy [67.45518210171024]
Dynamic computation methods have shown notable acceleration for Large Language Models (LLMs) by skipping several layers of computation.
We propose a Unified Layer Skipping strategy, which selects the number of layers whose computation to skip based solely on the target speedup ratio.
Experimental results on two common tasks, i.e., machine translation and text summarization, indicate that given a target speedup ratio, the Unified Layer Skipping strategy significantly enhances both the inference performance and the actual model throughput.
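A hypothetical sketch of mapping a target speedup ratio to a fixed set of layers to skip; spreading the skipped layers uniformly over the middle of the stack and always keeping the first and last layer are assumptions of this sketch, not the paper's stated rule.

```python
def layers_to_skip(num_layers: int, target_speedup: float) -> list[int]:
    """Skip roughly (1 - 1/target_speedup) of the layers, spread evenly
    over the middle of the stack (assumption of this sketch)."""
    n_skip = int(round(num_layers * (1.0 - 1.0 / target_speedup)))
    n_skip = max(0, min(n_skip, num_layers - 2))    # keep at least 2 layers
    candidates = list(range(1, num_layers - 1))     # never skip layer 0 or the top layer
    if n_skip == 0:
        return []
    stride = len(candidates) / n_skip
    return sorted({candidates[int(i * stride)] for i in range(n_skip)})

print(layers_to_skip(num_layers=32, target_speedup=2.0))  # ~16 evenly spaced layers
```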
arXiv Detail & Related papers (2024-04-10T12:12:07Z)
- The Unreasonable Ineffectiveness of the Deeper Layers [5.984361440126354]
We study a simple layer-pruning strategy for popular families of open-weight pretrained LLMs.
We find minimal degradation of performance until after a large fraction of the layers are removed.
From a scientific perspective, the robustness of these LLMs to the deletion of layers implies either that current pretraining methods are not properly leveraging the parameters in the deeper layers of the network or that the shallow layers play a critical role in storing knowledge.
arXiv Detail & Related papers (2024-03-26T17:20:04Z)
- Improving Adversarial Transferability via Intermediate-level Perturbation Decay [79.07074710460012]
We develop a novel intermediate-level method that crafts adversarial examples within a single stage of optimization.
Experimental results show that it outperforms state-of-the-art methods by large margins in attacking various victim models.
arXiv Detail & Related papers (2023-04-26T09:49:55Z)
- Boosting Adversarial Transferability through Enhanced Momentum [50.248076722464184]
Deep learning models are vulnerable to adversarial examples crafted by adding human-imperceptible perturbations on benign images.
Various momentum iterative gradient-based methods have been shown to be effective in improving adversarial transferability.
We propose an enhanced momentum iterative gradient-based method to further enhance the adversarial transferability.
arXiv Detail & Related papers (2021-03-19T03:10:32Z)
- Sample Efficient Reinforcement Learning with REINFORCE [10.884278019498588]
We consider classical policy gradient methods and the widely-used REINFORCE estimation procedure.
By controlling the number of "bad" episodes, we establish an anytime sub-linear high-probability regret bound as well as almost sure global convergence of the average regret at an asymptotically sub-linear rate.
These provide the first set of global convergence and sample efficiency results for the well-known REINFORCE algorithm and contribute to a better understanding of its performance in practice.
arXiv Detail & Related papers (2020-10-22T01:02:55Z)
- Layer-adaptive sparsity for the Magnitude-based Pruning [88.37510230946478]
We propose a novel importance score for global pruning, coined layer-adaptive magnitude-based pruning (LAMP) score.
LAMP consistently outperforms popular existing schemes for layerwise sparsity selection.
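A minimal sketch of a LAMP-style score, assuming the commonly cited definition (a weight's squared magnitude normalized by the sum of squared magnitudes of all weights in the same layer that are at least as large); the exact normalization should be checked against the paper. Pruning then keeps the globally top-scoring weights across layers.

```python
import numpy as np

def lamp_scores(weights: np.ndarray) -> np.ndarray:
    """Per-weight score for one layer: squared magnitude divided by the sum of
    squared magnitudes of this weight and all larger ones in the same layer."""
    sq = weights.reshape(-1) ** 2
    order = np.argsort(sq)                          # ascending by squared magnitude
    sorted_sq = sq[order]
    suffix_sums = np.cumsum(sorted_sq[::-1])[::-1]  # sum over "surviving" weights
    scores = np.empty_like(sq)
    scores[order] = sorted_sq / suffix_sums
    return scores.reshape(weights.shape)

# Usage: score each layer separately, then prune the lowest-scoring weights globally.
W = np.random.default_rng(0).normal(size=(4, 4))
print(lamp_scores(W))
```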
arXiv Detail & Related papers (2020-10-15T09:14:02Z)
- Lookahead: A Far-Sighted Alternative of Magnitude-based Pruning [83.99191569112682]
Magnitude-based pruning is one of the simplest methods for pruning neural networks.
We develop a simple pruning method, coined lookahead pruning, by extending the single layer optimization to a multi-layer optimization.
Our experimental results demonstrate that the proposed method consistently outperforms magnitude-based pruning on various networks.
arXiv Detail & Related papers (2020-02-12T05:38:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.