Layer Pruning with Consensus: A Triple-Win Solution
- URL: http://arxiv.org/abs/2411.14345v1
- Date: Thu, 21 Nov 2024 17:41:27 GMT
- Title: Layer Pruning with Consensus: A Triple-Win Solution
- Authors: Leandro Giusti Mugnaini, Carolina Tavares Duarte, Anna H. Reali Costa, Artur Jordao
- Abstract summary: Layer-pruning approaches often rely on single criteria that may not fully capture the complex, underlying properties of layers.
We propose a novel approach that combines multiple similarity metrics into a single expressive measure of low-importance layers, called the Consensus criterion.
Our technique delivers a triple-win solution: low accuracy drop, high-performance improvement, and increased robustness to adversarial attacks.
- Abstract: Layer pruning offers a promising alternative to standard structured pruning, effectively reducing computational costs, latency, and memory footprint. While notable layer-pruning approaches aim to detect unimportant layers for removal, they often rely on single criteria that may not fully capture the complex, underlying properties of layers. We propose a novel approach that combines multiple similarity metrics into a single expressive measure of low-importance layers, called the Consensus criterion. Our technique delivers a triple-win solution: low accuracy drop, high-performance improvement, and increased robustness to adversarial attacks. With up to 78.80% FLOPs reduction and performance on par with state-of-the-art methods across different benchmarks, our approach reduces energy consumption and carbon emissions by up to 66.99% and 68.75%, respectively. Additionally, it avoids shortcut learning and improves robustness by up to 4 percentage points under various adversarial attacks. Overall, the Consensus criterion demonstrates its effectiveness in creating robust, efficient, and environmentally friendly pruned models.
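As a rough illustration of the idea described in the abstract (not the authors' implementation; the metric choices, normalization, and synthetic representations below are assumptions), a consensus-style criterion can combine several similarity metrics between each layer's input and output representations into one redundancy score, with the highest-scoring layers being candidates for removal:

```python
import numpy as np

def linear_cka(X, Y):
    # Linear Centered Kernel Alignment between two representation
    # matrices of shape (n_samples, features); 1.0 means identical structure.
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    hsic = np.linalg.norm(Yc.T @ Xc, "fro") ** 2
    return hsic / (np.linalg.norm(Xc.T @ Xc, "fro")
                   * np.linalg.norm(Yc.T @ Yc, "fro"))

def mean_cosine(X, Y):
    # Mean per-sample cosine similarity between two representations.
    num = (X * Y).sum(1)
    den = np.linalg.norm(X, axis=1) * np.linalg.norm(Y, axis=1)
    return float(np.mean(num / den))

def consensus_scores(reps, metrics):
    # reps[i] is the representation after layer i; high similarity between
    # reps[i] and reps[i+1] suggests layer i+1 barely transforms its input.
    per_metric = []
    for m in metrics:
        s = np.array([m(reps[i], reps[i + 1]) for i in range(len(reps) - 1)])
        # Min-max normalize so metrics on different scales are comparable.
        s = (s - s.min()) / (s.max() - s.min() + 1e-12)
        per_metric.append(s)
    return np.mean(per_metric, axis=0)  # high score = redundant layer

# Synthetic demo: layer 3's output is nearly identical to layer 2's.
rng = np.random.default_rng(0)
reps = [rng.normal(size=(64, 32)) for _ in range(5)]
reps[3] = reps[2] + 0.01 * rng.normal(size=(64, 32))
scores = consensus_scores(reps, [linear_cka, mean_cosine])
print(int(np.argmax(scores)))  # → 2 (the near-identity transition)
```

Averaging normalized metrics is one plausible way to aggregate; the point of a consensus criterion is that no single metric's failure mode dominates the ranking.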
Related papers
- Anti-Collapse Loss for Deep Metric Learning Based on Coding Rate Metric [99.19559537966538]
DML aims to learn a discriminative high-dimensional embedding space for downstream tasks like classification, clustering, and retrieval.
To maintain the structure of embedding space and avoid feature collapse, we propose a novel loss function called Anti-Collapse Loss.
Comprehensive experiments on benchmark datasets demonstrate that our proposed method outperforms existing state-of-the-art methods.
arXiv Detail & Related papers (2024-07-03T13:44:20Z)
- LayerMatch: Do Pseudo-labels Benefit All Layers? [77.59625180366115]
Semi-supervised learning offers a promising solution to mitigate the dependency on labeled data.
We develop two layer-specific pseudo-label strategies, termed Grad-ReLU and Avg-Clustering.
Our approach consistently demonstrates exceptional performance on standard semi-supervised learning benchmarks.
arXiv Detail & Related papers (2024-06-20T11:25:50Z)
- Effective Layer Pruning Through Similarity Metric Perspective [0.0]
Deep neural networks have been the predominant paradigm in machine learning for solving cognitive tasks.
Pruning structures from these models is a straightforward approach to reducing network complexity.
Layer pruning often hurts the network's predictive ability (i.e., accuracy) at high compression rates.
This work introduces an effective layer-pruning strategy that meets all underlying properties pursued by pruning methods.
arXiv Detail & Related papers (2024-05-27T11:54:51Z)
- Accelerating Inference in Large Language Models with a Unified Layer Skipping Strategy [67.45518210171024]
Dynamic computation methods have shown notable acceleration for Large Language Models (LLMs) by skipping several layers of computations.
We propose a Unified Layer Skipping strategy, which selects the number of layers to skip computation based solely on the target speedup ratio.
Experimental results on two common tasks, i.e., machine translation and text summarization, indicate that given a target speedup ratio, the Unified Layer Skipping strategy significantly enhances both the inference performance and the actual model throughput.
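As a minimal sketch of the strategy this summary describes (the function name, the uniform-cost assumption, and the even-spacing heuristic are illustrative assumptions, not the paper's exact algorithm), the number of layers to skip can be derived directly from the target speedup ratio and spread over the interior of the stack:

```python
def layers_to_skip(num_layers, target_speedup, keep_first=1, keep_last=1):
    # Layers we can afford to run for the target speedup, assuming each
    # layer costs roughly the same (a simplifying assumption).
    keep = max(keep_first + keep_last, round(num_layers / target_speedup))
    n_skip = num_layers - keep
    # Interior layers eligible for skipping; the first/last layers are
    # kept on the assumption that they are critical.
    interior = list(range(keep_first, num_layers - keep_last))
    if n_skip <= 0:
        return set()
    # Spread skipped layers evenly across the interior (integer arithmetic
    # avoids floating-point rounding surprises).
    return {interior[i * len(interior) // n_skip] for i in range(n_skip)}

# A 24-layer model at a 2x target speedup skips 12 interior layers.
print(sorted(layers_to_skip(num_layers=24, target_speedup=2.0)))
# → [1, 2, 4, 6, 8, 10, 12, 13, 15, 17, 19, 21]
```

The appeal of tying the skip count to the speedup ratio alone is that a single model can serve multiple latency budgets without per-budget retraining or tuning.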
arXiv Detail & Related papers (2024-04-10T12:12:07Z)
- The Unreasonable Ineffectiveness of the Deeper Layers [5.984361440126354]
We study a simple layer-pruning strategy for popular families of open-weight pretrained LLMs.
We find minimal degradation of performance until after a large fraction of the layers are removed.
From a scientific perspective, the robustness of these LLMs to the deletion of layers implies either that current pretraining methods are not properly leveraging the parameters in the deeper layers of the network or that the shallow layers play a critical role in storing knowledge.
arXiv Detail & Related papers (2024-03-26T17:20:04Z)
- Enhanced Sparsification via Stimulative Training [36.0559905521154]
Existing methods commonly set sparsity-inducing penalty terms to suppress the importance of dropped weights.
We propose a structured pruning framework, named expressivity, based on an enhanced sparsification paradigm.
To reduce the huge capacity gap of distillation, we propose a mutating expansion technique.
arXiv Detail & Related papers (2024-03-11T04:05:17Z)
- Select High-Level Features: Efficient Experts from a Hierarchical Classification Network [4.051316555028782]
This study introduces a novel expert generation method that dynamically reduces task and computational complexity without compromising predictive performance.
It is based on a new hierarchical classification network topology that combines sequential processing of generic low-level features with parallelism and nesting of high-level features.
In terms of dynamic inference, our methodology can exclude up to 88.7% of parameters and perform 73.4% fewer giga multiply-accumulate (GMAC) operations.
arXiv Detail & Related papers (2024-03-08T00:02:42Z)
- ERNIE-SPARSE: Learning Hierarchical Efficient Transformer Through Regularized Self-Attention [48.697458429460184]
Two factors, information bottleneck sensitivity and inconsistency between different attention topologies, could affect the performance of the Sparse Transformer.
This paper proposes a well-designed model named ERNIE-Sparse.
It consists of two distinctive parts: (i) Hierarchical Sparse Transformer (HST) to sequentially unify local and global information, and (ii) Self-Attention Regularization (SAR) to minimize the distance for transformers with different attention topologies.
arXiv Detail & Related papers (2022-03-23T08:47:01Z)
- Boosting Transferability of Targeted Adversarial Examples via Hierarchical Generative Networks [56.96241557830253]
Transfer-based adversarial attacks can effectively evaluate model robustness in the black-box setting.
We propose a conditional generative attacking model, which can generate the adversarial examples targeted at different classes.
Our method improves the success rates of targeted black-box attacks by a significant margin over the existing methods.
arXiv Detail & Related papers (2021-07-05T06:17:47Z)
- Manifold Regularized Dynamic Network Pruning [102.24146031250034]
This paper proposes a new paradigm that dynamically removes redundant filters by embedding the manifold information of all instances into the space of pruned networks.
The effectiveness of the proposed method is verified on several benchmarks, which shows better performance in terms of both accuracy and computational cost.
arXiv Detail & Related papers (2021-03-10T03:59:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.