Related papers: Pruning by Block Benefit: Exploring the Properties of Vision Transformer Blocks during Domain Adaptation

URL: http://arxiv.org/abs/2506.23675v1
Date: Mon, 30 Jun 2025 09:58:25 GMT
Title: Pruning by Block Benefit: Exploring the Properties of Vision Transformer Blocks during Domain Adaptation
Authors: Patrick Glandorf, Bodo Rosenhahn
Abstract summary: Pruning by Block Benefit (P3B) is a pruning method that utilizes the relative contribution on block level to globally assign parameter resources. P3B is able to conserve high performance, even in high sparsity regimes of 70% parameter reduction while only losing 0.64% in accuracy.
Score: 18.23994867489678
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Vision Transformers have set new benchmarks in several tasks, but these models come with high computational costs, which makes them impractical for resource-limited hardware. Network pruning reduces the computational complexity by removing less important operations while maintaining performance. However, pruning a model on an unseen data domain leads to a misevaluation of weight significance, resulting in suboptimal resource assignment. In this work, we find that task-sensitive layers initially fail to improve the feature representation on downstream tasks, leading to performance loss for early pruning decisions. To address this problem, we introduce Pruning by Block Benefit (P3B), a pruning method that uses the relative contribution at the block level to globally assign parameter resources. P3B identifies low-impact components to reduce parameter allocation while preserving critical ones. Classical pruning mask optimization struggles to reactivate zero-mask elements. In contrast, P3B sets a layerwise keep ratio based on global performance metrics, ensuring the reactivation of late-converging blocks. We show in extensive experiments that P3B is a state-of-the-art pruning method with the most noticeable gains in transfer learning tasks. Notably, P3B conserves high performance even in high-sparsity regimes of 70% parameter reduction, losing only 0.64% in accuracy.
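
The abstract describes setting a layerwise keep ratio from block-level contribution under a global parameter budget, but it does not give the allocation rule. The following is a minimal sketch of one way such an allocation could look, not the authors' method: the function name `allocate_keep_ratios`, the `benefits` scores, the `min_keep` floor, and the linear rule solved by bisection are all assumptions made for illustration.

```python
from typing import List


def allocate_keep_ratios(
    benefits: List[float],
    params: List[int],
    global_keep: float,
    min_keep: float = 0.05,
    iters: int = 60,
) -> List[float]:
    """Hypothetical helper: spread a global parameter budget over blocks.

    Each block i gets keep ratio min(1, min_keep + scale * benefit_i).
    The shared `scale` is found by bisection so that the kept parameters
    match the budget global_keep * sum(params). This rule is an assumption,
    not the rule from the paper.
    """
    if len(benefits) != len(params):
        raise ValueError("benefits and params must have the same length")
    if not (0.0 <= min_keep <= global_keep <= 1.0):
        raise ValueError("need 0 <= min_keep <= global_keep <= 1")

    budget = global_keep * sum(params)

    def ratios(scale: float) -> List[float]:
        # Negative benefits are treated as zero; ratios are capped at 1.
        return [min(1.0, min_keep + scale * max(b, 0.0)) for b in benefits]

    def kept(scale: float) -> float:
        return sum(r * p for r, p in zip(ratios(scale), params))

    # Grow the upper bound until the budget is reachable (or give up if
    # every block with a positive benefit is already saturated).
    lo, hi = 0.0, 1.0
    while kept(hi) < budget and hi < 1e12:
        hi *= 2.0

    # kept(scale) is non-decreasing in scale, so bisection is valid.
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if kept(mid) < budget:
            lo = mid
        else:
            hi = mid
    return ratios(hi)


# Illustrative numbers only: three equally sized blocks, 70% reduction.
example_ratios = allocate_keep_ratios(
    benefits=[0.1, 0.4, 0.5], params=[100, 100, 100], global_keep=0.3
)
print(example_ratios)
```

Because the floor `min_keep` keeps every block partially active, a block whose benefit rises later in training would receive a larger ratio on the next allocation, which loosely mirrors the reactivation idea mentioned in the abstract. If all benefit scores are zero, this sketch falls back to the floor and leaves part of the budget unused.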

Related papers

Flexible Automatic Identification and Removal (FAIR)-Pruner: An Efficient Neural Network Pruning Method [11.575879702610914]
This paper proposes the Flexible Automatic Identification and Removal (FAIR)-Pruner, a novel method for neural network structured pruning. A major advantage of FAIR-Pruner lies in its capacity to automatically determine the layer-wise pruning rates, which yields a more efficient subnetwork structure. With utilization scores and reconstruction errors, users can flexibly obtain pruned models under different pruning ratios.
arXiv Detail & Related papers (2025-08-04T10:59:07Z)
High-Layer Attention Pruning with Rescaling [14.141903038286362]
Pruning is a highly effective approach for compressing large language models (LLMs). We propose a novel pruning algorithm that strategically prunes attention heads in the model's higher layers. We conduct comprehensive experiments on a wide range of LLMs, including LLaMA3.1-8B, Mistral-7B-v0.3, Qwen2-7B, and Gemma2-9B.
arXiv Detail & Related papers (2025-07-02T17:15:05Z)
FlexiGPT: Pruning and Extending Large Language Models with Low-Rank Weight Sharing [59.12511498024836]
We present a method to prune large language models (LLMs) that selectively prunes model blocks based on an importance score. We propose a principled metric to replace each pruned block using a weight-sharing mechanism. Empirical evaluations demonstrate substantial performance gains over existing methods.
arXiv Detail & Related papers (2025-01-24T18:46:37Z)
OP-LoRA: The Blessing of Dimensionality [93.08208871549557]
Low-rank adapters enable fine-tuning of large models with only a small number of parameters. They often pose optimization challenges, with poor convergence. We introduce an over-parameterized approach that accelerates training without increasing inference costs. We achieve improvements in vision-language tasks and especially notable increases in image generation.
arXiv Detail & Related papers (2024-12-13T18:55:19Z)
CFSP: An Efficient Structured Pruning Framework for LLMs with Coarse-to-Fine Activation Information [33.01180010689081]
We introduce an efficient structured pruning framework named CFSP. We first allocate the sparsity budget across blocks based on their importance and then retain important weights within each block. Results demonstrate that CFSP outperforms existing methods on diverse models across various sparsity budgets.
arXiv Detail & Related papers (2024-09-20T04:03:27Z)
A deeper look at depth pruning of LLMs [49.30061112976263]
Large Language Models (LLMs) are resource-intensive to train but more costly to deploy in production. Recent work has attempted to prune blocks of LLMs based on cheap proxies for estimating block importance. We show that adaptive metrics exhibit a trade-off in performance between tasks.
arXiv Detail & Related papers (2024-07-23T08:40:27Z)
SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios. In the early route, intermediate outputs are consolidated via an anti-redundancy operation. In the late route, utilizing minimal late pre-trained layers could alleviate the peak demand on memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z)
Bypass Back-propagation: Optimization-based Structural Pruning for Large Language Models via Policy Gradient [57.9629676017527]
We propose an optimization-based structural pruning on Large-Language Models. We learn the pruning masks in a probabilistic space directly by optimizing the loss of the pruned model. Our method operates for 2.7 hours with around 35GB memory for the 13B models on a single A100 GPU.
arXiv Detail & Related papers (2024-06-15T09:31:03Z)
BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation [54.28841287750586]
Large language models (LLMs) have demonstrated outstanding performance in various tasks, such as text summarization and text question-answering. Existing solutions such as SparseGPT and Wanda attempt to alleviate this issue through weight pruning. This paper introduces a novel LLM pruning technique dubbed blockwise parameter-efficient sparsity allocation (BESA) by applying a blockwise reconstruction loss.
arXiv Detail & Related papers (2024-02-18T12:44:15Z)
Compressing the Backward Pass of Large-Scale Neural Architectures by Structured Activation Pruning [0.0]
Sparsity in Deep Neural Networks (DNNs) has gained attention as a solution. This work focuses on ephemeral sparsity, aiming to reduce memory consumption during training. We report the effectiveness of activation pruning by evaluating training speed, accuracy, and memory usage of large-scale neural architectures.
arXiv Detail & Related papers (2023-11-28T15:31:31Z)
Rapid Structural Pruning of Neural Networks with Set-based Task-Adaptive Meta-Pruning [83.59005356327103]
A common limitation of most existing pruning techniques is that they require pre-training of the network at least once before pruning. We propose STAMP, which task-adaptively prunes a network pretrained on a large reference dataset by generating a pruning mask on it as a function of the target dataset. We validate STAMP against recent advanced pruning methods on benchmark datasets.
arXiv Detail & Related papers (2020-06-22T10:57:43Z)

This list is automatically generated from the titles and abstracts of the papers on this site.