PrunePEFT: Iterative Hybrid Pruning for Parameter-Efficient Fine-tuning of LLMs
- URL: http://arxiv.org/abs/2506.07587v1
- Date: Mon, 09 Jun 2025 09:32:58 GMT
- Title: PrunePEFT: Iterative Hybrid Pruning for Parameter-Efficient Fine-tuning of LLMs
- Authors: Tongzhou Yu, Zhuhao Zhang, Guanghui Zhu, Shen Jiang, Meikang Qiu, Yihua Huang
- Abstract summary: Parameter Efficient Fine-Tuning (PEFT) methods have emerged as effective and promising approaches for fine-tuning pre-trained language models. In this paper, we propose a novel approach, PrunePEFT, which formulates the PEFT strategy search as a pruning problem.
- Score: 8.52711842775914
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Parameter Efficient Fine-Tuning (PEFT) methods have emerged as effective and promising approaches for fine-tuning pre-trained language models. Compared with Full-parameter Fine-Tuning (FFT), PEFT achieves comparable task performance with a substantial reduction in trainable parameters, which largely saves training and storage costs. However, applying PEFT requires navigating a vast design space, such as the type of PEFT modules and their insertion layers. Inadequate configurations can lead to sub-optimal results. Conventional solutions such as architectural search techniques, while effective, tend to introduce substantial additional overhead. In this paper, we propose a novel approach, PrunePEFT, which formulates the PEFT strategy search as a pruning problem and introduces a hybrid pruning strategy that capitalizes on the sensitivity of pruning methods to different PEFT modules. This method extends traditional pruning techniques by iteratively removing redundant or conflicting PEFT modules, thereby optimizing the fine-tuned configuration. By efficiently identifying the most relevant modules, our approach significantly reduces the computational burden typically associated with architectural search processes, making it a more scalable and efficient solution for fine-tuning large pre-trained models.
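To make the search-as-pruning formulation concrete, here is a minimal, hypothetical Python sketch of the loop the abstract describes: attach candidate PEFT modules to every layer, then iteratively score and remove the least useful ones. The names (`CandidateModule`, `hybrid_score`, `iterative_prune`) and the random proxy score are illustrative assumptions, not the authors' implementation; the paper's actual hybrid sensitivity criteria and training procedure are not reproduced here.

```python
# Hypothetical sketch of "PEFT strategy search as pruning" (not the paper's code).
# Start with candidate PEFT modules attached everywhere, then iteratively prune
# the lowest-scoring ones until a target number of modules remains.
from dataclasses import dataclass
from typing import Callable, List
import random

@dataclass
class CandidateModule:
    layer: int          # insertion layer index
    kind: str           # e.g. "lora", "adapter", "prefix" -- illustrative module types
    active: bool = True

def hybrid_score(m: CandidateModule) -> float:
    """Placeholder for a hybrid sensitivity criterion; returns a random proxy score."""
    return random.random()

def iterative_prune(mods: List[CandidateModule],
                    score_fn: Callable[[CandidateModule], float],
                    keep: int,
                    prune_per_round: int = 2) -> List[CandidateModule]:
    """Iteratively deactivate the lowest-scoring modules until `keep` remain."""
    while sum(m.active for m in mods) > keep:
        active = [m for m in mods if m.active]
        # (In the real method, a brief fine-tuning step would precede each scoring round.)
        n_prune = min(prune_per_round, len(active) - keep)
        for m in sorted(active, key=score_fn)[:n_prune]:
            m.active = False
    return [m for m in mods if m.active]

if __name__ == "__main__":
    candidates = [CandidateModule(layer=i, kind=k)
                  for i in range(12) for k in ("lora", "adapter")]
    kept = iterative_prune(candidates, hybrid_score, keep=8)
    print([(m.layer, m.kind) for m in kept])
```

In the paper's framing, the scoring round would apply a hybrid of pruning criteria chosen for their sensitivity to different PEFT module types, rather than the random proxy used above.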
Related papers
- ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts [71.91042186338163]
ALoRE is a novel PETL method that reuses the hypercomplex parameterized space constructed by Kronecker product to Aggregate Low Rank Experts. Thanks to the artful design, ALoRE maintains negligible extra parameters and can be effortlessly merged into the frozen backbone.
arXiv Detail & Related papers (2024-12-11T12:31:30Z)
- Layer-wise Importance Matters: Less Memory for Better Performance in Parameter-efficient Fine-tuning of Large Language Models [19.163639128631534]
Importance-aware Sparse Tuning (IST) is a plug-and-play technique compatible with various PEFT methods that operate on a per-layer basis.
IST dynamically updates selected layers in PEFT modules, leading to reduced memory demands.
arXiv Detail & Related papers (2024-10-15T16:53:26Z)
- BIPEFT: Budget-Guided Iterative Search for Parameter Efficient Fine-Tuning of Large Pretrained Language Models [63.52035708182815]
We introduce a novel Budget-guided Iterative search strategy for automatic PEFT (BIPEFT).
BIPEFT employs a new iterative search strategy to disentangle the binary module and rank dimension search spaces.
Extensive experiments on public benchmarks demonstrate the superior performance of BIPEFT for downstream tasks with a low parameter budget.
arXiv Detail & Related papers (2024-10-04T18:50:46Z)
- See Further for Parameter Efficient Fine-tuning by Standing on the Shoulders of Decomposition [56.87609859444084]
Parameter-efficient fine-tuning (PEFT) focuses on optimizing a select subset of parameters while keeping the rest fixed, significantly lowering computational and storage overheads. We take the first step to unify all approaches by dissecting them from a decomposition perspective. We introduce two novel PEFT methods alongside a simple yet effective framework designed to enhance the performance of PEFT techniques across various applications.
arXiv Detail & Related papers (2024-07-07T15:44:42Z)
- Light-PEFT: Lightening Parameter-Efficient Fine-Tuning via Early Pruning [17.032155725171958]
We propose the Light-PEFT framework, which includes two methods: Masked Early Pruning of the Foundation Model and Multi-Granularity Early Pruning of PEFT.
Compared to utilizing the PEFT method directly, Light-PEFT achieves training and inference speedup, reduces memory usage, and maintains comparable performance.
arXiv Detail & Related papers (2024-06-06T07:03:29Z)
- ETHER: Efficient Finetuning of Large-Scale Models with Hyperplane Reflections [59.839926875976225]
We propose the ETHER transformation family, which performs Efficient fineTuning via HypErplane Reflections.
In particular, we introduce ETHER and its relaxation ETHER+, which match or outperform existing PEFT methods with significantly fewer parameters.
arXiv Detail & Related papers (2024-05-30T17:26:02Z)
- LoRETTA: Low-Rank Economic Tensor-Train Adaptation for Ultra-Low-Parameter Fine-Tuning of Large Language Models [20.5908375260123]
Various parameter-efficient fine-tuning (PEFT) techniques have been proposed to enable computationally efficient fine-tuning while maintaining model performance.
We present LoRETTA, a framework that significantly reduces trainable parameters through tensor-train decomposition.
LoRETTA achieves comparable or better performance than most widely used PEFT methods with up to $100\times$ fewer parameters on the LLaMA-2-7B models.
arXiv Detail & Related papers (2024-02-18T01:20:00Z)
- ComPEFT: Compression for Communicating Parameter Efficient Updates via Sparsification and Quantization [100.90624220423634]
We present ComPEFT, a novel method for compressing fine-tuning residuals (task vectors) of PEFT based models.
In extensive evaluation across T5, T0, and LLaMA-based models with 200M - 65B parameters, ComPEFT achieves compression ratios of 8x - 50x.
arXiv Detail & Related papers (2023-11-22T05:28:59Z)
- LoRAPrune: Structured Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning [56.88751562302793]
Low-rank adaptation (LoRA) has emerged as a way to fine-tune large language models (LLMs).
LoRAPrune is a new framework that delivers an accurate structured pruned model in a highly memory-efficient manner.
LoRAPrune achieves a reduction in perplexity by 4.81 on WikiText2 and 3.46 on PTB, while also decreasing memory usage by 52.6%.
arXiv Detail & Related papers (2023-05-28T15:15:48Z)
- AutoPEFT: Automatic Configuration Search for Parameter-Efficient Fine-Tuning [77.61565726647784]
Motivated by advances in neural architecture search, we propose AutoPEFT for automatic PEFT configuration selection.
We show that AutoPEFT-discovered configurations significantly outperform existing PEFT methods and are on par or better than FFT without incurring substantial training efficiency costs.
arXiv Detail & Related papers (2023-01-28T08:51:23Z)
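For contrast, several of the related papers above (e.g., AutoPEFT and BIPEFT) search the configuration space of module types and insertion layers directly under a parameter budget. Below is a minimal, hypothetical sketch of such a budget-guided search loop; it is not any of these papers' actual algorithms, and `evaluate_config`, `PARAM_COST`, and the random-sampling strategy are illustrative stand-ins.

```python
# Hypothetical sketch of a budget-guided PEFT configuration search loop
# (in the spirit of AutoPEFT / BIPEFT; not their actual algorithms).
import random
from typing import Dict

PEFT_TYPES = ["lora", "adapter", "prefix"]                  # illustrative module types
PARAM_COST = {"lora": 0.1, "adapter": 0.3, "prefix": 0.2}   # assumed relative cost per layer

def sample_config(num_layers: int, budget: float) -> Dict[int, str]:
    """Randomly assign at most one PEFT module type per layer, respecting the budget."""
    config, spent = {}, 0.0
    for layer in random.sample(range(num_layers), k=num_layers):
        kind = random.choice(PEFT_TYPES + [None])
        if kind is not None and spent + PARAM_COST[kind] <= budget:
            config[layer] = kind
            spent += PARAM_COST[kind]
    return config

def evaluate_config(config: Dict[int, str]) -> float:
    """Stand-in for fine-tuning with `config` and returning validation accuracy."""
    return random.random()

def search(num_layers: int = 12, budget: float = 1.0, trials: int = 20) -> Dict[int, str]:
    """Keep the best-scoring configuration found over a fixed number of trials."""
    best, best_score = {}, float("-inf")
    for _ in range(trials):
        cfg = sample_config(num_layers, budget)
        score = evaluate_config(cfg)
        if score > best_score:
            best, best_score = cfg, score
    return best

if __name__ == "__main__":
    print(search())
```

Each trial in such a loop entails a separate fine-tuning run, which is the architectural-search overhead that the pruning formulation of PrunePEFT is designed to avoid.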