Related papers: Layer-wise Importance Matters: Less Memory for Better Performance in Parameter-efficient Fine-tuning of Large Language Models

Layer-wise Importance Matters: Less Memory for Better Performance in Parameter-efficient Fine-tuning of Large Language Models

URL: http://arxiv.org/abs/2410.11772v2
Date: Tue, 05 Nov 2024 05:13:00 GMT
Title: Layer-wise Importance Matters: Less Memory for Better Performance in Parameter-efficient Fine-tuning of Large Language Models
Authors: Kai Yao, Penglei Gao, Lichun Li, Yuan Zhao, Xiaofeng Wang, Wei Wang, Jianke Zhu,
Abstract summary: Importance-aware Sparse Tuning (IST) is a plug-and-play technique compatible with various PEFT methods that operate on a per-layer basis. IST dynamically updates selected layers in PEFT modules, leading to reduced memory demands.
Score: 19.163639128631534
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Parameter-Efficient Fine-Tuning (PEFT) methods have gained significant popularity for adapting pre-trained Large Language Models (LLMs) to downstream tasks, primarily due to their potential to significantly reduce memory and computational overheads. However, a common limitation in most PEFT approaches is their application of a uniform architectural design across all layers. This uniformity involves identical trainable modules and ignores the varying importance of each layer, leading to sub-optimal fine-tuning results. To overcome the above limitation and obtain better performance, we develop a novel approach, Importance-aware Sparse Tuning (IST), to fully utilize the inherent sparsity and select the most important subset of full layers with effective layer-wise importance scoring. The proposed IST is a versatile and plug-and-play technique compatible with various PEFT methods that operate on a per-layer basis. By leveraging the estimated importance scores, IST dynamically updates these selected layers in PEFT modules, leading to reduced memory demands. We further provide theoretical proof of convergence and empirical evidence of superior performance to demonstrate the advantages of IST over uniform updating strategies. Extensive experiments on a range of LLMs, PEFTs, and downstream tasks substantiate the effectiveness of our proposed method, showcasing IST's capacity to enhance existing layer-based PEFT methods. Our code is available at https://github.com/Kaiseem/IST.

Related papers

Optimizing Specific and Shared Parameters for Efficient Parameter Tuning [46.57365875007367]
We propose SaS, a novel PETL method that effectively mitigates distributional shifts during fine-tuning. SaS captures common statistical characteristics across layers using low-rank projections. Experiments on diverse downstream tasks, few-shot settings and domain generalization demonstrate that SaS significantly enhances performance.
arXiv Detail & Related papers (2025-04-04T13:43:54Z)
LESA: Learnable LLM Layer Scaling-Up [57.0510934286449]
Training Large Language Models (LLMs) from scratch requires immense computational resources, making it prohibitively expensive. Model scaling-up offers a promising solution by leveraging the parameters of smaller models to create larger ones. We propose textbfLESA, a novel learnable method for depth scaling-up.
arXiv Detail & Related papers (2025-02-19T14:58:48Z)
Bilevel ZOFO: Bridging Parameter-Efficient and Zeroth-Order Techniques for Efficient LLM Fine-Tuning and Meta-Training [44.48966200270378]
Fine-tuning pre-trained Large Language Models (LLMs) for downstream tasks using First-Order (FO)imats presents significant computational challenges. We propose a bilevel optimization framework that complements ZO methods with PEFT to mitigate sensitivity to hard prompts. Our Bilevel ZOFO method employs a double-loop optimization strategy, where only the gradient of the PEFT model and the forward pass of the base model are required.
arXiv Detail & Related papers (2025-02-05T20:47:44Z)
Refining Salience-Aware Sparse Fine-Tuning Strategies for Language Models [14.68920095399595]
sparsity-based PEFT (SPEFT) introduces trainable sparse adaptations to the weight matrices in the model. We conduct the first systematic evaluation of salience metrics for SPEFT, inspired by zero-cost NAS proxies. Our work challenges the notion that complexity is necessary for effective PEFT.
arXiv Detail & Related papers (2024-12-18T04:14:35Z)
Parameter-Efficient Fine-Tuning of Large Language Models for Unit Test Generation: An Empirical Study [3.5189934649278922]
Large language models (LLMs) like GitHub Copilot struggle with real-world tasks without fine-tuning. This paper investigates full fine-tuning and various PEFT methods, including LoRA, (IA)3, and prompt tuning. Our findings show that PEFT methods can deliver performance comparable to full fine-tuning for unit test generation.
arXiv Detail & Related papers (2024-11-04T09:03:18Z)
Preserving Pre-trained Representation Space: On Effectiveness of Prefix-tuning for Large Multi-modal Models [24.62337386603331]
Large Multi-modal Models (LMMs) are revolutionizing the way machines interact with the world. To adapt LMMs for downstream tasks, parameter-efficient fine-tuning (PEFT) has gained popularity. This paper focuses on the strengths and weaknesses of each tuning strategy, shifting the focus from the efficiency typically associated with these approaches.
arXiv Detail & Related papers (2024-10-29T07:55:50Z)
BIPEFT: Budget-Guided Iterative Search for Parameter Efficient Fine-Tuning of Large Pretrained Language Models [63.52035708182815]
We introduce a novel Budget-guided Iterative search strategy for automatic PEFT (BIPEFT) BIPEFT employs a new iterative search strategy to disentangle the binary module and rank dimension search spaces. Extensive experiments on public benchmarks demonstrate the superior performance of BIPEFT for downstream tasks with a low parameter budget.
arXiv Detail & Related papers (2024-10-04T18:50:46Z)
SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios. In the early route, intermediate outputs are consolidated via an anti-redundancy operation. In the late route, utilizing minimal late pre-trained layers could alleviate the peak demand on memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z)
Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models [90.14693869269519]
MoE LLMs can achieve higher performance with fewer parameters, but it is still hard to deploy them due to their immense parameter sizes. This paper mainly aims to enhance the deployment efficiency of MoE LLMs by introducing plug-and-play expert-level sparsification techniques.
arXiv Detail & Related papers (2024-02-22T18:56:07Z)
Efficiency at Scale: Investigating the Performance of Diminutive Language Models in Clinical Tasks [2.834743715323873]
We present an investigation into the suitability of different PEFT methods to clinical decision-making tasks. Our analysis shows that the performance of most PEFT approaches varies significantly from one task to another. The effectiveness of PEFT methods in the clinical domain is evident, particularly for specialised models which can operate on low-cost, in-house computing infrastructure.
arXiv Detail & Related papers (2024-02-16T11:30:11Z)
PRILoRA: Pruned and Rank-Increasing Low-Rank Adaptation [65.268245109828]
We introduce PRILoRA, which linearly allocates a different rank for each layer, in an increasing manner, and performs pruning throughout the training process. We validate the effectiveness of PRILoRA through extensive experiments on eight GLUE benchmarks, setting a new state of the art.
arXiv Detail & Related papers (2024-01-20T20:25:17Z)
UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory [69.33445217944029]
PETL is an effective strategy for adapting pre-trained models to downstream domains. Recent PETL works focus on the more valuable memory-efficient characteristic. We propose a new memory-efficient PETL strategy, Universal Parallel Tuning (UniPT)
arXiv Detail & Related papers (2023-08-28T05:38:43Z)
Empirical Analysis of the Strengths and Weaknesses of PEFT Techniques for LLMs [1.867982979635437]
We provide a benchmark of various PEFT techniques and evaluate model performance across different data scales. Contrary to popular belief, we empirically prove that PEFT techniques converge slower than full tuning in low data scenarios. We further optimize these PEFT techniques by selectively choosing which parts of the model to train, and find that these techniques can be applied with significantly fewer parameters.
arXiv Detail & Related papers (2023-04-28T17:39:49Z)
UniPELT: A Unified Framework for Parameter-Efficient Language Model Tuning [64.638804236566]
We propose a unified framework, UniPELT, which incorporates different PELT methods as submodules and learns to activate the ones that best suit the current data or task setup. Remarkably, on the GLUE benchmark, UniPELT consistently achieves 13pt gains compared to the best individual PELT method that it incorporates and even outperforms fine-tuning under different setups.
arXiv Detail & Related papers (2021-10-14T17:40:08Z)

This list is automatically generated from the titles and abstracts of the papers in this site.