SIBO: A Simple Booster for Parameter-Efficient Fine-Tuning
- URL: http://arxiv.org/abs/2402.11896v2
- Date: Sat, 1 Jun 2024 07:13:15 GMT
- Title: SIBO: A Simple Booster for Parameter-Efficient Fine-Tuning
- Authors: Zhihao Wen, Jie Zhang, Yuan Fang
- Abstract summary: We present SIBO, a SImple BOoster that enhances PEFT by injecting an initial residual.
Extensive experiments on 22 benchmark datasets demonstrate that SIBO significantly enhances the performance of various strong baselines, achieving up to 15.7% and 23.5% improvement over existing PEFT methods on the arithmetic and commonsense reasoning tasks, respectively.
- Score: 10.450910399290818
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fine-tuning all parameters of large language models (LLMs) necessitates substantial computational power and extended time. Latest advancements in parameter-efficient fine-tuning (PEFT) techniques, such as Adapter tuning and LoRA, allow for adjustments to only a minor fraction of the parameters of these LLMs. Concurrently, it has been noted that the issue of over-smoothing diminishes the effectiveness of these Transformer-based LLMs, resulting in suboptimal performances in downstream tasks. In this paper, we present SIBO, which is a SImple BOoster to enhance PEFT, by injecting an initial residual. SIBO is straightforward and readily extensible to a range of state-of-the-art PEFT techniques to alleviate over-smoothing and enhance performance. Extensive experiments on 22 benchmark datasets demonstrate that SIBO significantly enhances the performance of various strong baselines, achieving up to 15.7% and 23.5% improvement over existing PEFT methods on the arithmetic and commonsense reasoning tasks, respectively.
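The abstract describes SIBO as injecting an initial residual into PEFT modules such as LoRA to counteract over-smoothing. The following PyTorch snippet is a minimal sketch of that idea under one possible reading: the LoRA branch receives a mix of the current hidden state and the initial token representation. The mixing rule `(1 - lam) * h + lam * h0`, the coefficient `lam`, and all names are illustrative assumptions, not the paper's reference implementation.

```python
import torch
import torch.nn as nn

class LoRALinearWithInitialResidual(nn.Module):
    """LoRA layer whose low-rank branch sees an injected initial residual (sketch)."""

    def __init__(self, base: nn.Linear, rank: int = 8, lam: float = 0.2):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # keep the pre-trained weights frozen
            p.requires_grad_(False)
        self.lora_A = nn.Linear(base.in_features, rank, bias=False)
        self.lora_B = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_B.weight)    # module starts out equal to the frozen base layer
        self.lam = lam                        # assumed strength of the initial residual

    def forward(self, h: torch.Tensor, h0: torch.Tensor) -> torch.Tensor:
        # h  : hidden state entering this layer
        # h0 : initial residual, e.g. the token embedding from the input layer
        mixed = (1.0 - self.lam) * h + self.lam * h0   # inject the initial residual
        return self.base(h) + self.lora_B(self.lora_A(mixed))
```

In this sketch `h0` would be propagated alongside the hidden states from the embedding layer; only the two low-rank matrices are trained, so the trainable-parameter count matches plain LoRA.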
Related papers
- Preserving Pre-trained Representation Space: On Effectiveness of Prefix-tuning for Large Multi-modal Models [24.62337386603331]
Large Multi-modal Models (LMMs) are revolutionizing the way machines interact with the world.
To adapt LMMs for downstream tasks, parameter-efficient fine-tuning (PEFT) has gained popularity.
This paper examines the strengths and weaknesses of each tuning strategy, shifting attention away from the efficiency typically associated with these approaches.
arXiv Detail & Related papers (2024-10-29T07:55:50Z) - BIPEFT: Budget-Guided Iterative Search for Parameter Efficient Fine-Tuning of Large Pretrained Language Models [63.52035708182815]
We introduce a novel Budget-guided Iterative search strategy for automatic PEFT (BIPEFT).
BIPEFT employs a new iterative search strategy to disentangle the binary module and rank dimension search spaces.
Extensive experiments on public benchmarks demonstrate the superior performance of BIPEFT for downstream tasks with a low parameter budget.
arXiv Detail & Related papers (2024-10-04T18:50:46Z) - Light-PEFT: Lightening Parameter-Efficient Fine-Tuning via Early Pruning [17.032155725171958]
We propose the Light-PEFT framework, which includes two methods: Masked Early Pruning of the Foundation Model and Multi-Granularity Early Pruning of PEFT.
Compared to utilizing the PEFT method directly, Light-PEFT achieves training and inference speedup, reduces memory usage, and maintains comparable performance.
arXiv Detail & Related papers (2024-06-06T07:03:29Z) - ETHER: Efficient Finetuning of Large-Scale Models with Hyperplane Reflections [59.839926875976225]
We propose the ETHER transformation family, which performs Efficient fineTuning via HypErplane Reflections.
In particular, we introduce ETHER and its relaxation ETHER+, which match or outperform existing PEFT methods with significantly fewer parameters.
arXiv Detail & Related papers (2024-05-30T17:26:02Z) - Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation [67.13876021157887]
Dynamic Tuning (DyT) is a novel approach to improve both parameter and inference efficiency for ViT adaptation.
DyT achieves superior performance compared to existing PEFT methods while using only 71% of their FLOPs on the VTAB-1K benchmark.
arXiv Detail & Related papers (2024-03-18T14:05:52Z) - Advancing Parameter Efficiency in Fine-tuning via Representation Editing [41.81020951061438]
We propose a novel fine-tuning approach for neural models, named Representation EDiting (RED).
RED modifies the representations generated at some layers through the application of scaling and biasing operations.
Remarkably, RED achieves results comparable or superior to both full parameter fine-tuning and other PEFT methods.
arXiv Detail & Related papers (2024-02-23T08:21:02Z) - LoRETTA: Low-Rank Economic Tensor-Train Adaptation for Ultra-Low-Parameter Fine-Tuning of Large Language Models [20.5908375260123]
Various parameter-efficient fine-tuning (PEFT) techniques have been proposed to enable computationally efficient fine-tuning while maintaining model performance.
We present LoRETTA, a framework that significantly reduces trainable parameters through tensor-train decomposition.
LoRETTA achieves comparable or better performance than most widely used PEFT methods with up to $100\times$ fewer parameters on the LLaMA-2-7B models.
arXiv Detail & Related papers (2024-02-18T01:20:00Z) - APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference [63.52244442498831]
Fine-tuning and inference with large language models (LMs) are generally known to be expensive.
We introduce APT, which adaptively prunes and tunes parameters for LMs.
We show that APT speeds up LM fine-tuning by up to 8x and reduces the training memory footprint of large LMs by up to 70%.
arXiv Detail & Related papers (2024-01-22T18:39:40Z) - Parameter-Efficient Fine-Tuning without Introducing New Latency [7.631596468553607]
We introduce a novel adapter technique that directly applies the adapter to pre-trained parameters instead of the hidden representation.
Our proposed method attains a new state-of-the-art outcome in terms of both performance and storage efficiency, storing only 0.03% of the parameters of full fine-tuning.
arXiv Detail & Related papers (2023-05-26T08:44:42Z) - Sensitivity-Aware Visual Parameter-Efficient Fine-Tuning [91.5113227694443]
We propose a novel Sensitivity-aware visual Parameter-efficient fine-Tuning (SPT) scheme.
SPT allocates trainable parameters to task-specific important positions.
Experiments on a wide range of downstream recognition tasks show that our SPT is complementary to the existing PEFT methods.
arXiv Detail & Related papers (2023-03-15T12:34:24Z) - Scaling & Shifting Your Features: A New Baseline for Efficient Model Tuning [126.84770886628833]
Existing finetuning methods either tune all parameters of the pretrained model (full finetuning) or only tune the last linear layer (linear probing).
We propose a new parameter-efficient finetuning method termed SSF, in which one only needs to Scale and Shift the deep Features extracted by a pre-trained model to match the performance of full finetuning; a minimal sketch of this operation follows the list.
arXiv Detail & Related papers (2022-10-17T08:14:49Z)
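The SSF entry above describes tuning only a per-channel scale and shift on top of features from a frozen pre-trained model. Below is a minimal PyTorch sketch under that reading; the module and parameter names (`gamma`, `beta`) are illustrative, not the authors' code.

```python
import torch
import torch.nn as nn

class ScaleShift(nn.Module):
    """Per-channel scale-and-shift applied to frozen backbone features (sketch)."""

    def __init__(self, dim: int):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(dim))   # scale, initialized to identity
        self.beta = nn.Parameter(torch.zeros(dim))   # shift, initialized to zero

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (..., dim) features produced by a frozen pre-trained layer
        return x * self.gamma + self.beta
```

Because the operation is affine, the learned scale and shift can in principle be folded into the preceding frozen linear layer at inference time, so this style of tuning need not add latency.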