X-PEFT: eXtremely Parameter-Efficient Fine-Tuning for Extreme
Multi-Profile Scenarios
- URL: http://arxiv.org/abs/2401.16137v1
- Date: Mon, 29 Jan 2024 13:13:32 GMT
- Title: X-PEFT: eXtremely Parameter-Efficient Fine-Tuning for Extreme
Multi-Profile Scenarios
- Authors: Namju Kwak and Taesup Kim
- Abstract summary: Adapter tuning provides increased parameter efficiency compared to full-model fine-tuning.
X-PEFT is a novel PEFT method that leverages a multitude of given adapters by fine-tuning an extremely small set of compact tensors for a new profile.
We evaluate the performance of X-PEFT through LaMP and GLUE tasks and demonstrate that it either matches or surpasses the effectiveness of conventional adapter tuning.
- Score: 5.814571836173169
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Parameter-efficient fine-tuning (PEFT) techniques, such as adapter tuning,
aim to fine-tune a pre-trained language model (PLM) using a minimal number of
parameters for a specific task or profile. Although adapter tuning provides
increased parameter efficiency compared to full-model fine-tuning, it
introduces a small set of additional parameters attached to a PLM for each
profile. This can become problematic in practical applications with multiple
profiles, particularly when a significant increase in the number of profiles
linearly boosts the total number of additional parameters. To mitigate this
issue, we introduce X-PEFT, a novel PEFT method that leverages a multitude of
given adapters by fine-tuning an extremely small set of compact tensors for a
new profile, which serve as binary masks to adaptively select the given
adapters. To efficiently validate our proposed method, we implement it using a
large number of trained or untrained (random) adapters. We evaluate the
performance of X-PEFT through LaMP and GLUE tasks and demonstrate that it
either matches or surpasses the effectiveness of conventional adapter tuning,
despite reducing the per-profile memory requirements by a factor of 10,000
relative to adapter tuning.
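To make the idea concrete, here is a minimal, hypothetical sketch of the mask-based adapter selection described in the abstract. It is not the authors' implementation: the `MaskedAdapterPool` name, the hyperparameters, the straight-through estimator for the binary mask, and the averaged residual combination are all illustrative assumptions.

```python
# Hypothetical sketch of the X-PEFT idea: a new profile is fitted by
# learning tiny mask tensors that select among a fixed pool of
# pre-existing adapters, instead of training a fresh adapter per profile.
import torch
import torch.nn as nn

class MaskedAdapterPool(nn.Module):
    """A pool of frozen adapters gated by a learnable binary mask.

    `num_adapters`, `hidden_dim`, and `bottleneck` are illustrative
    hyperparameters; the straight-through estimator for the hard mask
    is one common choice, not necessarily the paper's exact mechanism.
    """

    def __init__(self, num_adapters: int = 100, hidden_dim: int = 768,
                 bottleneck: int = 64):
        super().__init__()
        # Frozen pool: trained or even random adapters, shared by all profiles.
        self.down = nn.Parameter(
            torch.randn(num_adapters, hidden_dim, bottleneck) * 0.02,
            requires_grad=False)
        self.up = nn.Parameter(
            torch.randn(num_adapters, bottleneck, hidden_dim) * 0.02,
            requires_grad=False)
        # The only per-profile trainable parameters: one logit per adapter.
        self.mask_logits = nn.Parameter(torch.zeros(num_adapters))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Hard 0/1 mask in the forward pass, soft gradients in the
        # backward pass (straight-through estimator).
        soft = torch.sigmoid(self.mask_logits)
        hard = (soft > 0.5).float()
        mask = hard + soft - soft.detach()
        # Run all pooled bottleneck adapters on x: (batch, tokens, dim).
        h = torch.einsum('btd,ndk->bntk', x, self.down)
        h = torch.einsum('bntk,nkd->bntd', torch.relu(h), self.up)
        # Average the selected adapters' outputs and add them residually.
        weights = mask / mask.sum().clamp(min=1.0)
        return x + torch.einsum('bntd,n->btd', h, weights)
```

Under these assumptions, only `mask_logits` (a single scalar per pooled adapter) is trained for each new profile, which is how the per-profile memory footprint can shrink by orders of magnitude relative to storing a full adapter per profile.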
Related papers
- Hadamard Adapter: An Extreme Parameter-Efficient Adapter Tuning Method for Pre-trained Language Models [108.08773541490191]
Pre-trained language models (PLMs) have a huge number of parameters, so fine-tuning them is often expensive and time-consuming.
A parameter-efficient approach is therefore needed that reduces the number of fine-tuned parameters without compromising performance on downstream tasks.
In this paper, we design a novel adapter that acts only on the self-attention outputs in PLMs (a rough sketch of this idea appears after this list).
arXiv Detail & Related papers (2024-07-04T18:21:28Z)
- ETHER: Efficient Finetuning of Large-Scale Models with Hyperplane Reflections [59.839926875976225]
We propose the ETHER transformation family, which performs Efficient fineTuning via HypErplane Reflections.
In particular, we introduce ETHER and its relaxation ETHER+, which match or outperform existing PEFT methods with significantly fewer parameters.
arXiv Detail & Related papers (2024-05-30T17:26:02Z)
- Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation [67.13876021157887]
Dynamic Tuning (DyT) is a novel approach to improve both parameter and inference efficiency for ViT adaptation.
DyT achieves superior performance compared to existing PEFT methods while incurring only 71% of their FLOPs on the VTAB-1K benchmark.
arXiv Detail & Related papers (2024-03-18T14:05:52Z)
- LoRETTA: Low-Rank Economic Tensor-Train Adaptation for Ultra-Low-Parameter Fine-Tuning of Large Language Models [20.5908375260123]
Various parameter-efficient fine-tuning (PEFT) techniques have been proposed to enable computationally efficient fine-tuning while maintaining model performance.
We present LoRETTA, a framework that significantly reduces trainable parameters through tensor-train decomposition.
LoRETTA achieves comparable or better performance than the most widely used PEFT methods with up to $100\times$ fewer parameters on the LLaMA-2-7B model.
arXiv Detail & Related papers (2024-02-18T01:20:00Z)
- Prototype-based HyperAdapter for Sample-Efficient Multi-task Tuning [30.251155072822055]
Prototype-based HyperAdapter (PHA) is a novel framework built on adapter tuning and hypernetworks.
It introduces an instance-dense retriever and prototypical hypernetwork to generate conditional modules in a sample-efficient manner.
We show that PHA strikes a better trade-off between trainable parameters, accuracy on stream tasks, and sample efficiency.
arXiv Detail & Related papers (2023-10-18T02:42:17Z)
- Parameter-Efficient Fine-Tuning without Introducing New Latency [7.631596468553607]
We introduce a novel adapter technique that directly applies the adapter to pre-trained parameters instead of the hidden representation.
Our proposed method attains a new state of the art in both performance and storage efficiency, storing only 0.03% of the parameters of full fine-tuning.
arXiv Detail & Related papers (2023-05-26T08:44:42Z)
- Sensitivity-Aware Visual Parameter-Efficient Fine-Tuning [91.5113227694443]
We propose a novel Sensitivity-aware visual Parameter-efficient fine-Tuning (SPT) scheme, which adaptively allocates trainable parameters to task-specific important positions.
Experiments on a wide range of downstream recognition tasks show that our SPT is complementary to the existing PEFT methods.
arXiv Detail & Related papers (2023-03-15T12:34:24Z)
- AdaMix: Mixture-of-Adapter for Parameter-efficient Tuning of Large Language Models [119.7093605087114]
Fine-tuning large-scale pre-trained language models on downstream tasks requires updating hundreds of millions of parameters.
This not only increases the serving cost to store a large copy of the model weights for every task, but also exhibits instability during few-shot task adaptation.
We introduce a new mechanism, built on two key techniques, that improves adapter capacity without increasing the number of parameters or the computational cost.
arXiv Detail & Related papers (2022-05-24T23:41:22Z)
- AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks [55.705355299065474]
Transformer-based pre-trained models with millions of parameters require large storage.
Recent approaches tackle this shortcoming by training adapters, but these approaches still require a relatively large number of parameters.
In this study, AdapterBias, a surprisingly simple yet effective adapter architecture, is proposed.
arXiv Detail & Related papers (2022-04-30T16:49:41Z)
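As a companion illustration for the Hadamard Adapter entry above, here is a minimal sketch of an elementwise (Hadamard-product) adapter acting on self-attention outputs. The `HadamardAdapter` name and the scale-and-shift form are illustrative guesses from the title and summary, not the paper's verified design.

```python
# Hypothetical sketch of a Hadamard-style adapter: an elementwise
# (Hadamard-product) scale plus a bias applied to the self-attention
# output, costing only 2 * hidden_dim trainable parameters per layer.
import torch
import torch.nn as nn

class HadamardAdapter(nn.Module):
    def __init__(self, hidden_dim: int = 768):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(hidden_dim))   # elementwise weight
        self.shift = nn.Parameter(torch.zeros(hidden_dim))  # elementwise bias

    def forward(self, attn_out: torch.Tensor) -> torch.Tensor:
        # attn_out: (batch, tokens, hidden_dim) from a self-attention block.
        return attn_out * self.scale + self.shift
```

The per-layer cost of such an adapter is just two vectors of size hidden_dim, which is what makes this family of methods extremely parameter-efficient.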