Does Combining Parameter-efficient Modules Improve Few-shot Transfer Accuracy?
- URL: http://arxiv.org/abs/2402.15414v1
- Date: Fri, 23 Feb 2024 16:20:29 GMT
- Title: Does Combining Parameter-efficient Modules Improve Few-shot Transfer Accuracy?
- Authors: Nader Asadi, Mahdi Beitollahi, Yasser Khalil, Yinchuan Li, Guojun
Zhang, Xi Chen
- Abstract summary: In this paper, we explore the composability of LoRA modules, examining if combining pre-trained modules enhances generalization to unseen downstream tasks.
Our experimental results on both vision and language models reveal that in few-shot settings, where only a limited number of samples are available for the downstream task, both uniform and learned composition methods result in better transfer accuracy.
Our research unveils the potential of uniform composition for enhancing transferability in low-shot settings, without introducing additional learnable parameters.
- Score: 19.716749548892214
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Parameter-efficient fine-tuning stands as the standard for efficiently
fine-tuning large language and vision models on downstream tasks. Specifically,
the efficiency of low-rank adaptation has facilitated the creation and sharing
of hundreds of custom LoRA modules, each trained on distinct data from various
downstream tasks. In this paper, we explore the composability of LoRA modules,
examining if combining these pre-trained modules enhances generalization to
unseen downstream tasks. Our investigation involves evaluating two approaches:
(a) uniform composition, involving averaging upstream LoRA modules with equal
weights, and (b) learned composition, where we learn the weights for each
upstream module and perform weighted averaging. Our experimental results on
both vision and language models reveal that in few-shot settings, where only a
limited number of samples are available for the downstream task, both uniform
and learned composition methods result in better transfer accuracy,
outperforming full fine-tuning and training a LoRA from scratch. Moreover, in
full-shot settings, learned composition performs comparably to regular LoRA
training with significantly fewer trainable parameters. Our research
unveils the potential of uniform composition for enhancing transferability in
low-shot settings, without introducing additional learnable parameters.
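
To make the two composition schemes concrete, here is a minimal PyTorch sketch, not the authors' released code: the class name `ComposedLoRALinear` and the exact parametrization of the learned weights are assumptions, while the frozen upstream pairs (A_k, B_k), the equal-weight initialization, and the weighted averaging follow the abstract.

```python
# Minimal sketch of uniform vs. learned composition of K pre-trained
# LoRA pairs (A_k, B_k) attached to one frozen linear layer. Class name
# and the raw-scalar parametrization of the weights are assumptions.
import torch
import torch.nn as nn

class ComposedLoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, loras, learned: bool = False):
        super().__init__()
        self.base = base.requires_grad_(False)  # frozen pre-trained layer
        # Upstream modules: A_k has shape (r, d_in), B_k has shape (d_out, r).
        self.As = nn.ParameterList(nn.Parameter(a, requires_grad=False)
                                   for a, _ in loras)
        self.Bs = nn.ParameterList(nn.Parameter(b, requires_grad=False)
                                   for _, b in loras)
        k = len(loras)
        # (a) uniform composition: fixed weights 1/K, nothing new to train.
        # (b) learned composition: the K scalars become the only trainable
        #     parameters, initialized at the uniform solution.
        self.w = nn.Parameter(torch.full((k,), 1.0 / k), requires_grad=learned)

    def forward(self, x):
        out = self.base(x)
        for wk, a, b in zip(self.w, self.As, self.Bs):
            out = out + wk * ((x @ a.T) @ b.T)  # w_k * B_k A_k x
        return out
```

With `learned=False` the module adds no trainable parameters at all, which is what makes uniform composition attractive in the few-shot regime described above; with `learned=True`, only the K composition scalars are optimized.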
Related papers
- DiffoRA: Enabling Parameter-Efficient LLM Fine-Tuning via Differential Low-Rank Matrix Adaptation [32.369133126167085]
We propose a new PEFT scheme called DiffoRA, which is theoretically grounded and enables module-wise adoption of LoRA.
At the core of our DiffoRA lies a Differential Adaptation Matrix (DAM) to determine which module is the most suitable and essential for fine-tuning.
Our approach achieves the best model accuracy over all the state-of-the-art baselines across various benchmarks.
arXiv Detail & Related papers (2025-02-13T02:41:34Z)
- Unlocking Tuning-Free Few-Shot Adaptability in Visual Foundation Models by Recycling Pre-Tuned LoRAs [76.40876036912537]
Large Language Models (LLMs) demonstrate strong few-shot adaptability without requiring fine-tuning.
By contrast, current Visual Foundation Models (VFMs) require explicit fine-tuning with sufficient tuning data.
We propose a framework, LoRA Recycle, that distills a meta-LoRA from diverse pre-tuned LoRAs with a meta-learning objective.
arXiv Detail & Related papers (2024-12-03T07:25:30Z)
- LoRA vs Full Fine-tuning: An Illusion of Equivalence [76.11938177294178]
We study how different fine-tuning methods change pre-trained models by analyzing the model's weight matrices through the lens of their spectral properties.
We find that full fine-tuning and LoRA yield weight matrices whose singular value decompositions exhibit very different structure.
We conclude by examining why intruder dimensions appear in LoRA fine-tuned models, why they are undesirable, and how their effects can be minimized.
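
A hedged sketch of the kind of spectral check described above: compare the left singular vectors of a tuned weight matrix against the pre-trained ones, and flag tuned vectors with no close pre-trained counterpart as candidate intruder dimensions. The random tensors and the 0.6 similarity threshold are illustrative assumptions, not the paper's setup.

```python
# Illustrative stand-in tensors; compare singular vectors of a tuned
# weight matrix with those of the pre-trained one.
import torch

torch.manual_seed(0)
d, r = 256, 8
W0 = torch.randn(d, d)                      # stand-in pre-trained weight
B, A = torch.randn(d, r), torch.randn(r, d)
W_ft = W0 + 0.5 * (B @ A)                   # stand-in LoRA-style update

U0, _, _ = torch.linalg.svd(W0)
U1, _, _ = torch.linalg.svd(W_ft)
# Best cosine match of each tuned singular vector against all
# pre-trained ones; low values flag candidate intruder dimensions.
sim = (U1.T @ U0).abs().max(dim=1).values
print("candidate intruder dims:", (sim < 0.6).sum().item())
```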
arXiv Detail & Related papers (2024-10-28T17:14:01Z)
- Less is More: Extreme Gradient Boost Rank-1 Adaption for Efficient Finetuning of LLMs [75.11449420928139]
Fine-tuning Large Language Models (LLMs) has become a crucial technique for adapting pre-trained models to downstream tasks.
Low-Rank Adaptation (LoRA) has emerged as a promising solution, but a gap remains between the practical performance of low-rank adaptation and its theoretical optimum.
We propose eXtreme Gradient Boosting LoRA, a novel framework that bridges this gap by leveraging the power of ensemble learning.
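
Going only by the title and summary, the boosting idea might look like the following hedged sketch: fit a sequence of rank-1 updates, merging each into the weight before fitting the next. The function name, loop counts, and optimizer are placeholders, not the paper's algorithm.

```python
# Hedged sketch of gradient-boosted rank-1 adaptation: each rank-1 pair
# is a "weak learner" fit on top of the already-merged previous updates.
import torch

def boosted_rank1(W0, loss_fn, rounds=4, lr=1e-2, iters=100):
    W = W0.clone()
    for _ in range(rounds):
        b = torch.zeros(W.shape[0], 1, requires_grad=True)  # zero init: no-op start
        a = torch.randn(1, W.shape[1], requires_grad=True)
        opt = torch.optim.SGD([a, b], lr=lr)
        for _ in range(iters):                # fit one rank-1 weak learner
            opt.zero_grad()
            loss_fn(W + b @ a).backward()
            opt.step()
        W = (W + b @ a).detach()              # merge it, then boost the next
    return W
```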
arXiv Detail & Related papers (2024-10-25T17:07:13Z)
- MTL-LoRA: Low-Rank Adaptation for Multi-Task Learning [74.43869839954168]
We propose MTL-LoRA, which retains the advantages of low-rank adaptation while significantly enhancing multi-task learning capabilities.
MTL-LoRA augments LoRA by incorporating additional task-adaptive parameters that differentiate task-specific information.
This approach enables large language models (LLMs) pre-trained on general corpus to adapt to different target task domains with a limited number of trainable parameters.
arXiv Detail & Related papers (2024-10-12T08:32:26Z)
- MELoRA: Mini-Ensemble Low-Rank Adapters for Parameter-Efficient Fine-Tuning [71.50432879573614]
Low-rank adaptation (LoRA) is based on the idea that the adaptation process is intrinsically low-dimensional.
We present MELoRA, a mini-ensemble of low-rank adapters that uses fewer trainable parameters while maintaining a higher rank.
Our experimental results show that, compared to LoRA, MELoRA achieves better performance with 8 times fewer trainable parameters on natural language understanding tasks and 36 times fewer trainable parameters on instruction following tasks.
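
One plausible reading of "fewer parameters at a higher rank" is a block-diagonal construction, sketched below under that assumption: n parallel mini-adapters, each acting on its own slice of the features, so the equivalent full update can reach rank n * rank while parameters are counted per block. The class name and shapes are hypothetical.

```python
# Hedged sketch of a mini-ensemble adapter: n independent tiny LoRAs,
# each on its own feature slice; the equivalent full update is
# block-diagonal, so its rank can reach n * rank.
import torch
import torch.nn as nn

class MiniEnsembleLoRA(nn.Module):
    def __init__(self, d_in, d_out, n_blocks, rank=1):
        super().__init__()
        assert d_in % n_blocks == 0 and d_out % n_blocks == 0
        self.n = n_blocks
        self.A = nn.Parameter(torch.randn(n_blocks, rank, d_in // n_blocks))
        self.B = nn.Parameter(torch.zeros(n_blocks, d_out // n_blocks, rank))

    def forward(self, x):                      # x: (batch, d_in)
        xs = x.view(x.shape[0], self.n, -1)    # (batch, n, d_in/n)
        h = torch.einsum("bni,nri->bnr", xs, self.A)  # per-block A_i x_i
        y = torch.einsum("bnr,nor->bno", h, self.B)   # per-block B_i h_i
        return y.reshape(x.shape[0], -1)       # (batch, d_out)
```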
arXiv Detail & Related papers (2024-02-27T07:14:12Z)
- PRILoRA: Pruned and Rank-Increasing Low-Rank Adaptation [65.268245109828]
We introduce PRILoRA, which allocates a different rank to each layer in a linearly increasing manner and performs pruning throughout the training process.
We validate the effectiveness of PRILoRA through extensive experiments on eight GLUE benchmarks, setting a new state of the art.
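
The linearly increasing allocation can be illustrated with a tiny helper; the endpoint ranks below are assumptions, not the paper's values.

```python
# Hedged sketch of a linearly increasing per-layer rank schedule;
# (r_min, r_max) are illustrative, not the paper's settings.
def linear_rank_schedule(n_layers: int, r_min: int = 4, r_max: int = 12):
    """Allocate a rank to each layer, growing linearly with depth."""
    step = (r_max - r_min) / max(n_layers - 1, 1)
    return [round(r_min + i * step) for i in range(n_layers)]

print(linear_rank_schedule(12))  # [4, 5, 5, 6, 7, 8, 8, 9, 10, 11, 11, 12]
```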
arXiv Detail & Related papers (2024-01-20T20:25:17Z)
- One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning [34.109808214968176]
Generalized LoRA (GLoRA) is an advanced approach for universal parameter-efficient fine-tuning tasks.
It employs a generalized prompt module to optimize pre-trained model weights and adjust intermediate activations.
GLoRA exhibits strong transfer learning, few-shot learning and domain generalization abilities.
arXiv Detail & Related papers (2023-06-13T17:59:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences arising from its use.