Related papers: $\textit{Trans-LoRA}$: towards data-free Transferable Parameter Efficient Finetuning

URL: http://arxiv.org/abs/2405.17258v1
Date: Mon, 27 May 2024 15:15:08 GMT
Title: $\textit{Trans-LoRA}$: towards data-free Transferable Parameter Efficient Finetuning
Authors: Runqian Wang, Soumya Ghosh, David Cox, Diego Antognini, Aude Oliva, Rogerio Feris, Leonid Karlinsky
Abstract summary: Low-rank adapters (LoRA) and their variants are popular parameter-efficient fine-tuning techniques. When the base model needs to be replaced with a new one, all the associated LoRA modules need to be re-trained. This is especially problematic for commercial cloud applications where the LoRA modules and the base models are hosted by service providers.
Score: 28.668326340001695
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Low-rank adapters (LoRA) and their variants are popular parameter-efficient fine-tuning (PEFT) techniques that closely match full-model fine-tuning performance while requiring only a small number of additional parameters. These additional LoRA parameters are specific to the base model being adapted. When the base model needs to be deprecated and replaced with a new one, all the associated LoRA modules need to be re-trained. Such re-training requires access to the data used to train the LoRA for the original base model. This is especially problematic for commercial cloud applications where the LoRA modules and the base models are hosted by service providers who may not be allowed to host proprietary client task data. To address this challenge, we propose $\textit{Trans-LoRA}$ -- a novel method for lossless, nearly data-free transfer of LoRAs across base models. Our approach relies on synthetic data to transfer LoRA modules. Using large language models, we design a synthetic data generator to approximate the data-generating process of the $\textit{observed}$ task data subset. Training on the resulting synthetic dataset transfers LoRA modules to new models. We show the effectiveness of our approach using both Llama and Gemma model families. Our approach achieves lossless (mostly improved) LoRA transfer between models within and across different base model families, and even between different PEFT methods, on a wide variety of tasks.
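
The abstract does not spell out the training objective, so the following is only a toy sketch of what "training on a synthetic dataset to transfer a LoRA" could look like. It assumes (this is not stated in the abstract) that the source model with its LoRA labels the synthetic inputs and the new LoRA is fitted to match those labels. Linear maps stand in for language models, random vectors stand in for generated data, and every name (`transfer_lora`, `X_syn`, `W_src`, `W_tgt`) is hypothetical.

```python
import numpy as np

def transfer_lora(W_src, A_src, B_src, W_tgt, X_syn,
                  rank=2, lr=0.01, steps=2000, seed=0):
    """Fit a new low-rank update (B, A) on top of W_tgt so that the target
    matches the source model plus its LoRA on the synthetic inputs X_syn."""
    rng = np.random.default_rng(seed)
    d_out, d_in = W_tgt.shape
    A = rng.normal(scale=0.01, size=(rank, d_in))  # small random init
    B = np.zeros((d_out, rank))                    # zero init: no change at step 0
    # Teacher outputs: source base weights plus the source LoRA update.
    Y_teacher = X_syn @ (W_src + B_src @ A_src).T
    n = X_syn.shape[0]
    losses = []
    for _ in range(steps):
        err = X_syn @ (W_tgt + B @ A).T - Y_teacher
        losses.append(float(np.mean(err ** 2)))    # loss before this update
        # Gradient of the squared error w.r.t. the effective weight,
        # up to a constant factor that is absorbed into lr.
        G = err.T @ X_syn / n
        grad_B, grad_A = G @ A.T, B.T @ G
        B -= lr * grad_B
        A -= lr * grad_A
    return A, B, losses

# Toy setup: the "new" base model is a perturbed copy of the old one.
rng = np.random.default_rng(0)
d_in, d_out, r = 8, 6, 2
W_src = rng.normal(size=(d_out, d_in))
A_src = rng.normal(size=(r, d_in))
B_src = rng.normal(size=(d_out, r))
W_tgt = W_src + 0.1 * rng.normal(size=(d_out, d_in))
X_syn = rng.normal(size=(256, d_in))               # stand-in for synthetic data

A_new, B_new, losses = transfer_lora(W_src, A_src, B_src, W_tgt, X_syn, rank=r)
print(f"loss before: {losses[0]:.4f}, after: {losses[-1]:.4f}")
```

In this toy setting the original task data never appears: only the frozen source weights, the source LoRA, and the synthetic inputs are used to fit the new adapter.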

Related papers

In-Context Meta LoRA Generation [61.690065588534296]
Low-rank Adaptation (LoRA) has demonstrated remarkable capabilities for task-specific fine-tuning. We propose In-Context Meta LoRA (ICM-LoRA), a novel approach that efficiently achieves task-specific customization of large language models. ICM-LoRA enables more accurate LoRA parameter reconstruction than current parameter reconstruction methods.
arXiv Detail & Related papers (2025-01-29T13:12:01Z)
LoRA-X: Bridging Foundation Models with Training-Free Cross-Model Adaptation [48.22550575107633]
A new adapter, Cross-Model Low-Rank Adaptation (LoRA-X), enables the training-free transfer of LoRA parameters across source and target models. Our experiments demonstrate the effectiveness of LoRA-X for text-to-image generation.
arXiv Detail & Related papers (2025-01-27T23:02:24Z)
LoRA-FAIR: Federated LoRA Fine-Tuning with Aggregation and Initialization Refinement [5.162783756846019]
Foundation models (FMs) achieve strong performance across diverse tasks with task-specific fine-tuning. Parameter-efficient fine-tuning (PEFT) methods like Low-Rank Adaptation (LoRA) reduce the cost of fine-tuning by introducing low-rank matrices for tuning fewer parameters. LoRA-FAIR maintains computational and communication efficiency, yielding superior performance over state-of-the-art methods.
arXiv Detail & Related papers (2024-11-22T14:19:01Z)
Retrieval-Augmented Mixture of LoRA Experts for Uploadable Machine Learning [57.36978335727009]
Low-Rank Adaptation (LoRA) offers an efficient way to fine-tune large language models (LLMs). In this paper, we propose a framework that adaptively retrieves and composes multiple LoRAs based on input prompts.
arXiv Detail & Related papers (2024-06-24T05:24:41Z)
VB-LoRA: Extreme Parameter Efficient Fine-Tuning with Vector Banks [10.266224162377371]
Low-rank adaptation (LoRA) and its variants incur substantial storage and transmission costs. We introduce a "divide-and-share" paradigm that breaks the barriers of low-rank decomposition across matrix dimensions, modules, and layers. VB-LoRA achieves extreme parameter efficiency while maintaining comparable or better performance compared to state-of-the-art PEFT methods.
arXiv Detail & Related papers (2024-05-24T03:24:34Z)
Continual Forgetting for Pre-trained Vision Models [70.51165239179052]
In real-world scenarios, selective information is expected to be continuously removed from a pre-trained model. We propose Group Sparse LoRA (GS-LoRA) for efficient and effective deletion. We conduct extensive experiments on face recognition, object detection and image classification and demonstrate that GS-LoRA manages to forget specific classes with minimal impact on other classes.
arXiv Detail & Related papers (2024-03-18T07:33:56Z)
Chain of LoRA: Efficient Fine-tuning of Language Models via Residual Learning [31.036465632204663]
We introduce Chain of LoRA (COLA), an iterative optimization framework inspired by the Frank-Wolfe algorithm. We demonstrate that COLA can consistently outperform LoRA without additional computational or memory costs.
arXiv Detail & Related papers (2024-01-08T14:26:49Z)
FedRA: A Random Allocation Strategy for Federated Tuning to Unleash the Power of Heterogeneous Clients [50.13097183691517]
In real-world federated scenarios, there often exist a multitude of heterogeneous clients with varying computation and communication resources. We propose a novel federated tuning algorithm, FedRA. In each communication round, FedRA randomly generates an allocation matrix. It reorganizes a small number of layers from the original model based on the allocation matrix and fine-tunes using adapters.
arXiv Detail & Related papers (2023-11-19T04:43:16Z)
S-LoRA: Serving Thousands of Concurrent LoRA Adapters [59.490751234925206]
Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning method, is often employed to adapt a base model to a multitude of tasks. We present S-LoRA, a system designed for the scalable serving of many LoRA adapters.
arXiv Detail & Related papers (2023-11-06T17:26:17Z)
The Expressive Power of Low-Rank Adaptation [11.371811534310078]
Low-Rank Adaptation, a parameter-efficient fine-tuning method, has emerged as a prevalent technique for fine-tuning pre-trained models. This paper takes the first step to bridge the gap by theoretically analyzing the expressive power of LoRA. For Transformer networks, we show any model can be adapted to a target model of the same size with rank-$(\frac{\text{embedding size}}{2})$ LoRA.
arXiv Detail & Related papers (2023-10-26T16:08:33Z)
CA-LoRA: Adapting Existing LoRA for Compressed LLMs to Enable Efficient Multi-Tasking on Personal Devices [78.16679232748196]
We introduce a Compression-Aware LoRA (CA-LoRA) framework to transfer Large Language Models (LLMs) to other tasks. Experiment results demonstrate that CA-LoRA outperforms the vanilla LoRA methods applied to a compressed LLM. The source code of CA-LoRA is available at https://github.com/thunlp/CA-LoRA.
arXiv Detail & Related papers (2023-07-15T04:37:11Z)
LoRA: Low-Rank Adaptation of Large Language Models [71.75808607987281]
Low-Rank Adaptation, or LoRA, freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture. For GPT-3, LoRA can reduce the number of trainable parameters by 10,000 times and the GPU memory requirement by 3 times compared to full fine-tuning.
arXiv Detail & Related papers (2021-06-17T17:37:18Z)
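
The mechanism described in the LoRA entry, a frozen weight matrix plus a trainable low-rank update, can be illustrated with a minimal NumPy sketch. The class name `LoRALinear` and the sizes below are hypothetical and chosen only for illustration; this is not the API of any particular library. The zero initialization of `B` and the `alpha / rank` scaling follow the setup described in the LoRA paper.

```python
import numpy as np

class LoRALinear:
    """Linear layer y = x W^T with a low-rank update: W + (alpha / r) * B A."""

    def __init__(self, W, rank, alpha=1.0, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = W.shape
        self.W = W                                          # frozen pre-trained weight
        self.A = rng.normal(scale=0.01, size=(rank, d_in))  # trainable
        self.B = np.zeros((d_out, rank))                    # trainable, starts at zero
        self.scale = alpha / rank

    def __call__(self, x):
        # Base output plus the low-rank correction; W itself is never modified.
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_params(self):
        return self.A.size + self.B.size

rng = np.random.default_rng(0)
W = rng.normal(size=(1024, 1024))
layer = LoRALinear(W, rank=8)
x = rng.normal(size=(4, 1024))

# With B initialized to zero, the adapted layer reproduces the base layer exactly.
same_as_base = np.allclose(layer(x), x @ W.T)
reduction = W.size // layer.trainable_params()
print(same_as_base, layer.trainable_params(), reduction)
```

For this single 1024 x 1024 layer at rank 8, the adapter trains 16,384 parameters instead of 1,048,576, a 64-fold reduction. The much larger figure quoted for GPT-3 comes from the paper's own configuration, not from this sketch.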

This list is automatically generated from the titles and abstracts of the papers on this site.