Rapid Switching and Multi-Adapter Fusion via Sparse High Rank Adapters
- URL: http://arxiv.org/abs/2407.16712v1
- Date: Mon, 22 Jul 2024 22:46:36 GMT
- Title: Rapid Switching and Multi-Adapter Fusion via Sparse High Rank Adapters
- Authors: Kartikeya Bhardwaj, Nilesh Prasad Pandey, Sweta Priyadarshi, Viswanath Ganapathy, Rafael Esteves, Shreya Kadambi, Shubhankar Borse, Paul Whatmough, Risheek Garrepalli, Mart Van Baalen, Harris Teague, Markus Nagel
- Abstract summary: We propose Sparse High Rank Adapters (SHiRA) that directly finetune 1-2% of the base model weights while leaving others unchanged.
This high sparsity incurs no inference overhead, enables rapid switching directly in the fused mode, and significantly reduces concept-loss during multi-adapter fusion.
- Score: 16.160749645651567
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In this paper, we propose Sparse High Rank Adapters (SHiRA) that directly finetune 1-2% of the base model weights while leaving others unchanged, thus resulting in a highly sparse adapter. This high sparsity incurs no inference overhead, enables rapid switching directly in the fused mode, and significantly reduces concept-loss during multi-adapter fusion. Our extensive experiments on LVMs and LLMs demonstrate that finetuning merely 1-2% of the parameters in the base model is sufficient for many adapter tasks and significantly outperforms Low Rank Adaptation (LoRA). We also show that SHiRA is orthogonal to advanced LoRA methods such as DoRA and can be easily combined with existing techniques.
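The mechanism described in the abstract lends itself to a short PyTorch sketch: freeze the base weights, fix a sparse mask over 1-2% of them, and train only the masked delta, which can then be fused or un-fused with a single sparse add. The following is a minimal illustration under our own assumptions (random mask selection, a single linear layer; the class name `SHiRALinear` is ours), not the authors' released implementation.

```python
import torch
import torch.nn as nn

class SHiRALinear(nn.Module):
    """Minimal sketch of a sparse high rank adapter on one linear layer.

    A fixed binary mask selects ~1-2% of the base weights; only the
    masked delta is trained, so the adapter is a sparse tensor that can
    be fused (W + delta) or un-fused (W - delta) in a single add.
    """

    def __init__(self, base: nn.Linear, sparsity: float = 0.01):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # base stays frozen
        # Assumed mask choice: uniformly random; the paper studies
        # several mask-selection strategies.
        mask = (torch.rand_like(base.weight) < sparsity).to(base.weight.dtype)
        self.register_buffer("mask", mask)
        self.delta = nn.Parameter(torch.zeros_like(base.weight))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Off-mask entries of delta receive zero gradient, so only the
        # sparse subset is effectively trained.
        w = self.base.weight + self.delta * self.mask
        return nn.functional.linear(x, w, self.base.bias)

    @torch.no_grad()
    def fuse(self):
        """Rapid switching: add the sparse delta into the base weights."""
        self.base.weight += self.delta * self.mask

    @torch.no_grad()
    def unfuse(self):
        self.base.weight -= self.delta * self.mask
```

Because switching adapters in the fused model only touches the masked 1-2% of entries, it amounts to a sparse scatter-add rather than a dense weight rewrite, which is the intuition behind the rapid-switching claim.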
Related papers
- MSLoRA: Multi-Scale Low-Rank Adaptation via Attention Reweighting [6.335488846185043]
MSLoRA is a backbone-agnostic, parameter-efficient adapter that reweights feature responses rather than re-tuning the underlying backbone.
MSLoRA unifies adaptation for both convolutional neural networks (CNNs) and vision transformers (ViTs).
arXiv Detail & Related papers (2025-11-16T00:35:37Z)
- FLoRA: Fused forward-backward adapters for parameter efficient fine-tuning and reducing inference-time latencies of LLMs [7.771813594229729]
We propose a family of fused forward-backward adapters (FFBA) for parameter-efficient fine-tuning of large language models (LLMs) on downstream tasks.
Experimental results show that the proposed FFB adapters perform significantly better than the widely used LoRA in both accuracy and latency.
arXiv Detail & Related papers (2025-10-28T12:45:45Z)
- AdaRing: Towards Ultra-Light Vision-Language Adaptation via Cross-Layer Tensor Ring Decomposition [41.654675205772485]
We propose AdaRing, a vision-language fine-tuning framework based on cross-layer tensor ring decomposition (TRD) that integrates and coordinates diverse adapters.
Our experiments show that AdaRing achieves state-of-the-art performance while reducing average training parameters by 90%.
arXiv Detail & Related papers (2025-08-16T01:56:27Z)
- Exploring Sparse Adapters for Scalable Merging of Parameter Efficient Experts [72.22148263683037]
We study the properties of sparse adapters, which train only a subset of weights in the base neural network, as potential building blocks of modular architectures.
First, we propose a simple method for training highly effective sparse adapters, which is conceptually simpler than existing methods in the literature.
Next, we investigate the merging properties of these sparse adapters by merging adapters for up to 20 natural language processing tasks.
arXiv Detail & Related papers (2025-07-09T03:25:45Z)
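Merging sparse adapters of this kind reduces to combining masked weight deltas over a shared base. Below is a minimal sketch under our own assumptions (averaging on overlapping entries; the paper may use a different combination rule):

```python
import torch

def merge_sparse_adapters(base_weight: torch.Tensor,
                          deltas: list[torch.Tensor],
                          masks: list[torch.Tensor]) -> torch.Tensor:
    """Merge several sparse adapters into one set of fused weights.

    Each adapter is a (delta, mask) pair over the same base weight,
    with masks given as 0/1 float tensors. Where masks do not overlap
    the merge is exact; overlapping entries are averaged here, one of
    several reasonable tie-breaking rules.
    """
    total = torch.zeros_like(base_weight)
    count = torch.zeros_like(base_weight)
    for delta, mask in zip(deltas, masks):
        total += delta * mask
        count += mask
    # clamp avoids divide-by-zero on entries no adapter touches
    return base_weight + total / count.clamp(min=1.0)
```

With high sparsity, mask overlaps are rare, so most entries are owned by a single adapter and the merge is close to exact, one intuition for why sparse adapters merge well across many tasks.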
- Improving Robustness of Foundation Models in Domain Adaptation with Soup-Adapters [0.0]
We show that by training multiple independent adapters and averaging their outputs, the resulting model achieves higher performance and is more robust to distribution shifts than any individual adapter.
This is also the first study to explore CLIP adapter-style techniques for DINOv2 and to directly compare them with CLIP in this setting.
arXiv Detail & Related papers (2025-07-08T09:26:10Z)
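The output-averaging idea above is simple enough to state in a few lines. A minimal sketch, assuming CLIP-style adapters that each map the same frozen backbone features to adapted features (the module name `AdapterSoup` is ours, not the paper's):

```python
import torch
import torch.nn as nn

class AdapterSoup(nn.Module):
    """Average the outputs of independently trained adapters."""

    def __init__(self, adapters: list[nn.Module]):
        super().__init__()
        self.adapters = nn.ModuleList(adapters)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # Each adapter sees the same frozen backbone features;
        # the "soup" is their unweighted mean.
        outs = [adapter(features) for adapter in self.adapters]
        return torch.stack(outs, dim=0).mean(dim=0)
```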
- Come Together, But Not Right Now: A Progressive Strategy to Boost Low-Rank Adaptation [21.137278840000366]
Low-rank adaptation (LoRA) has emerged as a leading parameter-efficient fine-tuning technique for adapting large foundation models.
We propose CoTo, a progressive training strategy that gradually increases the adapters' activation probability over the course of fine-tuning.
arXiv Detail & Related papers (2025-06-06T03:33:06Z)
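Gradually increasing the activation probability can be sketched as stochastic adapter gating with a rising keep-probability schedule. A minimal illustration under our own assumptions (linear ramp, independent Bernoulli gates per adapter; the paper's exact schedule may differ):

```python
import torch

def adapter_keep_mask(num_adapters: int, step: int, total_steps: int,
                      p_start: float = 0.25) -> torch.Tensor:
    """One Bernoulli gate per adapter. The keep-probability ramps
    linearly from p_start to 1.0 over training, so the adapters
    'come together' only late in fine-tuning."""
    progress = min(step / max(total_steps, 1), 1.0)
    p_keep = p_start + (1.0 - p_start) * progress
    return (torch.rand(num_adapters) < p_keep).float()

# During training, multiply each adapter's output by its gate
# (optionally scaled by 1/p_keep to keep the expectation unbiased).
```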
- Dual Low-Rank Adaptation for Continual Learning with Pre-Trained Models [38.97142043836567]
Continual learning (CL) aims to enable vision transformers (ViTs) to learn new tasks over time, but catastrophic forgetting remains a persistent challenge.
We propose a novel PEFT-CL method called Dual Low-Rank Adaptation (DualLoRA).
arXiv Detail & Related papers (2024-11-01T14:28:39Z)
- MiLoRA: Efficient Mixture of Low-Rank Adaptation for Large Language Models Fine-tuning [9.91790333647256]
Low-rank adaptation (LoRA) and its mixture-of-experts (MOE) variants are highly effective parameter-efficient fine-tuning (PEFT) methods.
We propose Mixture of Low-Rank Adaptation (MiLoRA), a novel and efficient LoRA variant.
MiLoRA differs from previous MOE-style LoRA methods by considering each LoRA module as an expert and employing a prompt-aware routing mechanism.
arXiv Detail & Related papers (2024-10-23T17:04:40Z)
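A mixture of LoRA experts with prompt-aware routing can be rendered compactly. The sketch below is our own minimal reading of the summary; the router design, pooling, and top-k choice are assumptions, not MiLoRA's exact architecture:

```python
import torch
import torch.nn as nn

class LoRAExpertMixture(nn.Module):
    """Route among LoRA experts using a prompt-level summary vector."""

    def __init__(self, d_model: int, rank: int, num_experts: int, top_k: int = 1):
        super().__init__()
        self.down = nn.ModuleList(
            nn.Linear(d_model, rank, bias=False) for _ in range(num_experts))
        self.up = nn.ModuleList(
            nn.Linear(rank, d_model, bias=False) for _ in range(num_experts))
        self.router = nn.Linear(d_model, num_experts)
        self.top_k = top_k

    def forward(self, hidden: torch.Tensor, prompt: torch.Tensor) -> torch.Tensor:
        # hidden: (B, T, d) layer input; prompt: (B, P, d) prompt states.
        # Routing is computed once per sequence from the mean-pooled
        # prompt, so the expert choice does not change token to token.
        gate = self.router(prompt.mean(dim=1)).softmax(dim=-1)   # (B, E)
        weights, idx = gate.topk(self.top_k, dim=-1)             # (B, k)
        out = torch.zeros_like(hidden)
        for b in range(hidden.size(0)):
            for w, e in zip(weights[b].tolist(), idx[b].tolist()):
                out[b] += w * self.up[e](self.down[e](hidden[b]))
        return out  # low-rank correction added to the frozen layer's output
```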
- Hadamard Adapter: An Extreme Parameter-Efficient Adapter Tuning Method for Pre-trained Language Models [108.08773541490191]
Pre-trained language models (PLMs) have huge numbers of parameters, so fine-tuning them is often expensive and time-consuming.
It is therefore necessary to adopt a parameter-efficient approach that reduces the number of parameters updated during fine-tuning without compromising performance on downstream tasks.
In this paper, we design a novel adapter that acts only on the self-attention outputs in PLMs.
arXiv Detail & Related papers (2024-07-04T18:21:28Z)
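Acting only on self-attention outputs via a Hadamard (elementwise) product suggests an extremely small adapter: one scale and one shift vector per attention block. A minimal sketch of that reading (our interpretation of the summary; the parameter shapes are assumptions):

```python
import torch
import torch.nn as nn

class HadamardAdapter(nn.Module):
    """Elementwise rescale-and-shift of self-attention outputs.

    Only 2 * d_model trainable parameters per block: a learned scale
    applied via Hadamard product, plus a learned bias.
    """

    def __init__(self, d_model: int):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(d_model))
        self.shift = nn.Parameter(torch.zeros(d_model))

    def forward(self, attn_out: torch.Tensor) -> torch.Tensor:
        # Initialized to the identity map (scale=1, shift=0), so the
        # adapted model starts from the pre-trained behavior.
        return attn_out * self.scale + self.shift
```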
- Sparse High Rank Adapters [16.160749645651567]
Low Rank Adaptation (LoRA) has gained massive attention in recent generative AI research.
We propose Sparse High Rank Adapters (SHiRA), a new paradigm which incurs no inference overhead, enables rapid switching, and significantly reduces concept-loss.
arXiv Detail & Related papers (2024-06-19T03:13:11Z)
- MeteoRA: Multiple-tasks Embedded LoRA for Large Language Models [4.978361907192563]
MeteoRA is a scalable and efficient framework that embeds and reuses multiple task-specific LoRA adapters within the base LLM.
MeteoRA achieves superior performance in handling composite tasks, effectively solving ten sequential problems in a single inference pass.
arXiv Detail & Related papers (2024-05-19T20:46:07Z)
- LCM-LoRA: A Universal Stable-Diffusion Acceleration Module [52.8517132452467]
Latent Consistency Models (LCMs) have achieved impressive performance in accelerating text-to-image generative tasks.
This report further extends LCMs' potential by applying LoRA distillation to larger Stable-Diffusion models.
We identify the LoRA parameters obtained through LCM distillation as a universal Stable-Diffusion acceleration module, named LCM-LoRA.
arXiv Detail & Related papers (2023-11-09T18:04:15Z)
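Using LCM-LoRA as a drop-in acceleration module looks roughly as follows, assuming the Hugging Face `diffusers` integration and the published `latent-consistency/lcm-lora-sdv1-5` checkpoint (exact API details vary across library versions):

```python
# Minimal usage sketch: accelerate Stable Diffusion with LCM-LoRA.
import torch
from diffusers import DiffusionPipeline, LCMScheduler

pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Swap in the LCM scheduler and load the distilled LoRA weights.
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")

# Few-step sampling: 4 steps instead of the usual 25-50.
image = pipe("a photo of an astronaut riding a horse",
             num_inference_steps=4, guidance_scale=1.0).images[0]
```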
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters [59.490751234925206]
Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning method, is often employed to adapt a base model to a multitude of tasks.
We present S-LoRA, a system designed for the scalable serving of many LoRA adapters.
arXiv Detail & Related papers (2023-11-06T17:26:17Z)
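Serving many adapters concurrently hinges on batching requests that use different adapters through a single base-model pass. A toy sketch of that idea, not S-LoRA's actual paged-memory implementation:

```python
import torch

def batched_lora_forward(x: torch.Tensor,            # (B, d_in)
                         base_w: torch.Tensor,       # (d_out, d_in)
                         lora_a: torch.Tensor,       # (N, r, d_in)
                         lora_b: torch.Tensor,       # (N, d_out, r)
                         adapter_ids: torch.Tensor   # (B,) int indices
                         ) -> torch.Tensor:
    """One base matmul shared by the whole batch, plus a gathered
    per-request low-rank correction, so requests for N different
    adapters ride in one batch."""
    base_out = x @ base_w.T                           # shared compute
    a = lora_a[adapter_ids]                           # (B, r, d_in)
    b = lora_b[adapter_ids]                           # (B, d_out, r)
    delta = torch.einsum("bri,bi->br", a, x)          # (B, r)
    delta = torch.einsum("bor,br->bo", b, delta)      # (B, d_out)
    return base_out + delta
```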
- MerA: Merging Pretrained Adapters For Few-Shot Learning [71.44422347502409]
We propose Merging Pretrained Adapters (MerA), which efficiently incorporates pretrained adapters into a single model through model fusion.
Experiments on two PLMs demonstrate that MerA achieves substantial improvements compared to both single adapters and AdapterFusion.
arXiv Detail & Related papers (2023-08-30T12:10:17Z)
- LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models [75.25782573728677]
This paper presents a framework for adapter-based parameter-efficient fine-tuning (PEFT) of large language models (LLMs).
The framework includes state-of-the-art open-access LLMs such as LLaMA, BLOOM, and GPT-J, as well as widely used adapter families: series adapters, parallel adapters, prompt-based learning, and reparametrization-based methods.
We evaluate the effectiveness of the adapters on fourteen datasets from two different reasoning tasks, Arithmetic Reasoning and Commonsense Reasoning.
arXiv Detail & Related papers (2023-04-04T16:31:37Z)
- AdaMix: Mixture-of-Adapter for Parameter-efficient Tuning of Large Language Models [119.7093605087114]
Fine-tuning large-scale pre-trained language models on downstream tasks requires updating hundreds of millions of parameters.
This not only increases the serving cost to store a large copy of the model weights for every task, but also exhibits instability during few-shot task adaptation.
We introduce a new mechanism, built on two key techniques, that improves adapter capacity without increasing the parameter count or computational cost.
arXiv Detail & Related papers (2022-05-24T23:41:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.