Dual Low-Rank Adaptation for Continual Learning with Pre-Trained Models
- URL: http://arxiv.org/abs/2411.00623v1
- Date: Fri, 01 Nov 2024 14:28:39 GMT
- Title: Dual Low-Rank Adaptation for Continual Learning with Pre-Trained Models
- Authors: Huancheng Chen, Jingtao Li, Nidham Gazagnadou, Weiming Zhuang, Chen Chen, Lingjuan Lyu,
- Abstract summary: Continual learning (CL) aims to enable vision transformers (ViTs) to learn new tasks over time.
catastrophic forgetting remains a persistent challenge.
We propose a novel PEFT-CL method called Dual Low-Rank Adaptation (DualLoRA)
- Score: 38.97142043836567
- License:
- Abstract: In the era of foundation models, we revisit continual learning~(CL), which aims to enable vision transformers (ViTs) to learn new tasks over time. However, as the scale of these models increases, catastrophic forgetting remains a persistent challenge, particularly in the presence of significant domain shifts across tasks. Recent studies highlight a crossover between CL techniques and parameter-efficient fine-tuning (PEFT), which focuses on fine-tuning only a small set of trainable parameters to adapt to downstream tasks, such as low-rank adaptation (LoRA). While LoRA achieves faster convergence and requires fewer trainable parameters, it has seldom been explored in the context of continual learning. To address this gap, we propose a novel PEFT-CL method called Dual Low-Rank Adaptation (DualLoRA), which introduces both an orthogonal LoRA adapter and a residual LoRA adapter parallel to pre-trained weights in each layer. These components are orchestrated by a dynamic memory mechanism to strike a balance between stability and plasticity. The orthogonal LoRA adapter's parameters are updated in an orthogonal subspace of previous tasks to mitigate catastrophic forgetting, while the residual LoRA adapter's parameters are updated in the residual subspace spanned by task-specific bases without interaction across tasks, offering complementary capabilities for fine-tuning new tasks. On ViT-based models, we demonstrate that DualLoRA offers significant advantages in accuracy, inference speed, and memory efficiency over existing CL methods across multiple benchmarks.
Related papers
- LoRA-Mini : Adaptation Matrices Decomposition and Selective Training [2.0670689746336]
Low-Rank Adaptation (LoRA) has emerged as a promising solution, enabling parameter-efficient fine-tuning by reducing the number of trainable parameters.
We propose LoRA-Mini, an optimized adaptation of LoRA that improves parameter efficiency by splitting low-rank matrices into four parts.
This approach achieves upto a 20x reduction compared to standard LoRA in the number of trainable parameters while preserving performance levels comparable to standard LoRA.
arXiv Detail & Related papers (2024-11-24T12:21:14Z) - Controlled Low-Rank Adaptation with Subspace Regularization for Continued Training on Large Language Models [13.56631686493347]
Large language models (LLMs) exhibit remarkable capabilities in natural language processing but face catastrophic forgetting when learning new tasks.
We propose Controlled LoRA (CLoRA), a subspace regularization method on LoRA structure.
arXiv Detail & Related papers (2024-10-22T08:27:23Z) - Is Parameter Collision Hindering Continual Learning in LLMs? [50.57658782050275]
Large Language Models (LLMs) often suffer from catastrophic forgetting when learning multiple tasks sequentially.
We show that building non-collision parameters is a more critical interdependence factor in addressing CL challenges.
We propose Non-collision Low-Rank Adaptation (N-LoRA), a simple yet effective approach leveraging low collision rates to enhance CL in LLMs.
arXiv Detail & Related papers (2024-10-14T05:54:11Z) - LoRTA: Low Rank Tensor Adaptation of Large Language Models [70.32218116940393]
Low Rank Adaptation (LoRA) is a popular Efficient Fine Tuning (PEFT) method that effectively adapts large pre-trained models for downstream tasks.
We propose a novel approach that employs a low rank tensor parametrization for model updates.
Our method is both efficient and effective for fine-tuning large language models, achieving a substantial reduction in the number of parameters while maintaining comparable performance.
arXiv Detail & Related papers (2024-10-05T06:59:50Z) - Tensor Train Low-rank Approximation (TT-LoRA): Democratizing AI with Accelerated LLMs [1.5503410315996757]
Large Language Models (LLMs) have demonstrated remarkable capabilities across a wide range of natural language processing (NLP) tasks.
However, the ever-growing complexity of LLMs demands immense computational resources.
This paper introduces Train Low-Rank Approximation (TT-LoRA), a novel parameter-efficient fine-tuning (PEFT) approach.
arXiv Detail & Related papers (2024-08-02T04:45:58Z) - SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios.
In the early route, intermediate outputs are consolidated via an anti-redundancy operation.
In the late route, utilizing minimal late pre-trained layers could alleviate the peak demand on memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z) - MELoRA: Mini-Ensemble Low-Rank Adapters for Parameter-Efficient Fine-Tuning [71.50432879573614]
Low-rank adaptation (LoRA) is based on the idea that the adaptation process is intrinsically low-dimensional.
We present MELoRA, a mini-ensemble low-rank adapters that uses fewer trainable parameters while maintaining a higher rank.
Our experimental results show that, compared to LoRA, MELoRA achieves better performance with 8 times fewer trainable parameters on natural language understanding tasks and 36 times fewer trainable parameters on instruction following tasks.
arXiv Detail & Related papers (2024-02-27T07:14:12Z) - PRILoRA: Pruned and Rank-Increasing Low-Rank Adaptation [65.268245109828]
We introduce PRILoRA, which linearly allocates a different rank for each layer, in an increasing manner, and performs pruning throughout the training process.
We validate the effectiveness of PRILoRA through extensive experiments on eight GLUE benchmarks, setting a new state of the art.
arXiv Detail & Related papers (2024-01-20T20:25:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.