Is Parameter Collision Hindering Continual Learning in LLMs?
- URL: http://arxiv.org/abs/2410.10179v1
- Date: Mon, 14 Oct 2024 05:54:11 GMT
- Title: Is Parameter Collision Hindering Continual Learning in LLMs?
- Authors: Shuo Yang, Kun-Peng Ning, Yu-Yang Liu, Jia-Yu Yao, Yong-Hong Tian, Yi-Bing Song, Li Yuan
- Abstract summary: Large Language Models (LLMs) often suffer from catastrophic forgetting when learning multiple tasks sequentially.
We show that building non-collision parameters is a more critical factor in addressing CL challenges.
We propose Non-collision Low-Rank Adaptation (N-LoRA), a simple yet effective approach leveraging low collision rates to enhance CL in LLMs.
- Score: 50.57658782050275
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) often suffer from catastrophic forgetting when learning multiple tasks sequentially, making continual learning (CL) essential for their dynamic deployment. Existing state-of-the-art (SOTA) methods, such as O-LoRA, typically focus on constructing orthogonal task subspaces to decouple parameter interdependence across domains. In this paper, we reveal that building non-collision parameters is a more critical factor in addressing CL challenges. Our theoretical and experimental analyses demonstrate that non-collision parameters provide better task orthogonality, which is a sufficient but not necessary condition. Furthermore, knowledge from multiple domains is preserved in non-collision parameter subspaces, making previously seen data harder to forget. Leveraging this insight, we propose Non-collision Low-Rank Adaptation (N-LoRA), a simple yet effective approach that leverages low collision rates to enhance CL in LLMs. Experimental results on multiple CL benchmarks indicate that N-LoRA achieves superior performance (+2.9), higher task orthogonality (×4.1), and lower parameter collision (×58.1) compared with SOTA methods.
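To make the abstract's distinction between parameter collision and orthogonality concrete, here is a minimal sketch (my own illustration, not the authors' code). It builds two task-specific LoRA updates dW = B @ A, then measures an element-wise collision rate (the fraction of positions both tasks modify, which is my reading of "parameter collision") and a normalized inner product as an orthogonality proxy. The `lora_delta`, `collision_rate`, and `orthogonality` helpers are hypothetical names, and sparsity is imposed here by crude random masking purely for illustration; N-LoRA's actual mechanism and metrics may differ.

```python
# Toy comparison of two task-specific LoRA updates dW1 = B1 @ A1 and dW2 = B2 @ A2:
# (i) element-wise "collision rate" = fraction of positions both tasks modify,
# (ii) normalized Frobenius inner product as an orthogonality proxy (0 = orthogonal).
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 64, 64, 4  # layer dimensions and LoRA rank (illustrative values)

def lora_delta(sparsity=0.0):
    """Build a LoRA update dW = B @ A, optionally zeroing a random fraction of entries."""
    A = rng.normal(size=(r, k))
    B = rng.normal(size=(d, r))
    dW = B @ A
    if sparsity > 0:
        mask = rng.random(dW.shape) < sparsity
        dW = np.where(mask, 0.0, dW)  # crude stand-in for a sparsity-inducing objective
    return dW

def collision_rate(dW1, dW2, eps=1e-8):
    """Fraction of parameter positions that both task updates touch."""
    return np.mean((np.abs(dW1) > eps) & (np.abs(dW2) > eps))

def orthogonality(dW1, dW2):
    """Normalized inner product between the two updates; 0 means perfectly orthogonal."""
    return abs(np.sum(dW1 * dW2)) / (np.linalg.norm(dW1) * np.linalg.norm(dW2))

dense1, dense2 = lora_delta(), lora_delta()
sparse1, sparse2 = lora_delta(sparsity=0.9), lora_delta(sparsity=0.9)

print("dense : collision=%.3f  inner product=%.3f" % (collision_rate(dense1, dense2), orthogonality(dense1, dense2)))
print("sparse: collision=%.3f  inner product=%.3f" % (collision_rate(sparse1, sparse2), orthogonality(sparse1, sparse2)))
```

With these toy settings, the sparsified pair should show a far lower collision rate and, typically, a smaller normalized inner product, illustrating the abstract's point that non-colliding (sparse, non-overlapping) updates tend to be orthogonal almost automatically, whereas enforcing orthogonality alone does not guarantee low collision.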
Related papers
- Dual Low-Rank Adaptation for Continual Learning with Pre-Trained Models [38.97142043836567]
Continual learning (CL) aims to enable vision transformers (ViTs) to learn new tasks over time.
However, catastrophic forgetting remains a persistent challenge.
We propose a novel PEFT-CL method called Dual Low-Rank Adaptation (DualLoRA).
arXiv Detail & Related papers (2024-11-01T14:28:39Z) - Controlled Low-Rank Adaptation with Subspace Regularization for Continued Training on Large Language Models [13.56631686493347]
Large language models (LLMs) exhibit remarkable capabilities in natural language processing but face catastrophic forgetting when learning new tasks.
We propose Controlled LoRA (CLoRA), a subspace regularization method on the LoRA structure.
arXiv Detail & Related papers (2024-10-22T08:27:23Z) - MTL-LoRA: Low-Rank Adaptation for Multi-Task Learning [74.43869839954168]
We propose MTL-LoRA, which retains the advantages of low-rank adaptation while significantly enhancing multi-task learning capabilities.
MTL-LoRA augments LoRA by incorporating additional task-adaptive parameters that differentiate task-specific information.
This approach enables large language models (LLMs) pre-trained on a general corpus to adapt to different target task domains with a limited number of trainable parameters.
arXiv Detail & Related papers (2024-10-12T08:32:26Z) - Randomized Asymmetric Chain of LoRA: The First Meaningful Theoretical Framework for Low-Rank Adaptation [58.288682735160585]
Low-Rank Adaptation (LoRA) is a popular technique for finetuning models.
LoRA often underperforms when compared to full-parameter fine-tuning.
We present a framework that rigorously analyzes the adaptation rates of LoRA methods.
arXiv Detail & Related papers (2024-10-10T18:51:53Z) - Hyperbolic Fine-tuning for Large Language Models [56.54715487997674]
This study investigates the non-Euclidean characteristics of large language models (LLMs).
We show that token embeddings exhibit a high degree of hyperbolicity, indicating a latent tree-like structure in the embedding space.
We introduce a new method called hyperbolic low-rank efficient fine-tuning, HypLoRA, that performs low-rank adaptation directly on the hyperbolic manifold.
arXiv Detail & Related papers (2024-10-05T02:58:25Z) - ICL-TSVD: Bridging Theory and Practice in Continual Learning with Pre-trained Models [103.45785408116146]
Continual learning (CL) aims to train a model that can solve multiple tasks presented sequentially.
Recent CL approaches have achieved strong performance by leveraging large pre-trained models that generalize well to downstream tasks.
However, such methods lack theoretical guarantees, making them prone to unexpected failures.
We bridge this gap by integrating an empirically strong approach into a principled framework, designed to prevent forgetting.
arXiv Detail & Related papers (2024-10-01T12:58:37Z) - CURLoRA: Stable LLM Continual Fine-Tuning and Catastrophic Forgetting Mitigation [0.0]
CURLoRA is a novel approach to fine-tuning large language models.
It mitigates catastrophic forgetting and maintains model stability and performance across tasks while significantly reducing the number of trainable parameters.
arXiv Detail & Related papers (2024-08-26T18:42:59Z) - SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios.
In the early route, intermediate outputs are consolidated via an anti-redundancy operation.
In the late route, a minimal number of late pre-trained layers is used to reduce the peak memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z) - Analyzing and Reducing Catastrophic Forgetting in Parameter Efficient
Tuning [9.38259062204602]
Large language models (LLMs) exhibit remarkable performance in language understanding and generation.
LLMs are continuously fine-tuned on complex and diverse domain-specific downstream tasks.
A trade-off must be struck between learning plasticity and memory stability.
arXiv Detail & Related papers (2024-02-29T05:27:45Z) - LoRA Meets Dropout under a Unified Framework [38.5176197615878]
Large language models (LLMs) have emerged as essential elements in numerous NLP applications.
Various dropout methods, initially designed for full fine-tuning where all parameters are updated, alleviate the overfitting associated with excessive parameter redundancy.
We introduce a unified framework for a comprehensive investigation, which instantiates these methods based on dropping position, structural pattern and compensation measure.
arXiv Detail & Related papers (2024-02-25T07:09:10Z)