Orthogonal Subspace Learning for Language Model Continual Learning
- URL: http://arxiv.org/abs/2310.14152v1
- Date: Sun, 22 Oct 2023 02:23:44 GMT
- Title: Orthogonal Subspace Learning for Language Model Continual Learning
- Authors: Xiao Wang, Tianze Chen, Qiming Ge, Han Xia, Rong Bao, Rui Zheng, Qi
Zhang, Tao Gui, Xuanjing Huang
- Abstract summary: O-LoRA is a simple and efficient approach for continual learning in language models.
Our method induces only marginal additional parameter costs and requires no user data storage for replay.
- Score: 45.35861158925975
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Benefiting from massive corpora and advanced hardware, large language models
(LLMs) exhibit remarkable capabilities in language understanding and
generation. However, their performance degrades when multiple tasks are
encountered sequentially, a phenomenon known as catastrophic forgetting. In
this paper, we propose orthogonal low-rank adaptation (O-LoRA), a simple and
efficient approach for continual learning in language models, effectively
mitigating catastrophic forgetting while learning new tasks. Specifically,
O-LoRA learns tasks in different (low-rank) vector subspaces that are kept
orthogonal to each other in order to minimize interference. Our method induces
only marginal additional parameter costs and requires no user data storage for
replay. Experimental results on continual learning benchmarks show that our
method outperforms state-of-the-art methods. Furthermore, compared to previous
approaches, our method excels in preserving the generalization ability of LLMs
on unseen tasks.
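The core mechanism above is a penalty that keeps the low-rank subspace of the current task orthogonal to those of earlier tasks. Below is a minimal sketch of such a penalty, assuming the standard LoRA factorization ΔW = BA with A of shape r x d approximating each task's update subspace; the function name, the squared-Frobenius form of the penalty, and the weight lambda_orth are illustrative assumptions rather than the authors' implementation.

```python
import torch


def orthogonality_penalty(current_A: torch.Tensor,
                          past_As: list[torch.Tensor]) -> torch.Tensor:
    """Penalize overlap between the current task's LoRA down-projection A_t
    (shape r x d) and the frozen A matrices of previously learned tasks."""
    penalty = torch.zeros((), device=current_A.device)
    for past_A in past_As:
        # past_A @ current_A.T is r x r and is all zeros exactly when the
        # row spaces (i.e., the task subspaces) are mutually orthogonal.
        overlap = past_A.detach() @ current_A.T
        penalty = penalty + (overlap ** 2).sum()
    return penalty


# Illustrative training step: add the penalty to the current task loss.
# loss = task_loss + lambda_orth * orthogonality_penalty(lora_A_t, past_A_list)
```

In this reading, earlier tasks' LoRA factors stay frozen, so no replay data is stored and the only extra cost is the saved low-rank matrices, consistent with the abstract's claim of marginal additional parameter costs.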
Related papers
- SwitchCIT: Switching for Continual Instruction Tuning of Large Language Models [14.085371250265224]
Large language models (LLMs) have exhibited impressive capabilities in various domains, particularly in general language understanding.
However, these models, trained on massive text data, may not be finely optimized for specific tasks triggered by instructions.
This work addresses the catastrophic forgetting in continual instruction learning for LLMs through a switching mechanism for routing computations to parameter-efficient tuned models.
arXiv Detail & Related papers (2024-07-16T14:37:33Z)
- To Each (Textual Sequence) Its Own: Improving Memorized-Data Unlearning in Large Language Models [3.4990427823966828]
LLMs have been found to memorize training textual sequences and regurgitate verbatim said sequences during text generation time.
This fact is known to be the cause of privacy and related (e.g., copyright) problems.
Unlearning in LLMs then takes the form of devising new algorithms that will properly deal with these side-effects.
arXiv Detail & Related papers (2024-05-06T01:21:50Z)
- Scalable Language Model with Generalized Continual Learning [58.700439919096155]
Joint Adaptive Re-Parameterization (JARe) is integrated with Dynamic Task-related Knowledge Retrieval (DTKR) to enable adaptive adjustment of language models based on specific downstream tasks.
Our method demonstrates state-of-the-art performance on diverse backbones and benchmarks, achieving effective continual learning in both full-set and few-shot scenarios with minimal forgetting.
arXiv Detail & Related papers (2024-04-11T04:22:15Z)
- Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization [51.34904967046097]
Continual learning seeks to overcome the challenge of catastrophic forgetting, where a model forgets previously learnt information.
We introduce a novel prior-based method that better constrains parameter growth, reducing catastrophic forgetting.
Results show that BAdam achieves state-of-the-art performance for prior-based methods on challenging single-headed class-incremental experiments.
arXiv Detail & Related papers (2023-09-15T17:10:51Z)
- Complementary Learning Subnetworks for Parameter-Efficient Class-Incremental Learning [40.13416912075668]
We propose a rehearsal-free CIL approach that learns continually via the synergy between two Complementary Learning Subnetworks.
Our method achieves competitive results against state-of-the-art methods, especially in terms of accuracy gain, memory cost, training efficiency, and task-order robustness.
arXiv Detail & Related papers (2023-06-21T01:43:25Z)
- Learning Bayesian Sparse Networks with Full Experience Replay for Continual Learning [54.7584721943286]
Continual Learning (CL) methods aim to enable machine learning models to learn new tasks without catastrophic forgetting of those that have been previously mastered.
Existing CL approaches often keep a buffer of previously-seen samples, perform knowledge distillation, or use regularization techniques towards this goal.
We propose to only activate and select sparse neurons for learning current and past tasks at any stage.
arXiv Detail & Related papers (2022-02-21T13:25:03Z)
- Continually Learning Self-Supervised Representations with Projected Functional Regularization [39.92600544186844]
Recent self-supervised learning methods are able to learn high-quality image representations and are closing the gap with supervised methods.
However, these methods are unable to acquire new knowledge incrementally; they are, in fact, mostly used only as a pre-training phase with IID data.
To prevent forgetting of previous knowledge, we propose the usage of functional regularization.
arXiv Detail & Related papers (2021-12-30T11:59:23Z)
- Towards Lifelong Learning of Multilingual Text-To-Speech Synthesis [87.75833205560406]
This work presents a lifelong learning approach to train a multilingual Text-To-Speech (TTS) system.
It does not require pooled data from all languages altogether, and thus alleviates the storage and computation burden.
arXiv Detail & Related papers (2021-10-09T07:00:38Z)
- Recall and Learn: Fine-tuning Deep Pretrained Language Models with Less Forgetting [66.45372974713189]
We propose a recall and learn mechanism, which adopts the idea of multi-task learning and jointly learns pretraining tasks and downstream tasks.
Experiments show that our method achieves state-of-the-art performance on the GLUE benchmark.
We provide the open-source RecAdam optimizer, which integrates the proposed mechanisms into Adam to facilitate the NLP community (a rough sketch of this recall-and-learn idea appears after this list).
arXiv Detail & Related papers (2020-04-27T08:59:57Z)
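The Recall and Learn entry above describes blending the downstream objective with a pull back toward the pretrained weights; the released RecAdam integrates this into the Adam update itself. The sketch below expresses the idea at the loss level with a sigmoid annealing schedule; the schedule constants k and t0, the penalty weight gamma, and the function name are assumptions made for illustration.

```python
import math

import torch


def recall_and_learn_loss(downstream_loss: torch.Tensor,
                          params: list[torch.Tensor],
                          pretrained: list[torch.Tensor],
                          step: int,
                          k: float = 0.1,
                          t0: int = 1000,
                          gamma: float = 1e-3) -> torch.Tensor:
    # Sigmoid annealing: early steps emphasise recalling the pretrained model,
    # later steps emphasise learning the downstream task.
    lam = 1.0 / (1.0 + math.exp(-k * (step - t0)))
    # Quadratic "recall" penalty toward the frozen pretrained parameters.
    recall = sum(((p - p0.detach()) ** 2).sum()
                 for p, p0 in zip(params, pretrained))
    return lam * downstream_loss + (1.0 - lam) * 0.5 * gamma * recall
```

At each optimization step the annealed loss is backpropagated as usual, so early in fine-tuning the model is held close to its pretrained weights, while later the downstream objective dominates.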