Sparse Adapter Fusion for Continual Learning in NLP
- URL: http://arxiv.org/abs/2602.02502v1
- Date: Tue, 20 Jan 2026 15:58:25 GMT
- Title: Sparse Adapter Fusion for Continual Learning in NLP
- Authors: Min Zeng, Xi Chen, Haiqin Yang, Yike Guo
- Abstract summary: A Sparse Adapter Fusion Method (SAFM) dynamically fuses old and new adapters to address these challenges. Experimental results consistently show that SAFM outperforms state-of-the-art (SOTA) methods.
- Score: 34.701612504273946
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Continual learning in natural language processing plays a crucial role in adapting to evolving data and preventing catastrophic forgetting. Despite significant progress, existing methods still face challenges: inefficient parameter reuse across tasks, a risk of catastrophic forgetting when tasks are dissimilar, and the unnecessary introduction of new parameters for each task, which hampers knowledge sharing among similar tasks. To tackle these issues, we propose a Sparse Adapter Fusion Method (SAFM) that dynamically fuses old and new adapters. SAFM operates in two stages: a decision stage and a tuning stage. In the decision stage, SAFM determines whether to incorporate a new adapter, reuse an existing one, or add an empty adapter; the architecture search procedure is designed to prioritize reusing or adding empty adapters, minimizing parameter consumption and maximizing reuse. In the tuning stage, SAFM introduces a layer-wise loss that encourages differentiation between adapters, effectively capturing knowledge within the same task. Experimental results consistently show that SAFM outperforms state-of-the-art (SOTA) methods, achieving comparable performance while using less than 60% of the parameters.
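The sketch below is a minimal, hypothetical rendering of the two stages described in the abstract; the Adapter class, the decision criterion, and the exact loss form are assumptions, not the authors' released code. It shows a decision step that prefers reusing an existing adapter or inserting an empty (identity) adapter over adding a new one, plus a layer-wise loss that pushes adapters sharing a layer apart.

```python
# Illustrative sketch of SAFM's two stages under simplifying assumptions;
# the decision criterion and loss below are hypothetical, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Adapter(nn.Module):
    """Bottleneck adapter; an 'empty' adapter is simply the identity mapping."""
    def __init__(self, dim: int, bottleneck: int = 16):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(F.relu(self.down(x)))

def decide_adapter(scores: dict[str, float], margin: float = 0.01) -> str:
    """Decision stage (assumed criterion): prefer reusing an existing adapter or
    inserting an empty one whenever its held-out score is within `margin` of
    training a brand-new adapter, so new parameters are added only as a last resort."""
    reuse_keys = [k for k in scores if k.startswith("reuse")]
    if reuse_keys:
        best_reuse = max(reuse_keys, key=scores.get)
        if scores[best_reuse] >= scores["new"] - margin:
            return best_reuse
    if scores["empty"] >= scores["new"] - margin:
        return "empty"
    return "new"

def layerwise_differentiation_loss(adapters_per_layer: list[list[Adapter]]) -> torch.Tensor:
    """Tuning stage (assumed form of the layer-wise loss): penalize cosine
    similarity between adapters at the same layer so they capture different knowledge."""
    loss = torch.tensor(0.0)
    for adapters in adapters_per_layer:
        for i in range(len(adapters)):
            for j in range(i + 1, len(adapters)):
                wi = torch.cat([p.flatten() for p in adapters[i].parameters()])
                wj = torch.cat([p.flatten() for p in adapters[j].parameters()])
                loss = loss + F.cosine_similarity(wi, wj, dim=0).abs()
    return loss

# Example: reuse wins because it is within the margin of adding a new adapter.
choice = decide_adapter({"reuse:2": 0.84, "empty": 0.80, "new": 0.845})   # -> "reuse:2"
reg = layerwise_differentiation_loss([[Adapter(32), Adapter(32)]])
```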
Related papers
- Mixtures of SubExperts for Large Language Continual Learning [6.425296129700846]
Adapting Large Language Models to a continuous stream of tasks is a critical yet challenging endeavor. Reusing a single set of PEFT parameters for new tasks often leads to catastrophic forgetting of prior knowledge. We propose Mixtures of SubExperts (MoSEs), a novel adaptive PEFT method and continual learning framework designed for minimal forgetting and efficient scalability.
arXiv Detail & Related papers (2025-11-09T05:44:45Z) - HAM: Hierarchical Adapter Merging for Scalable Continual Learning [5.958899330375292]
New knowledge can interfere with previously learned information, causing the model to forget earlier knowledge in favor of the new. This paper introduces Hierarchical Adapter Merging (HAM), a novel framework that dynamically combines adapters from different tasks during training. HAM significantly outperforms state-of-the-art methods, particularly as the number of tasks increases.
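A rough sketch of the merging idea, assuming a simple two-level scheme of similarity-based grouping followed by per-group weight averaging (a hypothetical simplification, not the paper's actual procedure):

```python
# Illustrative sketch of hierarchical adapter merging -- assumed two-level scheme,
# not the HAM paper's algorithm.
import torch
import torch.nn.functional as F

def flatten(state: dict) -> torch.Tensor:
    return torch.cat([w.flatten() for w in state.values()])

def merge_group(states: list[dict]) -> dict:
    """Average the weights of all adapters in one group into a single shared adapter."""
    return {k: torch.stack([s[k] for s in states]).mean(dim=0) for k in states[0]}

def hierarchical_merge(states: list[dict], threshold: float = 0.8) -> list[dict]:
    """Greedily group task adapters by cosine similarity of their flattened
    weights, then merge each group."""
    groups: list[list[dict]] = []
    for s in states:
        for g in groups:
            if F.cosine_similarity(flatten(g[0]), flatten(s), dim=0) > threshold:
                g.append(s)
                break
        else:
            groups.append([s])
    return [merge_group(g) for g in groups]
```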
arXiv Detail & Related papers (2025-09-16T16:18:19Z) - EKPC: Elastic Knowledge Preservation and Compensation for Class-Incremental Learning [53.88000987041739]
Class-Incremental Learning (CIL) aims to enable AI models to continuously learn from sequentially arriving data of different classes over time. We propose the Elastic Knowledge Preservation and Compensation (EKPC) method, integrating Importance-aware Parameter Regularization (IPR) and Trainable Semantic Drift Compensation (TSDC) for CIL.
arXiv Detail & Related papers (2025-06-14T05:19:58Z) - LLaVA-CMoE: Towards Continual Mixture of Experts for Large Vision-Language Models [21.888139819188105]
LLaVA-CMoE is a continual learning framework for large vision-language models. A Probe-Guided Knowledge Extension mechanism determines when and where new experts should be added, and a Probabilistic Task Locator assigns each task a dedicated, lightweight router.
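A toy sketch of the per-task routing idea, with hypothetical shapes and names; the actual Probe-Guided Knowledge Extension and Probabilistic Task Locator are more involved than this:

```python
# Toy sketch: each task gets its own lightweight router over a shared pool of experts.
# Hypothetical simplification, not the LLaVA-CMoE implementation.
import torch
import torch.nn as nn

class ExpertPool(nn.Module):
    def __init__(self, dim: int, num_experts: int):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))
        self.routers = nn.ModuleDict()                 # one tiny router per task
        self.dim = dim

    def add_task(self, task_id: str):
        self.routers[task_id] = nn.Linear(self.dim, len(self.experts))

    def forward(self, x: torch.Tensor, task_id: str) -> torch.Tensor:
        weights = torch.softmax(self.routers[task_id](x), dim=-1)       # (B, E)
        outputs = torch.stack([e(x) for e in self.experts], dim=-1)     # (B, D, E)
        return torch.einsum("bde,be->bd", outputs, weights)

pool = ExpertPool(dim=8, num_experts=3)
pool.add_task("task_0")
y = pool(torch.randn(2, 8), "task_0")                  # (2, 8)
```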
arXiv Detail & Related papers (2025-03-27T07:36:11Z) - Not All Adapters Matter: Selective Adapter Freezing for Memory-Efficient Fine-Tuning of Language Models [10.593991842751631]
Adapter tuning provides parameter-efficient fine-tuning by introducing lightweight trainable modules. We show that adapters contribute unequally to both task performance and resource usage. We propose Selective Adapter FrEezing (SAFE), which gradually freezes less important adapters early in training to reduce unnecessary resource usage.
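A minimal sketch of selective freezing, using accumulated gradient magnitude as a stand-in importance measure (an assumption on my part; the paper defines its own criterion):

```python
# Minimal sketch of selective adapter freezing: rank adapters by an importance
# proxy and freeze the least important ones. Assumed criterion, not SAFE's.
import torch.nn as nn

def adapter_importance(adapter: nn.Module) -> float:
    """Proxy importance: accumulated gradient magnitude of the adapter's weights."""
    return sum(p.grad.abs().sum().item()
               for p in adapter.parameters() if p.grad is not None)

def freeze_least_important(adapters: dict[str, nn.Module], keep_ratio: float = 0.5) -> list[str]:
    ranked = sorted(adapters, key=lambda name: adapter_importance(adapters[name]))
    n_freeze = int(len(ranked) * (1 - keep_ratio))
    for name in ranked[:n_freeze]:
        for p in adapters[name].parameters():
            p.requires_grad = False       # frozen adapters stop consuming gradient memory
    return ranked[:n_freeze]              # names of the adapters that were frozen
```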
arXiv Detail & Related papers (2024-11-26T08:41:45Z) - CorDA: Context-Oriented Decomposition Adaptation of Large Language Models for Task-Aware Parameter-Efficient Fine-tuning [101.81127587760831]
Current fine-tuning methods build adapters without regard to the context of the downstream task to learn or the context of the important knowledge to maintain. We propose CorDA, a Context-oriented Decomposition Adaptation method that builds learnable, task-aware adapters. Our method enables two modes: knowledge-preserved adaptation and instruction-previewed adaptation.
arXiv Detail & Related papers (2024-06-07T19:10:35Z) - Remembering Transformer for Continual Learning [9.879896956915598]
We propose Remembering Transformer, inspired by the brain's Complementary Learning Systems.
Remembering Transformer employs a mixture-of-adapters architecture and a generative model-based novelty detection mechanism.
We conducted extensive experiments, including ablation studies on the novelty detection mechanism and model capacity of the mixture-of-adapters.
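A toy version of the routing idea, where the generative novelty detector is approximated by a per-adapter autoencoder and each input is routed to the adapter whose detector reconstructs it best (an assumed stand-in, not the paper's model):

```python
# Toy sketch: route each input to the adapter whose task-specific autoencoder
# has the lowest reconstruction error. Assumed detector; not the paper's design.
import torch
import torch.nn as nn

class TinyAE(nn.Module):
    def __init__(self, dim: int, hidden: int = 4):
        super().__init__()
        self.enc = nn.Linear(dim, hidden)
        self.dec = nn.Linear(hidden, dim)

    def reconstruction_error(self, x: torch.Tensor) -> torch.Tensor:
        return ((self.dec(torch.relu(self.enc(x))) - x) ** 2).mean(dim=-1)

def route(x: torch.Tensor, detectors: list[TinyAE]) -> torch.Tensor:
    """Per sample, pick the index of the adapter whose detector is least surprised."""
    errors = torch.stack([d.reconstruction_error(x) for d in detectors], dim=-1)  # (B, K)
    return errors.argmin(dim=-1)                                                  # (B,)

x = torch.randn(5, 16)
detectors = [TinyAE(16) for _ in range(3)]
print(route(x, detectors))        # tensor of adapter indices, one per sample
```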
arXiv Detail & Related papers (2024-04-11T07:22:14Z) - MerA: Merging Pretrained Adapters For Few-Shot Learning [71.44422347502409]
We propose Merging Pretrained Adapters (MerA), which efficiently incorporates pretrained adapters into a single model through model fusion.
Experiments on two PLMs demonstrate that MerA yields substantial improvements over both single adapters and AdapterFusion.
arXiv Detail & Related papers (2023-08-30T12:10:17Z) - E2-AEN: End-to-End Incremental Learning with Adaptively Expandable Network [57.87240860624937]
We propose an end-to-end trainable adaptively expandable network named E2-AEN.
It dynamically generates lightweight structures for new tasks without any accuracy drop in previous tasks.
E2-AEN reduces cost and can be built upon any feed-forward architectures in an end-to-end manner.
arXiv Detail & Related papers (2022-07-14T09:04:51Z) - AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks [55.705355299065474]
Transformer-based pre-trained models with millions of parameters require large storage.
Recent approaches tackle this shortcoming by training adapters, but these approaches still require a relatively large number of parameters.
In this study, AdapterBias, a surprisingly simple yet effective adapter architecture, is proposed.
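A sketch in the spirit of a token-dependent representation shift: a single shared shift vector scaled per token by a scalar produced by a tiny linear layer. Shapes and names are assumptions, not the released AdapterBias code:

```python
# Sketch of a token-dependent representation shift: one shared shift direction,
# scaled per token. Assumed parameterization, not the paper's implementation.
import torch
import torch.nn as nn

class TokenDependentShift(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.v = nn.Parameter(torch.zeros(dim))   # shared shift direction
        self.alpha = nn.Linear(dim, 1)            # per-token magnitude

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, dim) -> add alpha_i * v to each token i
        return hidden + self.alpha(hidden) * self.v

layer = TokenDependentShift(dim=768)
out = layer(torch.randn(2, 10, 768))              # (2, 10, 768)
```

The trainable parameters are a single vector plus a one-output linear layer per insertion point, which is consistent with the paper's emphasis on a very small parameter budget.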
arXiv Detail & Related papers (2022-04-30T16:49:41Z) - Efficient Continual Adaptation for Generative Adversarial Networks [97.20244383723853]
We present a continual learning approach for generative adversarial networks (GANs).
Our approach is based on learning a set of global and task-specific parameters.
We show that the feature-map-transformation-based approach outperforms state-of-the-art continual GAN methods.
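A sketch of the global-plus-task-specific split, approximated here as shared convolution weights with a lightweight per-task scale-and-shift of the feature maps (an assumed, FiLM-style simplification of the feature-map transformation):

```python
# Sketch: global (shared) convolution weights plus task-specific feature-map
# scale and shift. Assumed simplification, not the paper's exact transformation.
import torch
import torch.nn as nn

class TaskConditionedConv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, num_tasks: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)        # global, shared
        self.scale = nn.Parameter(torch.ones(num_tasks, out_ch))  # task-specific
        self.shift = nn.Parameter(torch.zeros(num_tasks, out_ch))

    def forward(self, x: torch.Tensor, task: int) -> torch.Tensor:
        h = self.conv(x)
        s = self.scale[task].view(1, -1, 1, 1)
        b = self.shift[task].view(1, -1, 1, 1)
        return h * s + b

block = TaskConditionedConv(3, 8, num_tasks=4)
y = block(torch.randn(2, 3, 16, 16), task=1)       # (2, 8, 16, 16)
```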
arXiv Detail & Related papers (2021-03-06T05:09:37Z) - AdapterFusion: Non-Destructive Task Composition for Transfer Learning [104.9639614787314]
Sequential fine-tuning and multi-task learning are methods aiming to incorporate knowledge from multiple tasks.
We propose AdapterFusion, a new two-stage learning algorithm that leverages knowledge from multiple tasks.
We show that our approach outperforms traditional strategies such as full fine-tuning as well as multi-task learning.
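A sketch of the composition stage, approximated as attention over the outputs of frozen task adapters with the layer's hidden state as the query (dimensions and the query/key/value parameterization are assumptions, not the paper's code):

```python
# Sketch of adapter composition via attention over frozen task adapters' outputs.
# Assumed parameterization, not the AdapterFusion release.
import torch
import torch.nn as nn

class AdapterFusion(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.value = nn.Linear(dim, dim)

    def forward(self, hidden: torch.Tensor, adapter_outputs: torch.Tensor) -> torch.Tensor:
        # hidden: (B, T, D); adapter_outputs: (B, T, N, D) from N frozen task adapters
        q = self.query(hidden).unsqueeze(2)                          # (B, T, 1, D)
        k = self.key(adapter_outputs)                                # (B, T, N, D)
        v = self.value(adapter_outputs)                              # (B, T, N, D)
        attn = torch.softmax((q * k).sum(-1, keepdim=True), dim=2)   # (B, T, N, 1)
        return hidden + (attn * v).sum(dim=2)                        # (B, T, D)

fusion = AdapterFusion(dim=64)
out = fusion(torch.randn(2, 5, 64), torch.randn(2, 5, 3, 64))        # (2, 5, 64)
```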
arXiv Detail & Related papers (2020-05-01T07:03:42Z)