Self-Expansion of Pre-trained Models with Mixture of Adapters for Continual Learning
- URL: http://arxiv.org/abs/2403.18886v2
- Date: Sun, 9 Jun 2024 12:24:03 GMT
- Title: Self-Expansion of Pre-trained Models with Mixture of Adapters for Continual Learning
- Authors: Huiyi Wang, Haodong Lu, Lina Yao, Dong Gong
- Abstract summary: Continual learning (CL) aims to continually accumulate knowledge from a non-stationary data stream without catastrophic forgetting of learned knowledge.
Current PTM-based CL methods perform effective continual adaptation on downstream tasks by adding learnable adapters or prompts on top of the frozen PTMs.
We propose Self-Expansion of pre-trained models with Modularized Adaptation (SEMA), a novel approach that enhances the control of the stability-plasticity balance in PTM-based CL.
- Score: 21.19820308728003
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Continual learning (CL) aims to continually accumulate knowledge from a non-stationary data stream without catastrophic forgetting of learned knowledge, requiring a balance between stability and adaptability. Relying on the generalizable representations in pre-trained models (PTMs), PTM-based CL methods perform effective continual adaptation on downstream tasks by adding learnable adapters or prompts on top of the frozen PTMs. However, many existing PTM-based CL methods restrict adaptation to a fixed set of these modules to avoid forgetting, which limits their CL ability. Periodically adding task-specific modules instead results in a linear model growth rate and impairs knowledge reuse. We propose Self-Expansion of pre-trained models with Modularized Adaptation (SEMA), a novel approach that enhances the control of the stability-plasticity balance in PTM-based CL. SEMA automatically decides whether to reuse existing adapter modules or add new ones on demand in CL, depending on whether a significant distribution shift that the existing modules cannot handle is detected at some representation level. We design a modular adapter consisting of a functional adapter and a representation descriptor. The representation descriptors are trained as distribution shift indicators and are used to trigger self-expansion signals. To better compose the adapters, an expandable weighting router is learned jointly to mix the adapter outputs. SEMA enables better knowledge reuse and a sub-linear expansion rate. Extensive experiments demonstrate the effectiveness of the proposed self-expansion method, which achieves state-of-the-art performance compared to PTM-based CL methods without memory rehearsal.
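Based only on the abstract above, the method combines three pieces at each chosen representation level: a modular adapter (a functional adapter paired with a representation descriptor), a shift-detection rule on the descriptors that triggers self-expansion, and an expandable weighting router that mixes adapter outputs. The following is a minimal PyTorch-style sketch of how those pieces could interact; the class names, the autoencoder-style descriptor, and the reconstruction-error threshold are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModularAdapter(nn.Module):
    """One added module: a functional adapter plus a representation descriptor.

    The descriptor is sketched as a small autoencoder whose reconstruction error
    acts as a distribution-shift indicator (an assumption, not the paper's exact design).
    """
    def __init__(self, dim: int, bottleneck: int = 16):
        super().__init__()
        # Functional adapter: low-rank residual transform of the frozen PTM feature.
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        # Representation descriptor: models the feature distribution this adapter covers.
        self.descriptor = nn.Sequential(nn.Linear(dim, bottleneck), nn.Linear(bottleneck, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(F.gelu(self.down(x)))

    def novelty(self, x: torch.Tensor) -> torch.Tensor:
        # High reconstruction error -> this adapter does not cover the incoming distribution.
        return F.mse_loss(self.descriptor(x), x, reduction="none").mean(dim=-1)


class SelfExpandingMixture(nn.Module):
    """Mixture of adapters at one representation level, with on-demand expansion."""
    def __init__(self, dim: int, shift_threshold: float = 0.5):
        super().__init__()
        self.dim = dim
        self.shift_threshold = shift_threshold
        self.adapters = nn.ModuleList([ModularAdapter(dim)])
        self.router = nn.Linear(dim, 1)  # expandable weighting router: one logit per adapter

    @torch.no_grad()
    def maybe_expand(self, x: torch.Tensor) -> bool:
        # Expansion signal: even the best-matching descriptor cannot explain the batch.
        per_adapter = torch.stack([a.novelty(x).mean() for a in self.adapters])
        if per_adapter.min().item() > self.shift_threshold:
            self.adapters.append(ModularAdapter(self.dim))
            old = self.router
            self.router = nn.Linear(self.dim, len(self.adapters))
            self.router.weight.data[: old.out_features] = old.weight.data  # keep learned routing
            self.router.bias.data[: old.out_features] = old.bias.data
            return True
        return False

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = F.softmax(self.router(x), dim=-1)                 # (batch, num_adapters)
        outputs = torch.stack([a(x) for a in self.adapters], dim=-1)  # (batch, dim, num_adapters)
        return x + (outputs * weights.unsqueeze(1)).sum(dim=-1)     # residual mixture of adapters
```

In a CL loop, `maybe_expand` would be evaluated on batches from an incoming task, so new capacity is added only when the detected shift cannot be handled by the existing adapters, which is consistent with the knowledge reuse and sub-linear expansion rate claimed in the abstract.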
Related papers
- AdaPTS: Adapting Univariate Foundation Models to Probabilistic Multivariate Time Series Forecasting [10.899510048905926]
We present adapters for managing intricate dependencies among features and quantifying uncertainty in predictions.
Experiments conducted on both synthetic and real-world datasets confirm the efficacy of adapters.
Our framework, AdaPTS, positions adapters as a modular, scalable, and effective solution.
arXiv Detail & Related papers (2025-02-14T15:46:19Z)
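As a rough illustration of the AdaPTS idea summarized above (adapters that handle cross-feature dependencies and prediction uncertainty around a frozen univariate foundation model), here is a hedged sketch; the channel-mixing adapter, the Gaussian output, and all names are assumptions rather than the paper's actual design.

```python
import torch
import torch.nn as nn

class ProbabilisticFeatureAdapter(nn.Module):
    """Hypothetical adapter wrapping a frozen univariate forecaster for multivariate input.

    Channels are mixed into latent series, each latent series is forecast by the frozen
    foundation model, and an inverse mixing plus a learned scale yields a Gaussian forecast.
    """
    def __init__(self, univariate_model: nn.Module, num_features: int):
        super().__init__()
        self.univariate_model = univariate_model             # frozen pre-trained forecaster
        self.mix = nn.Linear(num_features, num_features)     # captures cross-feature dependencies
        self.unmix = nn.Linear(num_features, num_features)
        self.log_scale = nn.Parameter(torch.zeros(num_features))  # predictive uncertainty

    def forward(self, history: torch.Tensor) -> torch.distributions.Normal:
        # history: (batch, time, features); the frozen model is assumed to map
        # a (batch, time) series to a (batch, horizon) forecast.
        latent = self.mix(history)
        per_channel = [self.univariate_model(latent[..., i]) for i in range(latent.shape[-1])]
        latent_forecast = torch.stack(per_channel, dim=-1)   # (batch, horizon, features)
        mean = self.unmix(latent_forecast)
        return torch.distributions.Normal(mean, self.log_scale.exp())
```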
- Continuous Knowledge-Preserving Decomposition for Few-Shot Continual Learning [89.11481059492608]
Few-shot class-incremental learning (FSCIL) involves learning new classes from limited data while retaining prior knowledge.
We propose Continuous Knowledge-Preserving Decomposition for FSCIL (CKPD-FSCIL), a framework that decomposes a model's weights into two parts.
Experiments on multiple benchmarks show that CKPD-FSCIL outperforms state-of-the-art methods.
arXiv Detail & Related papers (2025-01-09T07:18:48Z)
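The CKPD-FSCIL summary above only states that the weights are decomposed into two parts; a common way to realize such a split is an SVD-based decomposition into a frozen, knowledge-preserving component and a trainable residual. The sketch below follows that assumption and is not the paper's exact criterion.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def decompose_weight(weight: torch.Tensor, rank: int):
    """Split a weight matrix into a knowledge-preserving part and an adaptable residual."""
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    preserved = (U[:, :rank] * S[:rank]) @ Vh[:rank]   # dominant directions, kept frozen
    residual = weight - preserved                      # remaining capacity, free to adapt
    return preserved, residual


class DecomposedLinear(nn.Module):
    """Linear layer whose weight is the sum of a frozen part and a trainable residual."""
    def __init__(self, linear: nn.Linear, rank: int = 8):
        super().__init__()
        preserved, residual = decompose_weight(linear.weight.data, rank)
        self.register_buffer("preserved", preserved)   # not updated in new few-shot sessions
        self.residual = nn.Parameter(residual)         # adapted on incoming classes
        self.bias = nn.Parameter(linear.bias.data.clone()) if linear.bias is not None else None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(x, self.preserved + self.residual, self.bias)
```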
- Enhancing Online Continual Learning with Plug-and-Play State Space Model and Class-Conditional Mixture of Discretization [72.81319836138347]
Online continual learning (OCL) seeks to learn new tasks from data streams that appear only once, while retaining knowledge of previously learned tasks.
Most existing methods rely on replay, focusing on enhancing memory retention through regularization or distillation.
We introduce a plug-and-play module, S6MOD, which can be integrated into most existing methods and directly improve adaptability.
arXiv Detail & Related papers (2024-12-24T05:25:21Z)
- FeTT: Continual Class Incremental Learning via Feature Transformation Tuning [19.765229703131876]
Continual learning (CL) aims to extend deep models from static and enclosed environments to dynamic and complex scenarios.
Recent CL models have gradually shifted towards the utilization of pre-trained models with parameter-efficient fine-tuning strategies.
This paper proposes a feature transformation tuning (FeTT) model to non-parametrically fine-tune backbone features across all tasks.
arXiv Detail & Related papers (2024-05-20T06:33:50Z)
- Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters [65.15700861265432]
We present a parameter-efficient continual learning framework to alleviate long-term forgetting in incremental learning with vision-language models.
Our approach involves the dynamic expansion of a pre-trained CLIP model, through the integration of Mixture-of-Experts (MoE) adapters.
To preserve the zero-shot recognition capability of vision-language models, we introduce a Distribution Discriminative Auto-Selector.
arXiv Detail & Related papers (2024-03-18T08:00:23Z)
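For the Mixture-of-Experts adapter paper above, the component most distinct from SEMA is the Distribution Discriminative Auto-Selector that preserves CLIP's zero-shot ability. A hedged sketch of one way such a selector could work is given below; the per-task autoencoders and the fixed threshold are assumptions, not the paper's mechanism.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DistributionAutoSelector(nn.Module):
    """Illustrative stand-in for a distribution-aware selector.

    One lightweight autoencoder is fitted per seen task; at test time, if no autoencoder
    reconstructs the CLIP feature well, the input is treated as out-of-distribution and
    the frozen zero-shot prediction is used instead of the adapted (MoE) prediction.
    """
    def __init__(self, feat_dim: int, threshold: float = 0.1):
        super().__init__()
        self.feat_dim = feat_dim
        self.threshold = threshold
        self.task_autoencoders = nn.ModuleList()

    def add_task(self):
        # Called once per seen task; the autoencoder is then fitted on that task's features.
        self.task_autoencoders.append(
            nn.Sequential(nn.Linear(self.feat_dim, 64), nn.ReLU(), nn.Linear(64, self.feat_dim))
        )

    def forward(self, feat, zero_shot_logits, adapted_logits):
        # feat: (batch, feat_dim) CLIP image features
        if len(self.task_autoencoders) == 0:
            return zero_shot_logits
        errors = torch.stack(
            [F.mse_loss(ae(feat), feat, reduction="none").mean(-1) for ae in self.task_autoencoders],
            dim=-1,
        )                                                     # (batch, num_tasks)
        in_distribution = errors.min(dim=-1).values < self.threshold
        # Route per sample: adapted branch for seen distributions, zero-shot CLIP otherwise.
        return torch.where(in_distribution.unsqueeze(-1), adapted_logits, zero_shot_logits)
```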
- Revisiting Class-Incremental Learning with Pre-Trained Models: Generalizability and Adaptivity are All You Need [84.3507610522086]
Class-incremental learning (CIL) aims to adapt to emerging new classes without forgetting old ones.
Recent pre-training has achieved substantial progress, making vast pre-trained models (PTMs) accessible for CIL.
We argue that the core factors in CIL are adaptivity for model updating and generalizability for knowledge transferring.
arXiv Detail & Related papers (2023-03-13T17:59:02Z)
- CLIPood: Generalizing CLIP to Out-of-Distributions [73.86353105017076]
Contrastive language-image pre-training (CLIP) models have shown impressive zero-shot ability, but further adaptation of CLIP to downstream tasks undesirably degrades out-of-distribution (OOD) performance.
We propose CLIPood, a fine-tuning method that can adapt CLIP models to OOD situations where both domain shifts and open classes may occur on unseen test data.
Experiments on diverse datasets with different OOD scenarios show that CLIPood consistently outperforms existing generalization techniques.
arXiv Detail & Related papers (2023-02-02T04:27:54Z)
- Momentum Pseudo-Labeling for Semi-Supervised Speech Recognition [55.362258027878966]
We present momentum pseudo-labeling (MPL) as a simple yet effective strategy for semi-supervised speech recognition.
MPL consists of a pair of online and offline models that interact and learn from each other, inspired by the mean teacher method.
The experimental results demonstrate that MPL effectively improves over the base model and is scalable to different semi-supervised scenarios.
arXiv Detail & Related papers (2021-06-16T16:24:55Z)
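The momentum pseudo-labeling summary above describes an online/offline model pair in the mean-teacher spirit. Below is a simplified, classification-style sketch of one training step (the actual method targets sequence-level speech recognition); the function signature, loss, and momentum value are assumptions.

```python
import torch

def momentum_pseudo_label_step(online_model, offline_model, unlabeled_batch,
                               optimizer, loss_fn, momentum: float = 0.999):
    """One illustrative MPL-style step: the offline (momentum) model generates pseudo-labels,
    the online model trains on them, and the offline weights track the online weights by EMA."""
    offline_model.eval()
    with torch.no_grad():
        pseudo_labels = offline_model(unlabeled_batch).argmax(dim=-1)

    online_model.train()
    optimizer.zero_grad()
    loss = loss_fn(online_model(unlabeled_batch), pseudo_labels)
    loss.backward()
    optimizer.step()

    # Mean-teacher-style update: offline parameters are an exponential moving average
    # of the online parameters, which stabilizes the pseudo-labels over training.
    with torch.no_grad():
        for p_off, p_on in zip(offline_model.parameters(), online_model.parameters()):
            p_off.mul_(momentum).add_(p_on, alpha=1.0 - momentum)
    return loss.item()
```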
- Data-Driven Learning and Load Ensemble Control [1.647866856596524]
This study aims to engage distributed small-scale flexible loads, such as thermostatically controllable loads (TCLs), to provide grid support services.
The efficiency of this data-driven learning is demonstrated through simulations on Heating, Cooling & Ventilation units in a testbed neighborhood of residential houses.
arXiv Detail & Related papers (2020-04-20T23:32:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.