Self-Expansion of Pre-trained Models with Mixture of Adapters for Continual Learning
- URL: http://arxiv.org/abs/2403.18886v2
- Date: Sun, 9 Jun 2024 12:24:03 GMT
- Title: Self-Expansion of Pre-trained Models with Mixture of Adapters for Continual Learning
- Authors: Huiyi Wang, Haodong Lu, Lina Yao, Dong Gong
- Abstract summary: Continual learning (CL) aims to continually accumulate knowledge from a non-stationary data stream without catastrophic forgetting of learned knowledge.
Current PTM-based CL methods perform effective continual adaptation on downstream tasks by adding learnable adapters or prompts on top of the frozen PTMs.
We propose Self-Expansion of pre-trained models with Modularized Adaptation (SEMA), a novel approach to enhance the control of the stability-plasticity balance in PTM-based CL.
- Score: 21.19820308728003
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Continual learning (CL) aims to continually accumulate knowledge from a non-stationary data stream without catastrophic forgetting of learned knowledge, requiring a balance between stability and adaptability. Relying on the generalizable representations of pre-trained models (PTMs), PTM-based CL methods perform effective continual adaptation on downstream tasks by adding learnable adapters or prompts on top of the frozen PTMs. However, many existing PTM-based CL methods restrict adaptation to a fixed set of these modules to avoid forgetting, which limits their CL ability. Periodically adding task-specific modules, on the other hand, results in a linear model growth rate and impaired knowledge reuse. We propose Self-Expansion of pre-trained models with Modularized Adaptation (SEMA), a novel approach to enhance the control of the stability-plasticity balance in PTM-based CL. SEMA automatically decides whether to reuse existing adapter modules or add new ones on demand, depending on whether a significant distribution shift that cannot be handled is detected at different representation levels. We design a modular adapter consisting of a functional adapter and a representation descriptor. The representation descriptors are trained as distribution shift indicators and are used to trigger self-expansion signals. To better compose the adapters, an expandable weighting router is learned jointly to mix the adapter outputs. SEMA enables better knowledge reuse and a sub-linear expansion rate. Extensive experiments demonstrate the effectiveness of the proposed self-expansion method, which achieves state-of-the-art performance compared to PTM-based CL methods without memory rehearsal.
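To make the mechanism concrete, below is a minimal, hypothetical PyTorch sketch of a self-expanding mixture of adapters. It is not the authors' implementation: the bottleneck adapter, the autoencoder-style representation descriptor, the reconstruction-error threshold, and all names (`Adapter`, `Descriptor`, `SelfExpandingMixture`) are illustrative assumptions that only mirror the structure described in the abstract.

```python
# Sketch only: self-expanding mixture of adapters beside a frozen PTM block of width `dim`.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Adapter(nn.Module):
    """Bottleneck functional adapter added beside a frozen PTM block."""
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return self.up(F.relu(self.down(x)))

class Descriptor(nn.Module):
    """Autoencoder over the block's input features; high reconstruction error
    is read as a distribution shift the existing adapters cannot handle."""
    def __init__(self, dim: int, hidden: int = 32):
        super().__init__()
        self.enc = nn.Linear(dim, hidden)
        self.dec = nn.Linear(hidden, dim)

    def error(self, x):
        return F.mse_loss(self.dec(F.relu(self.enc(x))), x, reduction="none").mean(-1)

class SelfExpandingMixture(nn.Module):
    def __init__(self, dim: int, threshold: float = 0.5):
        super().__init__()
        self.dim, self.threshold = dim, threshold
        self.adapters = nn.ModuleList([Adapter(dim)])
        self.descriptors = nn.ModuleList([Descriptor(dim)])
        self.router = nn.Linear(dim, 1)  # expandable weighting router

    def needs_expansion(self, x) -> bool:
        # Expand only if every existing descriptor flags the batch as out of distribution.
        errors = torch.stack([d.error(x).mean() for d in self.descriptors])
        return bool((errors > self.threshold).all())

    def expand(self):
        # Add a new adapter/descriptor pair; earlier pairs would stay frozen.
        self.adapters.append(Adapter(self.dim))
        self.descriptors.append(Descriptor(self.dim))
        old = self.router
        self.router = nn.Linear(self.dim, len(self.adapters))  # grow router output
        with torch.no_grad():
            self.router.weight[: old.out_features] = old.weight
            self.router.bias[: old.out_features] = old.bias

    def forward(self, x):
        weights = F.softmax(self.router(x), dim=-1)                 # (batch, n_adapters)
        outs = torch.stack([a(x) for a in self.adapters], dim=-1)   # (batch, dim, n_adapters)
        return x + (outs * weights.unsqueeze(1)).sum(-1)            # weighted mixture, residual
```

In a full system such a module would sit beside a frozen transformer block; a newly added adapter and descriptor would be trained on the current task while earlier ones stay frozen, which is what keeps the expansion rate sub-linear whenever the router can route new data to existing adapters instead.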
Related papers
- Dual-CBA: Improving Online Continual Learning via Dual Continual Bias Adaptors from a Bi-level Optimization Perspective [39.74441755776661]
In online continual learning (CL), models trained on changing distributions easily forget previously learned knowledge and bias toward newly received tasks.
We present Continual Bias Adaptor (CBA), a bi-level framework that augments the classification network to adapt to catastrophic distribution shifts during training.
We propose a novel class-agnostic CBA module that separately aggregates the posterior probabilities of classes from new and old tasks, and applies a stable adjustment to the resulting posterior probabilities.
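As a rough illustration of what "separately aggregating the posteriors of old and new classes and applying a stable adjustment" could look like, here is a hypothetical Python snippet. The actual Dual-CBA module is a learned network trained with bi-level optimization; the 2-vector `gamma`, the `old_classes` split, and the rescale-and-renormalize rule below are assumptions for illustration only.

```python
# Illustrative-only reading of "aggregate old/new posteriors and apply a stable adjustment".
import torch
import torch.nn.functional as F

def class_agnostic_adjustment(logits: torch.Tensor, old_classes: int, gamma: torch.Tensor):
    """logits: (batch, n_classes); the first `old_classes` columns belong to old tasks.
    gamma: a learnable 2-vector rescaling the old-task vs. new-task posterior mass."""
    probs = F.softmax(logits, dim=-1)
    scale_old, scale_new = F.softplus(gamma[0]), F.softplus(gamma[1])   # keep scales positive
    scale = torch.cat([scale_old.expand(old_classes),
                       scale_new.expand(probs.size(-1) - old_classes)])
    adjusted = probs * scale                             # rescale aggregated old vs. new mass
    return adjusted / adjusted.sum(-1, keepdim=True)     # renormalize to a valid posterior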
arXiv Detail & Related papers (2024-08-26T03:19:52Z)
- FeTT: Continual Class Incremental Learning via Feature Transformation Tuning [19.765229703131876]
Continual learning (CL) aims to extend deep models from static and enclosed environments to dynamic and complex scenarios.
Recent CL models have gradually shifted towards using pre-trained models with parameter-efficient fine-tuning strategies.
This paper proposes a feature transformation tuning (FeTT) model that non-parametrically fine-tunes backbone features across all tasks.
arXiv Detail & Related papers (2024-05-20T06:33:50Z)
- Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters [65.15700861265432]
We present a parameter-efficient continual learning framework to alleviate long-term forgetting in incremental learning with vision-language models.
Our approach involves the dynamic expansion of a pre-trained CLIP model, through the integration of Mixture-of-Experts (MoE) adapters.
To preserve the zero-shot recognition capability of vision-language models, we introduce a Distribution Discriminative Auto-Selector.
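A hedged Python sketch of the selector idea: compare a test embedding against prototypes of the distributions seen during incremental training and fall back to the frozen zero-shot CLIP head when nothing matches. The prototype comparison, the threshold `tau`, and the function name are assumptions for illustration, not the paper's actual Distribution Discriminative Auto-Selector.

```python
# Sketch: route a test feature to the adapted model or the zero-shot CLIP head.
import torch
import torch.nn.functional as F

def select_head(feature: torch.Tensor, task_prototypes: torch.Tensor, tau: float = 0.7) -> str:
    """feature: (d,) image embedding; task_prototypes: (n_tasks, d) mean embeddings
    of the data each set of adapters was trained on."""
    sims = F.cosine_similarity(feature.unsqueeze(0), task_prototypes)  # (n_tasks,)
    return "adapted_model" if sims.max() >= tau else "zero_shot_clip"
```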
arXiv Detail & Related papers (2024-03-18T08:00:23Z)
- Kalman Filter for Online Classification of Non-Stationary Data [101.26838049872651]
In Online Continual Learning (OCL) a learning system receives a stream of data and sequentially performs prediction and training steps.
We introduce a probabilistic Bayesian online learning model by using a neural representation and a state space model over the linear predictor weights.
In experiments in multi-class classification we demonstrate the predictive ability of the model and its flexibility to capture non-stationarity.
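The core update is the classical Kalman filter applied to the linear read-out weights on top of a frozen neural representation. A minimal sketch for a scalar Gaussian observation is below; the paper's multi-class softmax likelihood requires an approximate update, and the drift variance `q` and noise variance `r` here are illustrative hyperparameters, not values from the paper.

```python
# Sketch: treat the linear read-out weights as the state of a random-walk state-space model.
import numpy as np

def kalman_step(w, P, phi, y, q=1e-4, r=1.0):
    """One predict+update step.
    w: (d,) posterior mean of the weights, P: (d, d) posterior covariance,
    phi: (d,) feature vector from the frozen network, y: scalar target."""
    P = P + q * np.eye(len(w))     # predict: random-walk drift accounts for non-stationarity
    s = phi @ P @ phi + r          # innovation variance
    k = (P @ phi) / s              # Kalman gain
    w = w + k * (y - phi @ w)      # correct the mean toward the new observation
    P = P - np.outer(k, phi) @ P   # shrink the covariance
    return w, P
```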
arXiv Detail & Related papers (2023-06-14T11:41:42Z)
- Revisiting Class-Incremental Learning with Pre-Trained Models: Generalizability and Adaptivity are All You Need [84.3507610522086]
Class-incremental learning (CIL) aims to adapt to emerging new classes without forgetting old ones.
Recent pre-training has achieved substantial progress, making vast pre-trained models (PTMs) accessible for CIL.
We argue that the core factors in CIL are adaptivity for model updating and generalizability for knowledge transferring.
arXiv Detail & Related papers (2023-03-13T17:59:02Z)
- CLIPood: Generalizing CLIP to Out-of-Distributions [73.86353105017076]
Contrastive language-image pre-training (CLIP) models have shown impressive zero-shot ability, but further adaptation of CLIP to downstream tasks undesirably degrades out-of-distribution (OOD) performance.
We propose CLIPood, a fine-tuning method that can adapt CLIP models to OOD situations where both domain shifts and open classes may occur on unseen test data.
Experiments on diverse datasets with different OOD scenarios show that CLIPood consistently outperforms existing generalization techniques.
arXiv Detail & Related papers (2023-02-02T04:27:54Z)
- Meta-Causal Feature Learning for Out-of-Distribution Generalization [71.38239243414091]
This paper presents a balanced meta-causal learner (BMCL), which includes a balanced task generation module (BTG) and a meta-causal feature learning module (MCFL).
BMCL effectively identifies the class-invariant visual regions for classification and may serve as a general framework to improve the performance of state-of-the-art methods.
arXiv Detail & Related papers (2022-08-22T09:07:02Z)
- Momentum Pseudo-Labeling for Semi-Supervised Speech Recognition [55.362258027878966]
We present momentum pseudo-labeling (MPL) as a simple yet effective strategy for semi-supervised speech recognition.
MPL consists of a pair of online and offline models that interact and learn from each other, inspired by the mean teacher method.
The experimental results demonstrate that MPL effectively improves over the base model and is scalable to different semi-supervised scenarios.
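A condensed, hypothetical sketch of the online/offline interaction: the offline model is a momentum (exponential-moving-average) copy of the online model, as in mean teacher, and supplies pseudo-labels for unlabeled audio. The ASR loss, decoding, and batching details are model-specific and omitted; `alpha`, the frame-wise argmax pseudo-labels, and the function names are assumptions.

```python
# Sketch: momentum (EMA) teacher generating pseudo-labels for the online student.
import torch

@torch.no_grad()
def ema_update(online: torch.nn.Module, offline: torch.nn.Module, alpha: float = 0.999):
    # Offline parameters track the online parameters with momentum alpha.
    for p_on, p_off in zip(online.parameters(), offline.parameters()):
        p_off.mul_(alpha).add_(p_on, alpha=1.0 - alpha)

def mpl_step(online, offline, optimizer, unlabeled_batch, loss_fn):
    with torch.no_grad():
        pseudo = offline(unlabeled_batch).argmax(-1)   # offline model generates pseudo-labels
    loss = loss_fn(online(unlabeled_batch), pseudo)    # online model trains on them
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    ema_update(online, offline)                        # offline model follows via momentum
    return loss.item()
```

In practice the offline model would typically be initialized as a deep copy of an online model first trained on labeled data, so that the pseudo-labels start from a reasonable seed.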
arXiv Detail & Related papers (2021-06-16T16:24:55Z)
- Data-Driven Learning and Load Ensemble Control [1.647866856596524]
This study aims to engage distributed small-scale flexible loads, such as thermostatically controllable loads (TCLs), to provide grid support services.
The efficiency of this data-driven learning is demonstrated through simulations on Heating, Cooling & Ventilation units in a testbed neighborhood of residential houses.
arXiv Detail & Related papers (2020-04-20T23:32:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.