Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters
- URL: http://arxiv.org/abs/2403.11549v2
- Date: Mon, 3 Jun 2024 07:45:33 GMT
- Title: Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters
- Authors: Jiazuo Yu, Yunzhi Zhuge, Lu Zhang, Ping Hu, Dong Wang, Huchuan Lu, You He,
- Abstract summary: We present a parameter-efficient continual learning framework to alleviate long-term forgetting in incremental learning with vision-language models.
Our approach involves the dynamic expansion of a pre-trained CLIP model, through the integration of Mixture-of-Experts (MoE) adapters.
To preserve the zero-shot recognition capability of vision-language models, we introduce a Distribution Discriminative Auto-Selector.
- Score: 65.15700861265432
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continual learning can empower vision-language models to continuously acquire new knowledge, without the need for access to the entire historical dataset. However, mitigating the performance degradation in large-scale models is non-trivial due to (i) parameter shifts throughout lifelong learning and (ii) significant computational burdens associated with full-model tuning. In this work, we present a parameter-efficient continual learning framework to alleviate long-term forgetting in incremental learning with vision-language models. Our approach involves the dynamic expansion of a pre-trained CLIP model, through the integration of Mixture-of-Experts (MoE) adapters in response to new tasks. To preserve the zero-shot recognition capability of vision-language models, we further introduce a Distribution Discriminative Auto-Selector (DDAS) that automatically routes in-distribution and out-of-distribution inputs to the MoE Adapter and the original CLIP, respectively. Through extensive experiments across various settings, our proposed method consistently outperforms previous state-of-the-art approaches while concurrently reducing parameter training burdens by 60%. Our code locates at https://github.com/JiazuoYu/MoE-Adapters4CL
Related papers
- Multi-Stage Knowledge Integration of Vision-Language Models for Continual Learning [79.46570165281084]
We propose a Multi-Stage Knowledge Integration network (MulKI) to emulate the human learning process in distillation methods.
MulKI achieves this through four stages, including Eliciting Ideas, Adding New Ideas, Distinguishing Ideas, and Making Connections.
Our method demonstrates significant improvements in maintaining zero-shot capabilities while supporting continual learning across diverse downstream tasks.
arXiv Detail & Related papers (2024-11-11T07:36:19Z) - Adaptive Variational Continual Learning via Task-Heuristic Modelling [3.6119958671506707]
Variational continual learning () is a turn-key learning algorithm that has state-of-the-art performance among the best continual learning models.
In our work, we explore an extension of the generalized variational continual learning (G) model, named Auto, which combines tasks for informed learning and model optimization.
arXiv Detail & Related papers (2024-08-29T13:28:11Z) - Semantically-Shifted Incremental Adapter-Tuning is A Continual ViTransformer [44.10678347943115]
Class-incremental learning (CIL) aims to enable models to continuously learn new classes while overcoming catastrophic forgetting.
In this paper, we revisit different parameter-efficient tuning (PET) methods within the context of continual learning.
We observe that adapter tuning demonstrates superiority over prompt-based methods, even without parameter expansion in each learning session.
arXiv Detail & Related papers (2024-03-29T05:23:12Z) - Diffusion-Based Neural Network Weights Generation [80.89706112736353]
D2NWG is a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning.
Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation.
Our approach is scalable to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques.
arXiv Detail & Related papers (2024-02-28T08:34:23Z) - Federated Learning with Projected Trajectory Regularization [65.6266768678291]
Federated learning enables joint training of machine learning models from distributed clients without sharing their local data.
One key challenge in federated learning is to handle non-identically distributed data across the clients.
We propose a novel federated learning framework with projected trajectory regularization (FedPTR) for tackling the data issue.
arXiv Detail & Related papers (2023-12-22T02:12:08Z) - When Parameter-efficient Tuning Meets General-purpose Vision-language
Models [65.19127815275307]
PETAL revolutionizes the training process by requiring only 0.5% of the total parameters, achieved through a unique mode approximation technique.
Our experiments reveal that PETAL not only outperforms current state-of-the-art methods in most scenarios but also surpasses full fine-tuning models in effectiveness.
arXiv Detail & Related papers (2023-12-16T17:13:08Z) - Multi-View Class Incremental Learning [57.14644913531313]
Multi-view learning (MVL) has gained great success in integrating information from multiple perspectives of a dataset to improve downstream task performance.
This paper investigates a novel paradigm called multi-view class incremental learning (MVCIL), where a single model incrementally classifies new classes from a continual stream of views.
arXiv Detail & Related papers (2023-06-16T08:13:41Z) - Learning an evolved mixture model for task-free continual learning [11.540150938141034]
We address the Task-Free Continual Learning (TFCL) in which a model is trained on non-stationary data streams with no explicit task information.
We introduce two simple dropout mechanisms to selectively remove stored examples in order to avoid memory overload.
arXiv Detail & Related papers (2022-07-11T16:01:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.