CLIP with Generative Latent Replay: a Strong Baseline for Incremental Learning
- URL: http://arxiv.org/abs/2407.15793v1
- Date: Mon, 22 Jul 2024 16:51:28 GMT
- Title: CLIP with Generative Latent Replay: a Strong Baseline for Incremental Learning
- Authors: Emanuele Frascaroli, Aniello Panariello, Pietro Buzzega, Lorenzo Bonicelli, Angelo Porrello, Simone Calderara
- Abstract summary: We propose Continual Generative training for Incremental prompt-Learning, a novel approach to mitigate forgetting while adapting a VLM.
We demonstrate the effectiveness of our framework in adapting to new tasks while improving zero-shot capabilities.
- Score: 17.614980614656407
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the emergence of Transformers and Vision-Language Models (VLMs) such as CLIP, large pre-trained models have become a common strategy to enhance performance in Continual Learning scenarios. This led to the development of numerous prompting strategies to effectively fine-tune transformer-based models without succumbing to catastrophic forgetting. However, these methods struggle to specialize the model on domains that deviate significantly from the pre-training data while preserving its zero-shot capabilities. In this work, we propose Continual Generative training for Incremental prompt-Learning, a novel approach to mitigate forgetting while adapting a VLM, which exploits generative replay to align prompts to tasks. We also introduce a new metric to evaluate zero-shot capabilities within CL benchmarks. Through extensive experiments on different domains, we demonstrate the effectiveness of our framework in adapting to new tasks while improving zero-shot capabilities. Further analysis reveals that our approach can bridge the gap with joint prompt tuning. The codebase is available at https://github.com/aimagelab/mammoth.
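To make the generative-replay idea concrete, here is a minimal PyTorch sketch: each past class is summarized by a diagonal Gaussian over its CLIP image embeddings, freshly sampled latents are mixed with the current task's features, and learnable class prototypes (standing in here for prompt-derived text embeddings) are tuned on the mixture. The Gaussian generator, the prototype parameterization, and every identifier below are illustrative assumptions, not the authors' implementation; the linked repository contains the actual code.

```python
# Minimal sketch of generative latent replay for prompt-style continual learning.
# Assumptions (not the authors' code): features are precomputed CLIP image
# embeddings; each seen class is modelled by a diagonal Gaussian; learnable
# class prototypes stand in for prompt-derived text embeddings.
import torch
import torch.nn.functional as F


class LatentGaussianMemory:
    """Per-class diagonal Gaussian over the latent (embedding) space."""

    def __init__(self):
        self.stats = {}  # class_id -> (mean, std)

    def fit(self, feats: torch.Tensor, class_id: int) -> None:
        self.stats[class_id] = (feats.mean(dim=0), feats.std(dim=0, unbiased=False) + 1e-6)

    def sample(self, class_id: int, n: int) -> torch.Tensor:
        mean, std = self.stats[class_id]
        return mean + std * torch.randn(n, mean.numel())


def train_task(feats, labels, memory, prototypes, n_replay=64, steps=200, lr=1e-2):
    """Tune prototypes on current-task latents plus replayed past-class latents."""
    opt = torch.optim.Adam([prototypes], lr=lr)
    current = set(labels.tolist())
    past = [c for c in memory.stats if c not in current]
    for _ in range(steps):
        xs, ys = [feats], [labels]
        for c in past:  # generative replay: sample fresh latents for old classes
            xs.append(memory.sample(c, n_replay))
            ys.append(torch.full((n_replay,), c, dtype=torch.long))
        x = F.normalize(torch.cat(xs), dim=-1)
        logits = 100.0 * x @ F.normalize(prototypes, dim=-1).t()  # CLIP-like cosine logits
        loss = F.cross_entropy(logits, torch.cat(ys))
        opt.zero_grad()
        loss.backward()
        opt.step()
    for c in current:  # store generators for the classes just seen
        memory.fit(feats[labels == c], c)


# Usage with dummy features for one task (512-dim embeddings, 10 total classes):
memory = LatentGaussianMemory()
prototypes = torch.randn(10, 512, requires_grad=True)
train_task(torch.randn(32, 512), torch.randint(0, 5, (32,)), memory, prototypes)
```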
Related papers
- Mind the Gap: Preserving and Compensating for the Modality Gap in CLIP-Based Continual Learning [11.50324946279326]
The Contrastive Language-Image Pre-trained model (CLIP) exhibits strong capabilities across various downstream tasks.
We analyze the variations in the modality gap during the fine-tuning of vision-language pre-trained models.
We propose a simple yet effective method, MG-CLIP, that improves CLIP's performance in class-incremental learning.
arXiv Detail & Related papers (2025-07-12T02:28:42Z)
- Orthogonal Projection Subspace to Aggregate Online Prior-knowledge for Continual Test-time Adaptation [67.80294336559574]
Continual Test Time Adaptation (CTTA) is a task that requires a source pre-trained model to continually adapt to new scenarios.
We propose a novel pipeline, Orthogonal Projection Subspace to aggregate online Prior-knowledge, dubbed OoPk.
arXiv Detail & Related papers (2025-06-23T18:17:39Z)
- Retaining and Enhancing Pre-trained Knowledge in Vision-Language Models with Prompt Ensembling [5.6987175375687995]
We introduce a novel prompt ensemble learning approach called Group-wise Prompt Ensemble (GPE).
Our method aims to enhance CLIP's zero-shot capabilities by incorporating new domain knowledge while improving its robustness against data distribution shifts.
Our approach hinges on three main strategies: prompt grouping with masked attention to optimize CLIP's adaptability while safeguarding its zero-shot capabilities; the incorporation of auxiliary prompts for the seamless integration of new domain insights without disrupting the original model's representation; and an ensemble learning strategy that effectively merges original and new knowledge.
arXiv Detail & Related papers (2024-12-10T00:40:31Z)
- Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Self-Regularization [77.62516752323207]
We introduce an orthogonal fine-tuning method for efficiently fine-tuning pretrained weights and enabling enhanced robustness and generalization.
A self-regularization strategy is further exploited to maintain the stability in terms of zero-shot generalization of VLMs, dubbed OrthSR.
For the first time, we revisit CLIP and CoOp with our method to effectively improve the model in the few-shot image classification scenario.
arXiv Detail & Related papers (2024-07-11T10:35:53Z)
- Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters [65.15700861265432]
We present a parameter-efficient continual learning framework to alleviate long-term forgetting in incremental learning with vision-language models.
Our approach involves the dynamic expansion of a pre-trained CLIP model, through the integration of Mixture-of-Experts (MoE) adapters.
To preserve the zero-shot recognition capability of vision-language models, we introduce a Distribution Discriminative Auto-Selector.
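For intuition, the sketch below shows one plausible form of such an adapter: a small router selects a few bottleneck experts whose outputs are added residually to frozen CLIP features. The expert width, the number of experts, and the top-k routing are my assumptions, not the configuration used in the paper.

```python
# Illustrative Mixture-of-Experts adapter over frozen CLIP features.
# Bottleneck size, expert count, and top-k routing are assumptions,
# not the configuration of the cited paper.
import torch
import torch.nn as nn


class MoEAdapter(nn.Module):
    def __init__(self, dim=512, num_experts=4, bottleneck=64, top_k=2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, bottleneck), nn.GELU(), nn.Linear(bottleneck, dim))
            for _ in range(num_experts)
        )
        self.top_k = top_k

    def forward(self, x):                      # x: (batch, dim) frozen features
        gate = self.router(x).softmax(dim=-1)  # routing probabilities per expert
        weights, idx = gate.topk(self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for k in range(self.top_k):            # weighted sum of the selected experts
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return x + out                         # residual: adapt, don't overwrite


adapter = MoEAdapter(dim=512)
adapted = adapter(torch.randn(8, 512))         # same shape as the input features
```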
arXiv Detail & Related papers (2024-03-18T08:00:23Z)
- Transformers for Supervised Online Continual Learning [11.270594318662233]
We propose a method that leverages transformers' in-context learning capabilities for online continual learning.
Our method demonstrates significant improvements over previous state-of-the-art results on CLOC, a challenging large-scale real-world benchmark for image geo-localization.
arXiv Detail & Related papers (2024-03-03T16:12:20Z)
- Read Between the Layers: Leveraging Multi-Layer Representations for Rehearsal-Free Continual Learning with Pre-Trained Models [15.847302755988506]
We address the Continual Learning problem, wherein a model must learn a sequence of tasks from non-stationary distributions.
We propose LayUP, a new prototype-based approach to CL that leverages second-order feature statistics from multiple intermediate layers of a pre-trained network.
Our results demonstrate that fully exhausting the representational capacities of pre-trained models in CL goes well beyond their final embeddings.
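The sketch below illustrates the general recipe this points to: accumulate second-order (Gram) statistics of features concatenated from several layers and use them to decorrelate class prototypes. The fixed ridge term and the closed-form solve are illustrative assumptions, not the exact LayUP formulation.

```python
# Sketch of a rehearsal-free prototype classifier over concatenated multi-layer
# features, using accumulated second-order (Gram) statistics. The fixed ridge
# term and the closed-form solve are illustrative assumptions, not LayUP itself.
import torch


class SecondOrderPrototypes:
    def __init__(self, dim: int, num_classes: int, ridge: float = 1.0):
        self.G = ridge * torch.eye(dim)          # Gram matrix + ridge regularizer
        self.C = torch.zeros(dim, num_classes)   # per-class feature sums

    def update(self, feats: torch.Tensor, labels: torch.Tensor) -> None:
        """feats: (n, dim) features concatenated from several network layers."""
        self.G += feats.t() @ feats
        self.C.index_add_(1, labels, feats.t())

    def predict(self, feats: torch.Tensor) -> torch.Tensor:
        weights = torch.linalg.solve(self.G, self.C)  # decorrelated prototypes
        return (feats @ weights).argmax(dim=-1)


# Example: features built by concatenating, say, the last two layer embeddings.
clf = SecondOrderPrototypes(dim=2 * 384, num_classes=100)
clf.update(torch.randn(64, 2 * 384), torch.randint(0, 100, (64,)))
preds = clf.predict(torch.randn(8, 2 * 384))
```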
arXiv Detail & Related papers (2023-12-13T13:11:44Z)
- Class Incremental Learning with Pre-trained Vision-Language Models [59.15538370859431]
We propose an approach to exploiting pre-trained vision-language models (e.g. CLIP) that enables further adaptation.
Experiments on several conventional benchmarks consistently show a significant margin of improvement over the current state-of-the-art.
arXiv Detail & Related papers (2023-10-31T10:45:03Z)
- TACTiS-2: Better, Faster, Simpler Attentional Copulas for Multivariate Time Series [57.4208255711412]
Building on copula theory, we propose a simplified objective for the recently-introduced transformer-based attentional copulas (TACTiS).
We show that the resulting model has significantly better training dynamics and achieves state-of-the-art performance across diverse real-world forecasting tasks.
arXiv Detail & Related papers (2023-10-02T16:45:19Z)
- RanPAC: Random Projections and Pre-trained Models for Continual Learning [59.07316955610658]
Continual learning (CL) aims to learn different tasks (such as classification) in a non-stationary data stream without forgetting old ones.
We propose a concise and effective approach for CL with pre-trained models.
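A hedged sketch of the random-projection ingredient: frozen backbone features are expanded through a fixed random projection with a nonlinearity, and class means in that space serve as the classifier. The projection width, the ReLU, and the plain class-mean decision rule are simplifications of mine; the full method additionally decorrelates prototypes with second-order statistics.

```python
# Sketch of continual learning with a frozen random projection on top of
# pre-trained features. The projection width, ReLU nonlinearity, and class-mean
# classifier are simplifying assumptions, not the full RanPAC recipe.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
PROJ = torch.randn(768, 2000) / 768 ** 0.5     # fixed projection, never trained


def project(feats: torch.Tensor) -> torch.Tensor:
    """Expand frozen backbone features into a higher-dimensional random space."""
    return torch.relu(feats @ PROJ)


class ClassMeans:
    def __init__(self, num_classes: int, dim: int = 2000):
        self.sums = torch.zeros(num_classes, dim)
        self.counts = torch.zeros(num_classes)

    def update(self, feats, labels):           # can be called task by task
        h = project(feats)
        self.sums.index_add_(0, labels, h)
        self.counts.index_add_(0, labels, torch.ones(len(labels)))

    def predict(self, feats):
        means = self.sums / self.counts.clamp(min=1).unsqueeze(1)
        sims = F.normalize(project(feats), dim=-1) @ F.normalize(means, dim=-1).t()
        return sims.argmax(dim=-1)
```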
arXiv Detail & Related papers (2023-07-05T12:49:02Z)
- CLIPood: Generalizing CLIP to Out-of-Distributions [73.86353105017076]
Contrastive language-image pre-training (CLIP) models have shown impressive zero-shot ability, but further adaptation of CLIP to downstream tasks undesirably degrades out-of-distribution (OOD) performance.
We propose CLIPood, a fine-tuning method that can adapt CLIP models to OOD situations where both domain shifts and open classes may occur on unseen test data.
Experiments on diverse datasets with different OOD scenarios show that CLIPood consistently outperforms existing generalization techniques.
arXiv Detail & Related papers (2023-02-02T04:27:54Z)
- An EM Framework for Online Incremental Learning of Semantic Segmentation [37.94734474090863]
We propose an incremental learning strategy that can adapt deep segmentation models without catastrophic forgetting, using streaming input data with pixel annotations for the novel classes only.
We validate our approach on the PASCAL VOC 2012 and ADE20K datasets, and the results demonstrate its superior performance over the existing incremental methods.
arXiv Detail & Related papers (2021-08-08T11:30:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.