Preventing Zero-Shot Transfer Degradation in Continual Learning of Vision-Language Models
- URL: http://arxiv.org/abs/2303.06628v2
- Date: Fri, 11 Aug 2023 15:56:32 GMT
- Title: Preventing Zero-Shot Transfer Degradation in Continual Learning of Vision-Language Models
- Authors: Zangwei Zheng, Mingyuan Ma, Kai Wang, Ziheng Qin, Xiangyu Yue, Yang You
- Abstract summary: We propose a novel method to prevent zero-shot transfer degradation in the continual learning of vision-language models.
Our method outperforms other methods in the traditional class-incremental learning setting.
- Score: 13.340759455910721
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continual learning (CL) can help pre-trained vision-language models
efficiently adapt to new or under-trained data distributions without
re-training. Nevertheless, during the continual training of the Contrastive
Language-Image Pre-training (CLIP) model, we observe that the model's zero-shot
transfer ability significantly degrades due to catastrophic forgetting.
Existing CL methods can mitigate forgetting by replaying previous data.
However, since the CLIP dataset is private, replay methods cannot access the
pre-training dataset. In addition, replaying data of previously learned
downstream tasks can enhance their performance but comes at the cost of
sacrificing zero-shot performance. To address this challenge, we propose a
novel method ZSCL to prevent zero-shot transfer degradation in the continual
learning of vision-language models in both feature and parameter space. In the
feature space, a reference dataset is introduced for distillation between the
current and initial models. The reference dataset only needs semantic
diversity; it does not have to be labeled, seen during pre-training, or
composed of matched image-text pairs. In parameter space, we prevent large
parameter shifts by averaging weights during training. We propose a more challenging
Multi-domain Task Incremental Learning (MTIL) benchmark to evaluate different
methods, where tasks come from various domains rather than from classes split
within a single dataset. Our method outperforms other methods in the
traditional class-incremental learning setting and on MTIL by a 9.7% average
score. Our code is available at https://github.com/Thunderbeee/ZSCL.
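To make the two mechanisms concrete, here is a minimal PyTorch-style sketch of feature-space distillation against a frozen copy of the initial model plus a running weight average in parameter space. The function names, the cosine distillation loss, and the averaging schedule are illustrative assumptions, not the authors' implementation; the real code is in the repository above.

```python
import torch
import torch.nn.functional as F

def zscl_style_step(model, clip_init, task_batch, ref_images,
                    task_loss_fn, distill_weight=1.0):
    """One training step: task loss + feature-space distillation.

    `clip_init` is a frozen copy of the pre-trained model (e.g. built
    with copy.deepcopy(model).eval()); `ref_images` come from an
    unlabeled, semantically diverse reference dataset. Names are
    illustrative assumptions, not the paper's API.
    """
    # Standard fine-tuning loss on the current downstream task.
    loss = task_loss_fn(model, task_batch)

    # Feature-space distillation: keep the current model's features on
    # reference images close to those of the initial (zero-shot) model.
    with torch.no_grad():
        teacher = F.normalize(clip_init.encode_image(ref_images), dim=-1)
    student = F.normalize(model.encode_image(ref_images), dim=-1)
    # Cosine distance is one simple choice of distillation target.
    loss = loss + distill_weight * (1 - (student * teacher).sum(-1)).mean()
    return loss

@torch.no_grad()
def update_weight_average(avg_model, model, step):
    """Parameter-space regularization: maintain a running average of the
    weights during training to prevent large parameter shifts."""
    for p_avg, p in zip(avg_model.parameters(), model.parameters()):
        p_avg.mul_(step / (step + 1)).add_(p, alpha=1.0 / (step + 1))
```

Per the abstract, the actual method regularizes both spaces jointly; the exact distillation targets and averaging schedule differ from this sketch and are documented in the linked repository.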
Related papers
- Diffusion-Driven Data Replay: A Novel Approach to Combat Forgetting in Federated Class Continual Learning [13.836798036474143]
A key challenge in Federated Class Continual Learning is catastrophic forgetting.
We propose a novel method of data replay based on diffusion models.
Our method significantly outperforms existing baselines.
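As loose intuition only (this is not the paper's pipeline), replay data for past classes could be synthesized with an off-the-shelf text-to-image diffusion model, for example via the `diffusers` library:

```python
# Hypothetical replay-data synthesis with an off-the-shelf diffusion model.
# Illustrates the idea only; the paper's generator and prompting differ.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

def synthesize_replay(past_class_names, per_class=8):
    replay = []
    for name in past_class_names:
        out = pipe(f"a photo of a {name}", num_images_per_prompt=per_class)
        replay.extend((img, name) for img in out.images)
    return replay  # mix these into the next task's training batches
```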
arXiv Detail & Related papers (2024-09-02T10:07:24Z)
- Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification [34.37262622415682]
We propose a new adaptation framework called Data Adaptive Traceback.
Specifically, we use a zero-shot-based method to extract the subset of the pre-training data most relevant to the downstream task.
We adopt a pseudo-label-based semi-supervised technique to reuse the pre-training images and a vision-language contrastive learning method to address the confirmation bias issue in semi-supervised learning.
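A minimal sketch of how such a zero-shot "traceback" selection might look, assuming a CLIP-style model with `encode_image`/`encode_text` and pre-tokenized class prompts (all names are assumptions, not the paper's API):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def traceback_subset(clip_model, pretrain_loader, prompt_tokens, top_k=10_000):
    """Score pre-training images by zero-shot similarity to the downstream
    task's class prompts and keep the top matches. Illustrative sketch."""
    text = F.normalize(clip_model.encode_text(prompt_tokens), dim=-1)
    scores = []
    for images in pretrain_loader:  # assumed to yield image tensors in order
        feats = F.normalize(clip_model.encode_image(images), dim=-1)
        # An image's task-relatedness = its best similarity to any prompt.
        scores.append((feats @ text.T).max(dim=-1).values)
    scores = torch.cat(scores)
    return scores.topk(min(top_k, scores.numel())).indices  # dataset indices
```

The selected images would then be pseudo-labeled and mixed into training, as the summary above describes.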
arXiv Detail & Related papers (2024-07-11T18:01:58Z)
- Enhancing Consistency and Mitigating Bias: A Data Replay Approach for Incremental Learning [100.7407460674153]
Deep learning systems are prone to catastrophic forgetting when learning from a sequence of tasks.
To mitigate this problem, a line of methods proposes replaying data from previously seen tasks when learning new ones.
However, storing such data is often impractical given memory constraints and data-privacy concerns.
As a replacement, data-free replay methods synthesize samples by inverting them from the classification model.
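A minimal sketch of such sample inversion: optimize random inputs so a frozen classifier assigns them to a target class. Real methods add image priors and batch-norm-statistic regularizers; this bare version is illustrative only.

```python
import torch
import torch.nn.functional as F

def invert_samples(classifier, target_class, shape=(16, 3, 32, 32), steps=200):
    """Synthesize replay inputs the frozen classifier assigns to
    `target_class` by optimizing the pixels directly (data-free replay)."""
    x = torch.randn(shape, requires_grad=True)
    opt = torch.optim.Adam([x], lr=0.05)
    labels = torch.full((shape[0],), target_class, dtype=torch.long)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.cross_entropy(classifier(x), labels)
        loss.backward()
        opt.step()
    return x.detach()  # real methods add TV/BN-statistic priors for realism
```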
arXiv Detail & Related papers (2024-01-12T12:51:12Z)
- DST-Det: Simple Dynamic Self-Training for Open-Vocabulary Object Detection [72.25697820290502]
This work introduces a straightforward and efficient strategy to identify potential novel classes through zero-shot classification.
We refer to this approach as the self-training strategy, which improves recall and accuracy for novel classes without requiring extra annotations, datasets, or re-training.
Empirical evaluations on three datasets, including LVIS, V3Det, and COCO, demonstrate significant improvements over the baseline performance.
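A hedged sketch of the pseudo-labeling step, assuming region-proposal features and class text embeddings are already computed; the threshold and selection rule are illustrative assumptions, not DST-Det's exact criterion:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def mine_novel_pseudo_labels(region_feats, text_feats, novel_ids, thresh=0.6):
    """Assign pseudo-labels to region proposals whose zero-shot similarity
    to a novel class's text embedding is confident enough."""
    sims = F.normalize(region_feats, dim=-1) @ F.normalize(text_feats, dim=-1).T
    conf, cls = sims.softmax(dim=-1).max(dim=-1)
    keep = torch.isin(cls, novel_ids) & (conf > thresh)
    # Returns (pseudo-labels, indices of the kept proposals).
    return cls[keep], keep.nonzero(as_tuple=True)[0]
```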
arXiv Detail & Related papers (2023-10-02T17:52:24Z)
- RanPAC: Random Projections and Pre-trained Models for Continual Learning [59.07316955610658]
Continual learning (CL) aims to learn different tasks (such as classification) in a non-stationary data stream without forgetting old ones.
We propose a concise and effective approach for CL with pre-trained models.
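A simplified class-prototype variant of the random-projection idea, sketched below; the paper itself fits a closed-form classifier on the projected features, so treat this as an assumption-laden approximation:

```python
import torch

class RandomProjectionHead(torch.nn.Module):
    """Expand frozen pre-trained features with a fixed random projection
    and a nonlinearity, then accumulate class prototypes task by task."""
    def __init__(self, feat_dim, proj_dim=10000, num_classes=100):
        super().__init__()
        self.register_buffer("W", torch.randn(feat_dim, proj_dim))
        self.register_buffer("protos", torch.zeros(num_classes, proj_dim))
        self.register_buffer("counts", torch.zeros(num_classes))

    @torch.no_grad()
    def update(self, feats, labels):  # feats: frozen backbone features
        h = torch.relu(feats @ self.W)  # random nonlinear expansion
        for c in labels.unique():
            mask = labels == c
            self.protos[c] += h[mask].sum(0)
            self.counts[c] += mask.sum()

    @torch.no_grad()
    def predict(self, feats):
        h = torch.relu(feats @ self.W)
        means = self.protos / self.counts.clamp(min=1).unsqueeze(1)
        return (h @ means.T).argmax(dim=-1)
```

Because only the buffers change, no gradient steps touch the backbone, which is what makes this style of method forgetting-resistant.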
arXiv Detail & Related papers (2023-07-05T12:49:02Z)
- Don't Memorize; Mimic The Past: Federated Class Incremental Learning Without Episodic Memory [36.4406505365313]
This paper presents a framework for federated class incremental learning that utilizes a generative model to synthesize samples from past distributions instead of storing part of past data.
The generative model is trained on the server using data-free methods at the end of each task without requesting data from clients.
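A rough server-side sketch of data-free generator training, assuming a class-conditional generator with a hypothetical `generator(z, y)` signature and the frozen global model as the only supervision:

```python
import torch
import torch.nn.functional as F

def train_generator_data_free(generator, global_model, num_classes,
                              steps=1000, batch=64, z_dim=128):
    """Train a generator on the server so its class-conditioned samples are
    confidently classified by the frozen global model; no client data used."""
    opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
    for _ in range(steps):
        z = torch.randn(batch, z_dim)
        y = torch.randint(num_classes, (batch,))
        x = generator(z, y)               # hypothetical conditional generator
        loss = F.cross_entropy(global_model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return generator  # real methods add diversity/feature-matching terms
```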
arXiv Detail & Related papers (2023-07-02T07:06:45Z)
- Prototype-Sample Relation Distillation: Towards Replay-Free Continual Learning [14.462797749666992]
We propose a holistic approach to jointly learn the representation and class prototypes.
We propose a novel distillation loss that constrains class prototypes to maintain their relative similarities with respect to new task data.
This method yields state-of-the-art performance in the task-incremental setting.
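One plausible reading of that loss, sketched below: keep each prototype's similarity distribution over new-task features close to its pre-update distribution. The temperature and KL form are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def prototype_relation_loss(new_protos, old_protos, new_feats, tau=0.1):
    """KL between each prototype's old and new similarity distributions
    over the current task's features (illustrative sketch)."""
    feats = F.normalize(new_feats, dim=-1)
    p_old = (F.normalize(old_protos, dim=-1) @ feats.T / tau).softmax(-1)
    log_p_new = (F.normalize(new_protos, dim=-1) @ feats.T / tau).log_softmax(-1)
    return F.kl_div(log_p_new, p_old, reduction="batchmean")
```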
arXiv Detail & Related papers (2023-03-26T16:35:45Z)
- Self-Distillation for Further Pre-training of Transformers [83.84227016847096]
We propose self-distillation as a regularization for a further pre-training stage.
We empirically validate the efficacy of self-distillation on a variety of benchmark datasets for image and text classification tasks.
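A minimal sketch of self-distillation as a regularizer, with the previous-stage checkpoint as a frozen teacher; the temperature-scaled KL is a common choice, not necessarily the paper's exact loss:

```python
import torch
import torch.nn.functional as F

def self_distillation_loss(model, frozen_teacher, batch, tau=1.0):
    """Regularize further pre-training by matching the logits of a frozen
    copy of the model taken from the previous pre-training stage."""
    with torch.no_grad():
        t_logits = frozen_teacher(batch)
    s_logits = model(batch)
    return F.kl_div(
        F.log_softmax(s_logits / tau, dim=-1),
        F.softmax(t_logits / tau, dim=-1),
        reduction="batchmean",
    ) * tau**2
```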
arXiv Detail & Related papers (2022-09-30T02:25:12Z)
- Knowledge Distillation as Efficient Pre-training: Faster Convergence, Higher Data-efficiency, and Better Transferability [53.27240222619834]
Knowledge Distillation as Efficient Pre-training aims to efficiently transfer the learned feature representation from pre-trained models to new student models for future downstream tasks.
Our method performs comparably with supervised pre-training counterparts in 3 downstream tasks and 9 downstream datasets requiring 10x less data and 5x less pre-training time.
arXiv Detail & Related papers (2022-03-10T06:23:41Z)
- Self-Supervised Pre-Training for Transformer-Based Person Re-Identification [54.55281692768765]
Transformer-based supervised pre-training achieves strong performance in person re-identification (ReID).
Due to the domain gap between ImageNet and ReID datasets, it usually needs a larger pre-training dataset to boost the performance.
This work aims to mitigate the gap between the pre-training and ReID datasets from the perspective of data and model structure.
arXiv Detail & Related papers (2021-11-23T18:59:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.