Momentum-based Weight Interpolation of Strong Zero-Shot Models for
Continual Learning
- URL: http://arxiv.org/abs/2211.03186v1
- Date: Sun, 6 Nov 2022 17:41:39 GMT
- Title: Momentum-based Weight Interpolation of Strong Zero-Shot Models for
Continual Learning
- Authors: Zafir Stojanovski, Karsten Roth, Zeynep Akata
- Abstract summary: Large pre-trained, zero-shot capable models have shown considerable success both for standard transfer and adaptation tasks.
However, through naive fine-tuning, these zero-shot models lose their generalizability and robustness towards distribution shifts.
In this work, we showcase that where fine-tuning falls short in adapting such zero-shot capable models, simple momentum-based weight interpolation can provide consistent improvements.
- Score: 46.80199921638615
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large pre-trained, zero-shot capable models have shown considerable success
both for standard transfer and adaptation tasks, with particular robustness
towards distribution shifts. In addition, subsequent fine-tuning can
considerably improve performance on a selected downstream task. However,
through naive fine-tuning, these zero-shot models lose their generalizability
and robustness towards distribution shifts. This is a particular problem for
tasks such as Continual Learning (CL), where continuous adaptation has to be
performed as new task distributions are introduced sequentially. In this work,
we showcase that where fine-tuning falls short in adapting such zero-shot
capable models, simple momentum-based weight interpolation can provide
consistent improvements for CL tasks in both memory-free and memory-based
settings. In particular, we find improvements of over $+4\%$ on standard CL
benchmarks, while in parts reducing the error relative to the upper limit of
jointly training on all tasks at once by more than half, allowing the continual
learner to inch closer to this joint-training upper bound.
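The mechanism is simple to state: alongside the continually fine-tuned weights $\theta_t$, one maintains a slowly moving momentum-averaged copy, $\theta^{mom}_t = \tau\,\theta^{mom}_{t-1} + (1 - \tau)\,\theta_t$, initialized from the zero-shot weights and used at evaluation time. Below is a minimal, hedged PyTorch sketch of such momentum-based weight interpolation; the names (`momentum_update`, `tau`) and the per-step update schedule are illustrative assumptions, not the authors' reference implementation.

```python
import copy
import torch

@torch.no_grad()
def momentum_update(momentum_model, model, tau=0.999):
    """EMA-style weight interpolation: theta_mom <- tau * theta_mom + (1 - tau) * theta.

    A `tau` close to 1 keeps the momentum weights near the earlier (more
    robust) solution; a smaller `tau` tracks the fine-tuned weights faster.
    The value and schedule of `tau` here are assumptions, not taken from the paper.
    """
    for p_mom, p in zip(momentum_model.parameters(), model.parameters()):
        p_mom.mul_(tau).add_(p, alpha=1.0 - tau)

# Illustrative continual-learning loop (model, loss_fn, task_loaders are assumed):
# model = ...                               # zero-shot capable model, e.g. a CLIP encoder
# momentum_model = copy.deepcopy(model)     # momentum copy starts at the zero-shot weights
# optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
# for task_loader in task_loaders:          # task distributions arrive sequentially
#     for x, y in task_loader:
#         loss = loss_fn(model(x), y)
#         optimizer.zero_grad()
#         loss.backward()
#         optimizer.step()
#         momentum_update(momentum_model, model)  # interpolate after every step
# # evaluate with `momentum_model` rather than `model`
```

Because the momentum weights start at the zero-shot solution and move slowly, they act as a continually refreshed interpolation between the robust zero-shot model and the task-specialized fine-tuned one, which is what lets the learner adapt without discarding zero-shot robustness.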
Related papers
- ICL-TSVD: Bridging Theory and Practice in Continual Learning with Pre-trained Models [103.45785408116146]
Continual learning (CL) aims to train a model that can solve multiple tasks presented sequentially.
Recent CL approaches have achieved strong performance by leveraging large pre-trained models that generalize well to downstream tasks.
However, such methods lack theoretical guarantees, making them prone to unexpected failures.
We bridge this gap by integrating an empirically strong approach into a principled framework, designed to prevent forgetting.
arXiv Detail & Related papers (2024-10-01T12:58:37Z)
- Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Self-Regularization [77.62516752323207]
We introduce an orthogonal fine-tuning method for efficiently fine-tuning pretrained weights and enabling enhanced robustness and generalization (a generic sketch of orthogonal fine-tuning follows this list).
A self-regularization strategy is further exploited to maintain stability in terms of the zero-shot generalization of VLMs; the method is dubbed OrthSR.
For the first time, we revisit CLIP and CoOp with our method to effectively improve the models in few-shot image classification scenarios.
arXiv Detail & Related papers (2024-07-11T10:35:53Z)
- FeTT: Continual Class Incremental Learning via Feature Transformation Tuning [19.765229703131876]
Continual learning (CL) aims to extend deep models from static and enclosed environments to dynamic and complex scenarios.
Recent CL models have gradually shifted towards the utilization of pre-trained models with parameter-efficient fine-tuning strategies.
This paper proposes a feature transformation tuning (FeTT) model to non-parametrically fine-tune backbone features across all tasks.
arXiv Detail & Related papers (2024-05-20T06:33:50Z)
- Calibrating Multi-modal Representations: A Pursuit of Group Robustness without Annotations [19.800907485589402]
Fine-tuning pre-trained vision-language models, like CLIP, has yielded success on diverse downstream tasks.
These tuned models tend to become highly specialized, limiting their practicality for real-world deployment.
We propose a lightweight representation calibration method for fine-tuning CLIP.
arXiv Detail & Related papers (2024-03-12T01:47:17Z)
- Robust Fine-Tuning of Vision-Language Models for Domain Generalization [6.7181844004432385]
Foundation models have impressive zero-shot inference capabilities and robustness under distribution shifts.
We present a new recipe for few-shot fine-tuning of the popular vision-language foundation model CLIP.
Our experiments demonstrate that, while zero-shot CLIP fails to match the performance of trained vision models on more complex benchmarks, few-shot CLIP fine-tuning outperforms its vision-only counterparts.
arXiv Detail & Related papers (2023-11-03T20:50:40Z)
- Continual Learners are Incremental Model Generalizers [70.34479702177988]
This paper extensively studies the impact of Continual Learning (CL) models as pre-trainers.
We find that the transfer quality of the representation often increases gradually without noticeable degradation in fine-tuning performance.
We propose a new fine-tuning scheme, GLobal Attention Discretization (GLAD), that preserves rich task-generic representations while solving downstream tasks.
arXiv Detail & Related papers (2023-06-21T05:26:28Z)
- CLIPood: Generalizing CLIP to Out-of-Distributions [73.86353105017076]
Contrastive language-image pre-training (CLIP) models have shown impressive zero-shot ability, but the further adaptation of CLIP on downstream tasks undesirably degrades OOD performances.
We propose CLIPood, a fine-tuning method that can adapt CLIP models to OOD situations where both domain shifts and open classes may occur on unseen test data.
Experiments on diverse datasets with different OOD scenarios show that CLIPood consistently outperforms existing generalization techniques.
arXiv Detail & Related papers (2023-02-02T04:27:54Z)
- Meta-Learned Attribute Self-Gating for Continual Generalized Zero-Shot Learning [82.07273754143547]
We propose a meta-continual zero-shot learning (MCZSL) approach to generalizing a model to categories unseen during training.
By pairing self-gating of attributes and scaled class normalization with meta-learning based training, we are able to surpass state-of-the-art results.
arXiv Detail & Related papers (2021-02-23T18:36:14Z)
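The orthogonality-learning entry above names its mechanism but not its details. As a point of reference, here is a minimal, generic sketch of orthogonal fine-tuning (in the spirit of OFT-style methods), where a frozen pretrained linear layer is adapted by a learned orthogonal rotation parameterized via the Cayley transform. The class and parameterization are illustrative assumptions; OrthSR's actual formulation and its self-regularization loss are not reproduced here.

```python
import torch
import torch.nn as nn

class OrthogonalFinetunedLinear(nn.Module):
    """Wraps a frozen pretrained linear layer and fine-tunes it by
    multiplying its weight with a learned orthogonal matrix R,
    parameterized via the Cayley transform R = (I + A)^{-1}(I - A)
    with A skew-symmetric. Since R is orthogonal, it preserves the
    pairwise angles between the pretrained weight vectors, which is
    the usual intuition for retaining zero-shot behavior.
    """
    def __init__(self, pretrained: nn.Linear):
        super().__init__()
        self.pretrained = pretrained
        for p in self.pretrained.parameters():
            p.requires_grad_(False)                  # pretrained weights stay frozen
        d = pretrained.out_features
        self.skew = nn.Parameter(torch.zeros(d, d))  # only trainable tensor; R = I at init

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        A = self.skew - self.skew.T                  # enforce skew-symmetry
        I = torch.eye(A.size(0), device=A.device, dtype=A.dtype)
        R = torch.linalg.solve(I + A, I - A)         # Cayley transform, always orthogonal
        W = R @ self.pretrained.weight               # rotate the frozen weight matrix
        return nn.functional.linear(x, W, self.pretrained.bias)

# Hypothetical usage: replace a head of a pretrained model with its wrapped version,
# e.g. model.classifier = OrthogonalFinetunedLinear(model.classifier)
```

At initialization the rotation is the identity, so the wrapped layer reproduces the pretrained model exactly; training then only moves the weights along the orthogonal group rather than freely, which constrains how far fine-tuning can drift from the zero-shot solution.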
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.