Make Continual Learning Stronger via C-Flat
- URL: http://arxiv.org/abs/2404.00986v1
- Date: Mon, 1 Apr 2024 08:18:38 GMT
- Title: Make Continual Learning Stronger via C-Flat
- Authors: Ang Bian, Wei Li, Hangjie Yuan, Chengrong Yu, Zixiang Zhao, Mang Wang, Aojun Lu, Tao Feng,
- Abstract summary: We propose Continual Flatness (C-Flat), a method that promotes a flatter loss landscape tailored for Continual Learning (CL).
C-Flat can be invoked with a single line of code and is plug-and-play with any CL method.
- Score: 13.042434803115707
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A model's ability to generalize while incrementally acquiring dynamically updated knowledge from sequentially arriving tasks is crucial to tackling the sensitivity-stability dilemma in Continual Learning (CL). Minimizing the sharpness of the weight-loss landscape, i.e., seeking flat minima that lie in neighborhoods with uniformly low loss or smooth gradients, has proven to be a stronger training regime for model generalization than loss-minimization optimizers such as SGD. Yet only a few works have examined this training regime for CL, showing that a dedicated zeroth-order sharpness optimizer can improve CL performance. In this work, we propose Continual Flatness (C-Flat), a method featuring a flatter loss landscape tailored for CL. C-Flat can be invoked with a single line of code and is plug-and-play with any CL method. This paper presents a general framework applying C-Flat to all CL categories and a thorough comparison with loss-minimum optimizers and flat-minima-based CL approaches, showing that our method boosts CL performance in almost all cases. Code will be publicly available upon publication.
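Since the authors' code is not included in this listing, the snippet below is only a minimal sketch of the generic two-step sharpness-aware update (SAM-style) that flat-minima optimizers of this family build on; the class name `FlatMinimaOptimizer`, the neighborhood radius `rho`, and the `first_step`/`second_step` interface are illustrative assumptions, not the C-Flat implementation itself.

```python
import torch

class FlatMinimaOptimizer:
    """Minimal SAM-style sharpness-aware wrapper (illustrative sketch, not C-Flat itself)."""

    def __init__(self, params, base_optimizer, rho=0.05):
        self.params = [p for p in params if p.requires_grad]
        self.base_optimizer = base_optimizer
        self.rho = rho                      # radius of the neighborhood explored around the weights
        self._perturbation = {}

    @torch.no_grad()
    def first_step(self):
        # Ascend toward the locally worst-case point: w <- w + rho * g / ||g||
        grads = [p.grad for p in self.params if p.grad is not None]
        grad_norm = torch.norm(torch.stack([g.norm(p=2) for g in grads]), p=2)
        scale = self.rho / (grad_norm + 1e-12)
        for p in self.params:
            if p.grad is None:
                continue
            e = p.grad * scale
            p.add_(e)
            self._perturbation[p] = e

    @torch.no_grad()
    def second_step(self):
        # Undo the perturbation, then update with the gradient taken at the perturbed point.
        for p, e in self._perturbation.items():
            p.sub_(e)
        self._perturbation.clear()
        self.base_optimizer.step()

    def zero_grad(self):
        self.base_optimizer.zero_grad()

# Hypothetical usage inside any CL method's inner loop:
model = torch.nn.Linear(10, 2)
base = torch.optim.SGD(model.parameters(), lr=0.1)
optimizer = FlatMinimaOptimizer(model.parameters(), base, rho=0.05)
loss_fn = torch.nn.CrossEntropyLoss()
x, y = torch.randn(8, 10), torch.randint(0, 2, (8,))

loss_fn(model(x), y).backward()
optimizer.first_step()                  # perturb weights toward higher loss
optimizer.zero_grad()
loss_fn(model(x), y).backward()         # gradient at the perturbed weights
optimizer.second_step()                 # restore weights and apply the base update
```

The wrapper illustrates the "plug-and-play" claim: a CL method that already computes a loss keeps its training loop and only swaps the optimizer step.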
Related papers
- CLoRA: Parameter-Efficient Continual Learning with Low-Rank Adaptation [14.2843647693986]
CLoRA applies Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning method, to class-incremental semantic segmentation. CLoRA significantly reduces the hardware requirements for training, making it well suited for CL in resource-constrained environments after deployment.
arXiv Detail & Related papers (2025-07-26T09:36:05Z) - CLA: Latent Alignment for Online Continual Self-Supervised Learning [53.52783900926569]
We introduce Continual Latent Alignment (CLA), a novel SSL strategy for Online CL. CLA speeds up the convergence of training in the online scenario, outperforming state-of-the-art approaches under the same computational budget. We also find that using CLA as a pretraining protocol in the early stages of pretraining leads to better final performance than full i.i.d. pretraining.
arXiv Detail & Related papers (2025-07-14T16:23:39Z) - Self-Calibrated CLIP for Training-Free Open-Vocabulary Segmentation [19.749490092520006]
Self-Calibrated CLIP (SC-CLIP) is a training-free method that calibrates CLIP to produce finer-grained representations.
SC-CLIP boosts the performance of vanilla CLIP ViT-L/14 by 6.8 times.
arXiv Detail & Related papers (2024-11-24T15:14:05Z) - Is Less More? Exploring Token Condensation as Training-free Adaptation for CLIP [43.09801987385207]
Contrastive language-image pre-training (CLIP) has shown remarkable generalization ability in image classification.
CLIP sometimes encounters performance drops on downstream datasets during zero-shot inference.
This raises an important question: Is there a training-free approach that can efficiently address CLIP's performance drop in such cases?
arXiv Detail & Related papers (2024-10-16T07:13:35Z) - CLIP's Visual Embedding Projector is a Few-shot Cornucopia [45.93202559299953]
We introduce an alternative way to adapt CLIP for few-shot tasks without adding "external" parameters to optimize.
We find that simply fine-tuning the embedding projection matrix of the vision encoder leads to better performance than all baselines (a minimal sketch of this projection-only fine-tuning appears after this list).
This simple approach, coined ProLIP, yields state-of-the-art performance on 11 few-shot classification benchmarks, few-shot cross-dataset encoder transfer, domain generalization, and base-to-new class generalization.
arXiv Detail & Related papers (2024-10-07T17:59:59Z) - ICL-TSVD: Bridging Theory and Practice in Continual Learning with Pre-trained Models [103.45785408116146]
Continual learning (CL) aims to train a model that can solve multiple tasks presented sequentially.
Recent CL approaches have achieved strong performance by leveraging large pre-trained models that generalize well to downstream tasks.
However, such methods lack theoretical guarantees, making them prone to unexpected failures.
We bridge this gap by integrating an empirically strong approach into a principled framework, designed to prevent forgetting.
arXiv Detail & Related papers (2024-10-01T12:58:37Z) - SLCA++: Unleash the Power of Sequential Fine-tuning for Continual Learning with Pre-training [68.7896349660824]
We present an in-depth analysis of the progressive overfitting problem through the lens of sequential fine-tuning (Seq FT).
Observing that overly fast representation learning and a biased classification layer jointly cause this problem, we introduce the Slow Learner with Classifier Alignment (SLCA++) framework.
Our approach uses a Slow Learner to selectively reduce the learning rate of backbone parameters and a Classifier Alignment step to align the disjoint classification layers in a post-hoc fashion.
arXiv Detail & Related papers (2024-08-15T17:50:07Z) - CLAP4CLIP: Continual Learning with Probabilistic Finetuning for Vision-Language Models [23.398619576886375]
Continual learning (CL) aims to help deep neural networks learn new knowledge while retaining what has been learned.
Our work proposes Continual LeArning with Probabilistic finetuning (CLAP) - a probabilistic modeling framework over visual-guided text features per task.
arXiv Detail & Related papers (2024-03-28T04:15:58Z) - A Hard-to-Beat Baseline for Training-free CLIP-based Adaptation [121.0693322732454]
Contrastive Language-Image Pretraining (CLIP) has gained popularity for its remarkable zero-shot capacity.
Recent research has focused on developing efficient fine-tuning methods to enhance CLIP's performance in downstream tasks.
We revisit a classical algorithm, Gaussian Discriminant Analysis (GDA), and apply it to downstream classification with CLIP features (a worked sketch of this classifier appears after this list).
arXiv Detail & Related papers (2024-02-06T15:45:27Z) - In-context Learning and Gradient Descent Revisited [3.085927389171139]
We show that even untrained models achieve comparable ICL-GD similarity scores despite not exhibiting ICL.
Next, we explore a major discrepancy in the flow of information throughout the model between ICL and GD, which we term Layer Causality.
We propose a simple GD-based optimization procedure that respects layer causality, and show it improves similarity scores significantly.
arXiv Detail & Related papers (2023-11-13T21:42:38Z) - Which Features are Learnt by Contrastive Learning? On the Role of Simplicity Bias in Class Collapse and Feature Suppression [59.97965005675144]
Contrastive learning (CL) has emerged as a powerful technique for representation learning, with or without label supervision.
We provide the first unified, theoretically rigorous framework to determine which features are learnt by CL.
We present increasing embedding dimensionality and improving the quality of data augmentations as two theoretically motivated solutions.
arXiv Detail & Related papers (2023-05-25T23:37:22Z) - Does Continual Learning Equally Forget All Parameters? [55.431048995662714]
Distribution shift (e.g., task or domain shift) in continual learning (CL) usually results in catastrophic forgetting of neural networks.
We study which modules in neural networks are more prone to forgetting by investigating their training dynamics during CL.
We propose a simpler and more efficient method that entirely removes every-step replay and instead triggers forgetting-prioritized finetuning (FPF) only $k$ times periodically during CL.
arXiv Detail & Related papers (2023-04-09T04:36:24Z) - CLIP Itself is a Strong Fine-tuner: Achieving 85.7% and 88.0% Top-1 Accuracy with ViT-B and ViT-L on ImageNet [139.56863124214905]
We find that fine-tuning performance of CLIP is substantially underestimated.
Specifically, CLIP ViT-Base/16 and CLIP ViT-Large/14 can achieve 85.7% and 88.0% fine-tuning Top-1 accuracy, respectively, on the ImageNet-1K dataset.
arXiv Detail & Related papers (2022-12-12T18:59:59Z) - Do Pre-trained Models Benefit Equally in Continual Learning? [25.959813589169176]
Existing work on continual learning (CL) is primarily devoted to developing algorithms for models trained from scratch.
Despite their encouraging performance on contrived benchmarks, these algorithms show dramatic performance drops in real-world scenarios.
This paper advocates the systematic introduction of pre-training to CL.
arXiv Detail & Related papers (2022-10-27T18:03:37Z)
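For the ProLIP entry above, the sketch below illustrates projection-only fine-tuning: it freezes every CLIP parameter except the vision tower's embedding projection. It assumes the OpenAI reference `clip` package, where the ViT exposes this projection as the Parameter `model.visual.proj`; the prompts, data, and hyperparameters are placeholders, not the authors' setup.

```python
import torch
import clip  # OpenAI reference CLIP; using this package is an assumption of the sketch

device = "cpu"  # kept on CPU so the reference model loads in float32
model, preprocess = clip.load("ViT-B/16", device=device)

# Freeze everything, then unfreeze only the visual embedding projection matrix.
for p in model.parameters():
    p.requires_grad_(False)
model.visual.proj.requires_grad_(True)
optimizer = torch.optim.AdamW([model.visual.proj], lr=1e-4, weight_decay=1e-2)

# One illustrative few-shot step with placeholder class prompts and a dummy image batch.
class_names = ["cat", "dog"]
text = clip.tokenize([f"a photo of a {c}" for c in class_names]).to(device)
with torch.no_grad():
    text_feat = model.encode_text(text)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

images = torch.randn(4, 3, 224, 224, device=device)
labels = torch.randint(0, len(class_names), (4,), device=device)
image_feat = model.encode_image(images)
image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)

logits = model.logit_scale.exp() * image_feat @ text_feat.t()
loss = torch.nn.functional.cross_entropy(logits, labels)
loss.backward()
optimizer.step()  # only model.visual.proj receives an update
```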
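For the Gaussian Discriminant Analysis baseline above, the following worked sketch fits the classical shared-covariance GDA rule on cached image embeddings; the synthetic features stand in for CLIP outputs, and the function and variable names are assumptions rather than the paper's code.

```python
import numpy as np

def fit_gda(features, labels, num_classes, shrinkage=1e-3):
    """Gaussian Discriminant Analysis with a shared covariance (classical LDA form).

    features: (N, D) array of image embeddings (e.g. cached CLIP features)
    labels:   (N,) integer class labels
    Returns per-class weights W (C, D) and biases b (C,) of the resulting linear classifier.
    """
    D = features.shape[1]
    means = np.stack([features[labels == c].mean(axis=0) for c in range(num_classes)])
    centered = features - means[labels]              # subtract each sample's class mean
    cov = centered.T @ centered / len(features)      # shared covariance estimate
    cov += shrinkage * np.eye(D)                     # ridge shrinkage for numerical stability
    precision = np.linalg.inv(cov)
    priors = np.bincount(labels, minlength=num_classes) / len(labels)
    W = means @ precision                            # (C, D): Sigma^{-1} mu_c per class
    b = -0.5 * np.einsum("cd,cd->c", W, means) + np.log(priors + 1e-12)
    return W, b

# Hypothetical usage on cached embeddings:
rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 512)).astype(np.float32)   # stand-in for encode_image outputs
labels = rng.integers(0, 10, size=200)
W, b = fit_gda(feats, labels, num_classes=10)
scores = feats @ W.T + b                                  # (N, C) class scores
preds = scores.argmax(axis=1)
```

Because the rule is closed-form, no gradient training is needed: the classifier consists only of per-class means, a shared covariance, and class priors.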
This list is automatically generated from the titles and abstracts of the papers on this site.