Beyond Sharpness: A Flatness Decomposition Framework for Efficient Continual Learning
- URL: http://arxiv.org/abs/2601.07636v1
- Date: Mon, 12 Jan 2026 15:17:04 GMT
- Title: Beyond Sharpness: A Flatness Decomposition Framework for Efficient Continual Learning
- Authors: Yanan Chen, Tieliang Gong, Yunjiao Zhang, Wen Wen
- Abstract summary: Continual Learning aims to enable models to sequentially learn multiple tasks without forgetting previous knowledge. Existing sharpness-aware methods for Continual Learning suffer from two key limitations. We propose FLAD, a novel optimization framework that decomposes sharpness-aware perturbations into gradient-aligned and stochastic-noise components.
- Score: 27.583428955764774
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continual Learning (CL) aims to enable models to sequentially learn multiple tasks without forgetting previous knowledge. Recent studies have shown that optimizing towards flatter loss minima can improve model generalization. However, existing sharpness-aware methods for CL suffer from two key limitations: (1) they treat sharpness regularization as a unified signal without distinguishing the contributions of its components, and (2) they introduce substantial computational overhead that impedes practical deployment. To address these challenges, we propose FLAD, a novel optimization framework that decomposes sharpness-aware perturbations into gradient-aligned and stochastic-noise components, and show that retaining only the noise component promotes generalization. We further introduce a lightweight scheduling scheme that enables FLAD to maintain significant performance gains even under constrained training time. FLAD can be seamlessly integrated into various CL paradigms and consistently outperforms standard and sharpness-aware optimizers in diverse experimental settings, demonstrating its effectiveness and practicality in CL.
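To make the decomposition concrete, below is a minimal PyTorch-style sketch of the idea stated in the abstract, not the authors' implementation: it assumes the perturbation is the standard SAM ascent step computed from a minibatch gradient, estimates the gradient-aligned component by projecting onto an exponential moving average of past gradients (an assumption; the paper defines its own decomposition and scheduling), and keeps only the residual noise component.

```python
# Hypothetical sketch: keep only the "stochastic-noise" part of a SAM-style
# perturbation. The EMA-gradient projection used here is an assumption for
# illustration, not the decomposition defined in the FLAD paper.
import torch

def noise_only_perturbation(params, ema_grads, rho=0.05, eps=1e-12):
    """Return per-parameter perturbations that retain only the noise component."""
    grads = [p.grad.detach() for p in params]
    # Flatten all gradients so a single projection coefficient is shared.
    g = torch.cat([x.reshape(-1) for x in grads])
    m = torch.cat([x.reshape(-1) for x in ema_grads])
    proj = (g @ m) / (m @ m + eps)        # projection onto the smoothed-gradient direction
    noise = g - proj * m                  # remove the gradient-aligned component
    scale = rho / (noise.norm() + eps)    # SAM-style normalization of the ascent step
    perturbs, offset = [], 0
    for x in grads:
        n = x.numel()
        perturbs.append(scale * noise[offset:offset + n].view_as(x))
        offset += n
    return perturbs
```

In a full training step one would add these perturbations to the weights, recompute the loss and gradients, restore the weights, and apply the optimizer update, as in a standard two-step SAM procedure, while updating the EMA gradients each step.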
Related papers
- Breaking the Limits of Open-Weight CLIP: An Optimization Framework for Self-supervised Fine-tuning of CLIP [60.025820738301434]
TuneCLIP is a self-supervised fine-tuning framework for CLIP models. It consistently improves performance across model architectures and scales. It elevates leading open-weight models like SigLIP (ViT-B/16), achieving gains of up to +2.5% on ImageNet and related out-of-distribution benchmarks.
arXiv Detail & Related papers (2026-01-14T20:38:36Z) - AnaCP: Toward Upper-Bound Continual Learning via Analytic Contrastive Projection [11.750791465488438]
This paper studies the problem of class-incremental learning (CIL). Traditional CIL methods, which do not leverage pre-trained models (PTMs), suffer from catastrophic forgetting (CF). We propose AnaCP, a novel method that preserves the efficiency of analytic classifiers while enabling incremental feature adaptation without gradient-based training.
arXiv Detail & Related papers (2025-11-17T19:56:15Z) - C-Flat++: Towards a More Efficient and Powerful Framework for Continual Learning [26.486835539215523]
We propose Continual Flatness (C-Flat), a method that promotes flatter loss landscapes tailored for continual learning. C-Flat offers plug-and-play compatibility, enabling easy integration with minimal modifications to the code pipeline. In addition, we introduce C-Flat++, an efficient yet effective framework that leverages selective flatness-driven promotion.
arXiv Detail & Related papers (2025-08-26T09:39:09Z) - LoRanPAC: Low-rank Random Features and Pre-trained Models for Bridging Theory and Practice in Continual Learning [103.45785408116146]
Continual learning (CL) aims to train a model that can solve multiple tasks presented sequentially. Recent CL approaches have achieved strong performance by leveraging large pre-trained models that generalize well to downstream tasks. However, such methods lack theoretical guarantees, making them prone to unexpected failures. We aim to bridge this gap by designing a simple CL method that is theoretically sound and highly performant.
arXiv Detail & Related papers (2024-10-01T12:58:37Z) - Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Self-Regularization [77.62516752323207]
We introduce an orthogonal fine-tuning method for efficiently fine-tuning pretrained weights and enabling enhanced robustness and generalization.
A self-regularization strategy is further exploited to maintain the stability of the VLM's zero-shot generalization; the overall method is dubbed OrthSR.
For the first time, we revisit CLIP and CoOp with our method to effectively improve the models in the few-shot image classification scenario.
arXiv Detail & Related papers (2024-07-11T10:35:53Z) - FeTT: Continual Class Incremental Learning via Feature Transformation Tuning [19.765229703131876]
Continual learning (CL) aims to extend deep models from static and enclosed environments to dynamic and complex scenarios.
Recent CL models have gradually shifted towards the utilization of pre-trained models with parameter-efficient fine-tuning strategies.
This paper proposes feature transformation tuning (FeTT) model to non-parametrically fine-tune backbone features across all tasks.
arXiv Detail & Related papers (2024-05-20T06:33:50Z) - Make Continual Learning Stronger via C-Flat [12.569738684003923]
We propose a Continual Flatness (C-Flat) method featuring a flatter loss landscape tailored for Continual Learning (CL).
C-Flat can be called with only one line of code and is plug-and-play with any CL method.
arXiv Detail & Related papers (2024-04-01T08:18:38Z) - Gradient constrained sharpness-aware prompt learning for vision-language models [99.74832984957025]
This paper targets a novel trade-off problem in generalizable prompt learning for vision-language models (VLMs).
By analyzing the loss landscapes of the state-of-the-art method and a vanilla Sharpness-aware Minimization (SAM) based method, we conclude that the trade-off performance correlates with both loss value and loss sharpness.
We propose a novel SAM-based method for prompt learning, denoted Gradient Constrained Sharpness-aware Context Optimization (GCSCoOp).
arXiv Detail & Related papers (2023-09-14T17:13:54Z) - Continual Learners are Incremental Model Generalizers [70.34479702177988]
This paper extensively studies the impact of Continual Learning (CL) models as pre-trainers.
We find that the transfer quality of the representation often increases gradually without noticeable degradation in fine-tuning performance.
We propose a new fine-tuning scheme, GLobal Attention Discretization (GLAD), that preserves rich task-generic representation during solving downstream tasks.
arXiv Detail & Related papers (2023-06-21T05:26:28Z) - When Does Contrastive Learning Preserve Adversarial Robustness from Pretraining to Finetuning? [99.4914671654374]
We propose AdvCL, a novel adversarial contrastive pretraining framework.
We show that AdvCL is able to enhance cross-task robustness transferability without loss of model accuracy and finetuning efficiency.
arXiv Detail & Related papers (2021-11-01T17:59:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.