C-Flat++: Towards a More Efficient and Powerful Framework for Continual Learning
- URL: http://arxiv.org/abs/2508.18860v2
- Date: Fri, 29 Aug 2025 11:36:39 GMT
- Title: C-Flat++: Towards a More Efficient and Powerful Framework for Continual Learning
- Authors: Wei Li, Hangjie Yuan, Zixiang Zhao, Yifan Zhu, Aojun Lu, Tao Feng, Yanan Sun
- Abstract summary: We propose Continual Flatness (C-Flat), a method that promotes flatter loss landscapes tailored for continual learning. C-Flat offers plug-and-play compatibility, enabling easy integration with minimal modifications to the code pipeline. In addition, we introduce C-Flat++, an efficient yet effective framework that leverages selective flatness-driven promotion.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Balancing sensitivity to new tasks against stability in retaining past knowledge is crucial in continual learning (CL). Recently, sharpness-aware minimization has proven effective in transfer learning and has also been adopted in CL to improve memory retention and learning efficiency. However, relying on zeroth-order sharpness alone may favor sharper minima over flatter ones in certain settings, leading to less robust and potentially suboptimal solutions. In this paper, we propose Continual Flatness (C-Flat), a method that promotes flatter loss landscapes tailored for CL. C-Flat offers plug-and-play compatibility, enabling easy integration with minimal modifications to the code pipeline. We also present a general framework that integrates C-Flat into all major CL paradigms and conduct comprehensive comparisons with loss-minima optimizers and flat-minima-based CL methods. Our results show that C-Flat consistently improves performance across a wide range of settings. In addition, we introduce C-Flat++, an efficient yet effective framework that leverages selective flatness-driven promotion, significantly reducing the update cost required by C-Flat. Extensive experiments across multiple CL methods, datasets, and scenarios demonstrate the effectiveness and efficiency of our proposed approaches. Code is available at https://github.com/WanNaa/C-Flat.
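The abstract describes C-Flat as a plug-and-play optimizer that steers training toward flat minima, in the spirit of sharpness-aware minimization (SAM). As a rough illustration of that family of updates (not the authors' actual algorithm, whose details are in the linked repository), the following hypothetical sketch applies a SAM-style two-step update to a toy one-dimensional double-well loss: perturb the weight toward higher loss, then descend using the gradient taken at the perturbed point.

```python
import numpy as np

def loss(w):
    # Toy double-well objective; minima at w = -1 and w = 1 (illustrative only).
    return (w ** 2 - 1.0) ** 2

def grad(w, eps=1e-5):
    # Central-difference gradient, to keep the sketch self-contained.
    return (loss(w + eps) - loss(w - eps)) / (2 * eps)

def sam_step(w, lr=0.01, rho=0.05):
    """One SAM-style update: ascend within radius rho, then descend from there."""
    g = grad(w)
    w_adv = w + rho * np.sign(g)   # worst-case perturbation (1-D case)
    return w - lr * grad(w_adv)    # descend using the perturbed-point gradient

w = 1.5
for _ in range(300):
    w = sam_step(w)
```

Because the descent direction is evaluated at the worst-case nearby point, sharp minima (where the perturbed gradient points away) are penalized, and the iterate settles into a flat basin.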
Related papers
- Beyond Sharpness: A Flatness Decomposition Framework for Efficient Continual Learning [27.583428955764774]
Continual Learning aims to enable models to sequentially learn multiple tasks without forgetting previous knowledge. Existing sharpness-aware methods for Continual Learning suffer from two key limitations. We propose FLAD, a novel optimization framework that decomposes perturbations into sharpness-aligned and gradient-noise components.
arXiv Detail & Related papers (2026-01-12T15:17:04Z) - More Than Memory Savings: Zeroth-Order Optimization Mitigates Forgetting in Continual Learning [10.698225972251839]
Zeroth-order (ZO) optimization has gained attention as a memory-efficient alternative to first-order (FO) methods. We show that ZO optimization naturally leads to flatter loss landscapes, which in turn reduce forgetting in continual learning. This stability comes at the cost of plasticity: due to its imprecise gradient estimates and slower convergence, ZO optimization tends to be less effective than FO in acquiring new task-specific knowledge. We propose ZO-FC, a simple but effective approach that applies ZO optimization to a single adapter-based PEFT module together with an FO-optimized classifier.
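For context on what "zeroth-order" means here, the following hypothetical sketch (not taken from the paper) estimates a gradient using only loss evaluations via the standard two-point estimator, then runs plain descent on a toy quadratic. All names and constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss(w):
    # Toy quadratic objective standing in for a task loss (illustrative).
    return float(np.sum((w - 1.0) ** 2))

def zo_grad(w, mu=1e-3, n_samples=20):
    """Two-point zeroth-order estimate: only loss values are queried, no backprop."""
    g = np.zeros_like(w)
    for _ in range(n_samples):
        u = rng.standard_normal(w.shape)
        g += (loss(w + mu * u) - loss(w - mu * u)) / (2 * mu) * u
    return g / n_samples

w = np.zeros(5)
for _ in range(300):
    w -= 0.05 * zo_grad(w)
```

The estimator needs no stored activations or gradients, which is the source of the memory savings; the noise in the estimate is also why the summary above notes slower convergence than first-order methods.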
arXiv Detail & Related papers (2025-10-23T21:54:00Z) - AmorLIP: Efficient Language-Image Pretraining via Amortization [52.533088120633785]
Contrastive Language-Image Pretraining (CLIP) has demonstrated strong zero-shot performance across diverse downstream text-image tasks. We propose AmorLIP, an efficient CLIP pretraining framework that amortizes expensive computations involved in contrastive learning through lightweight neural networks.
arXiv Detail & Related papers (2025-05-25T05:30:37Z) - In-context Continual Learning Assisted by an External Continual Learner [19.382196203113836]
Existing continual learning (CL) methods rely on fine-tuning or adapting large language models (LLMs). We introduce InCA, a novel approach that integrates an external continual learner (ECL) with in-context learning (ICL) to enable scalable CL without catastrophic forgetting (CF).
arXiv Detail & Related papers (2024-12-20T04:44:41Z) - On the Convergence of Continual Federated Learning Using Incrementally Aggregated Gradients [2.2530496464901106]
The holy grail of machine learning is to enable Continual Federated Learning (CFL) to enhance the efficiency, privacy, and scalability of AI systems while learning from streaming data. We propose a novel replay-memory-based federated strategy consisting of edge-based gradient updates on memory and aggregated gradients on the current data. We empirically show that C-FLAG outperforms several state-of-the-art baselines in both task- and class-incremental settings with respect to metrics such as accuracy and forgetting.
arXiv Detail & Related papers (2024-11-12T17:36:20Z) - LoRanPAC: Low-rank Random Features and Pre-trained Models for Bridging Theory and Practice in Continual Learning [103.45785408116146]
Continual learning (CL) aims to train a model that can solve multiple tasks presented sequentially. Recent CL approaches have achieved strong performance by leveraging large pre-trained models that generalize well to downstream tasks. However, such methods lack theoretical guarantees, making them prone to unexpected failures. We aim to bridge this gap by designing a simple CL method that is theoretically sound and highly performant.
arXiv Detail & Related papers (2024-10-01T12:58:37Z) - Continual Learning on a Diet: Learning from Sparsely Labeled Streams Under Constrained Computation [123.4883806344334]
We study a realistic Continual Learning setting where learning algorithms are granted a restricted computational budget per time step while training.
We apply this setting to large-scale semi-supervised Continual Learning scenarios with sparse label rates.
Our extensive analysis and ablations demonstrate that DietCL is stable under a full spectrum of label sparsity, computational budget, and various other ablations.
arXiv Detail & Related papers (2024-04-19T10:10:39Z) - Make Continual Learning Stronger via C-Flat [12.569738684003923]
We propose a Continual Flatness (C-Flat) method featuring a flatter loss landscape tailored for Continual Learning (CL).
C-Flat can be invoked with only one line of code and is plug-and-play with any CL method.
arXiv Detail & Related papers (2024-04-01T08:18:38Z) - ECLIPSE: Efficient Continual Learning in Panoptic Segmentation with Visual Prompt Tuning [54.68180752416519]
Panoptic segmentation is a cutting-edge computer vision task.
We introduce a novel and efficient method for continual panoptic segmentation based on Visual Prompt Tuning, dubbed ECLIPSE.
Our approach involves freezing the base model parameters and fine-tuning only a small set of prompt embeddings, addressing catastrophic forgetting while preserving plasticity.
arXiv Detail & Related papers (2024-03-29T11:31:12Z) - Does Continual Learning Equally Forget All Parameters? [55.431048995662714]
Distribution shift (e.g., task or domain shift) in continual learning (CL) usually results in catastrophic forgetting of neural networks.
We study which modules in neural networks are more prone to forgetting by investigating their training dynamics during CL.
We propose a more efficient and simpler method that entirely removes every-step replay and replaces it with FPF triggered only $k$ times periodically during CL.
arXiv Detail & Related papers (2023-04-09T04:36:24Z) - When Does Contrastive Learning Preserve Adversarial Robustness from Pretraining to Finetuning? [99.4914671654374]
We propose AdvCL, a novel adversarial contrastive pretraining framework.
We show that AdvCL is able to enhance cross-task robustness transferability without loss of model accuracy and finetuning efficiency.
arXiv Detail & Related papers (2021-11-01T17:59:43Z) - Decoupled Contrastive Learning [23.25775900388382]
We identify a noticeable negative-positive-coupling (NPC) effect in the widely used cross-entropy (InfoNCE) loss.
By properly addressing the NPC effect, we reach a decoupled contrastive learning (DCL) objective function.
Our approach achieves 66.9% ImageNet top-1 accuracy using batch size 256 within 200 epochs of pre-training, outperforming its baseline SimCLR by 5.1%.
arXiv Detail & Related papers (2021-10-13T16:38:43Z)
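To make the negative-positive coupling (NPC) idea above concrete, here is a hypothetical numerical sketch (not the paper's implementation): in InfoNCE the positive pair's own similarity sits inside the softmax denominator, coupling its gradient to the negatives, while the decoupled variant simply drops that term. The similarity values and temperature below are made up for illustration.

```python
import numpy as np

def infonce(sim_pos, sim_negs, tau=0.1):
    """Standard InfoNCE: the positive similarity also appears in the denominator."""
    logits = np.concatenate(([sim_pos], np.asarray(sim_negs))) / tau
    return -sim_pos / tau + np.log(np.exp(logits).sum())

def decoupled(sim_pos, sim_negs, tau=0.1):
    """Decoupled variant: the denominator sums over negatives only."""
    return -sim_pos / tau + np.log(np.exp(np.asarray(sim_negs) / tau).sum())

# Toy cosine similarities for one anchor: one positive, three negatives.
pos, negs = 0.8, [0.1, 0.2, 0.05]
```

Removing the positive term from the denominator means the loss can keep decreasing as the positive similarity grows, rather than saturating once the positive dominates the softmax.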
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of this information and is not responsible for any consequences of its use.