Rethinking Continual Learning with Progressive Neural Collapse
- URL: http://arxiv.org/abs/2505.24254v1
- Date: Fri, 30 May 2025 06:21:04 GMT
- Title: Rethinking Continual Learning with Progressive Neural Collapse
- Authors: Zheng Wang, Wanhao Yu, Li Yang, Sen Lin
- Abstract summary: Continual Learning (CL) seeks to build an agent that can continuously learn a sequence of tasks, where a key challenge, namely Catastrophic Forgetting, persists. Deep neural networks (DNNs) are shown to converge to a terminal state termed Neural Collapse during training, where all class prototypes geometrically form a static simplex equiangular tight frame (ETF). We propose Progressive Neural Collapse (ProNC), a novel framework that completely removes the need of a fixed global ETF in CL.
- Score: 18.616537615728102
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continual Learning (CL) seeks to build an agent that can continuously learn a sequence of tasks, where a key challenge, namely Catastrophic Forgetting, persists due to the potential knowledge interference among different tasks. On the other hand, deep neural networks (DNNs) are shown to converge to a terminal state termed Neural Collapse during training, where all class prototypes geometrically form a static simplex equiangular tight frame (ETF). These maximally and equally separated class prototypes make the ETF an ideal target for model learning in CL to mitigate knowledge interference. Thus inspired, several studies have emerged very recently to leverage a fixed global ETF in CL, which however suffers from key drawbacks, such as impracticability and limited performance. To address these challenges and fully unlock the potential of ETF in CL, we propose Progressive Neural Collapse (ProNC), a novel framework that completely removes the need of a fixed global ETF in CL. Specifically, ProNC progressively expands the ETF target in a principled way by adding new class prototypes as vertices for new tasks, ensuring maximal separability across all encountered classes with minimal shifts from the previous ETF. We next develop a new CL framework by plugging ProNC into commonly used CL algorithm designs, where distillation is further leveraged to balance between target shifting for old classes and target aligning for new classes. Extensive experiments show that our approach significantly outperforms related baselines while maintaining superior flexibility, simplicity, and efficiency.
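For readers unfamiliar with the simplex ETF geometry mentioned in the abstract, the following is a minimal NumPy sketch of the standard construction from the neural collapse literature, M = sqrt(K/(K-1)) · U · (I_K − (1/K)·1·1ᵀ), where U has orthonormal columns. It illustrates only the fixed ETF target, not ProNC's progressive expansion procedure, which the abstract does not specify. The function name `simplex_etf` and its parameters are hypothetical and chosen for illustration.

```python
import numpy as np


def simplex_etf(num_classes: int, feat_dim: int, seed: int = 0) -> np.ndarray:
    """Return a (feat_dim, num_classes) matrix whose columns form a simplex ETF.

    Hypothetical helper for illustration; assumes feat_dim >= num_classes >= 2.
    """
    if num_classes < 2 or feat_dim < num_classes:
        raise ValueError("need feat_dim >= num_classes >= 2")
    rng = np.random.default_rng(seed)
    # Orthonormal columns from the reduced QR factorization of a random matrix.
    u, _ = np.linalg.qr(rng.standard_normal((feat_dim, num_classes)))
    k = num_classes
    # Centering matrix I - (1/K) 1 1^T removes the mean of the prototypes.
    centering = np.eye(k) - np.ones((k, k)) / k
    return np.sqrt(k / (k - 1)) * u @ centering


etf = simplex_etf(num_classes=5, feat_dim=16)
gram = etf.T @ etf  # pairwise inner products between class prototypes
print(np.round(gram, 3))
```

Each prototype has unit norm and every pair of distinct prototypes has inner product -1/(K-1), which is the "maximally and equally separated" property the abstract refers to.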
Related papers
- Continual Learning of Achieving Forgetting-free and Positive Knowledge Transfer [12.245360561698503]
An ideal continual learning agent should not only be able to overcome catastrophic forgetting (CF) but also encourage positive forward and backward knowledge transfer (KT). This paper first models CL as an optimization problem in which each sequential learning task aims to achieve its optimal performance under the constraint that both FKT and BKT should be positive. It then proposes a novel Enhanced Task Continual Learning (ETCL) method, which achieves forgetting-free and positive KT.
arXiv Detail & Related papers (2026-01-09T08:27:14Z) - Scalable Class-Incremental Learning Based on Parametric Neural Collapse [5.550140856551579]
Incremental learning often encounters challenges such as overfitting to new data and catastrophic forgetting of old data. We propose scalable class-incremental learning based on parametric neural collapse.
arXiv Detail & Related papers (2025-12-26T03:34:59Z) - BOFA: Bridge-Layer Orthogonal Low-Rank Fusion for CLIP-Based Class-Incremental Learning [84.56022893225422]
Class-Incremental Learning (CIL) aims to continually learn new categories without forgetting previously acquired knowledge. Applying vision-language models such as CLIP to CIL poses two major challenges: (1) adapting to downstream tasks often requires additional learnable modules, increasing model complexity and susceptibility to forgetting; and (2) while multi-modal representations offer complementary strengths, existing methods have yet to fully realize their potential in effectively integrating visual and textual modalities.
arXiv Detail & Related papers (2025-11-14T15:51:40Z) - Position: Continual Learning Benefits from An Evolving Population over An Unified Model [4.348086726793516]
This study introduces a novel Population-based Continual Learning (PCL) framework. PCL extends Continual Learning to the architectural level by maintaining and evolving a population of neural network architectures. PCL outperforms state-of-the-art rehearsal-free CL methods that employ a unified model.
arXiv Detail & Related papers (2025-02-10T07:21:44Z) - Leveraging Intermediate Neural Collapse with Simplex ETFs for Efficient Deep Neural Networks [0.0]
We show that constraining the final layer of a neural network to a simplex ETF can reduce the number of trainable parameters without sacrificing model accuracy. We propose two novel training approaches: Adaptive-ETF, a generalized framework that enforces simplex ETF constraints on all layers beyond the effective depth, and ETF-Transformer, which applies simplex ETF constraints to the feedforward layers within transformer blocks.
arXiv Detail & Related papers (2024-12-01T16:44:55Z) - LoRanPAC: Low-rank Random Features and Pre-trained Models for Bridging Theory and Practice in Continual Learning [103.45785408116146]
Continual learning (CL) aims to train a model that can solve multiple tasks presented sequentially. Recent CL approaches have achieved strong performance by leveraging large pre-trained models that generalize well to downstream tasks. However, such methods lack theoretical guarantees, making them prone to unexpected failures. We aim to bridge this gap by designing a simple CL method that is theoretically sound and highly performant.
arXiv Detail & Related papers (2024-10-01T12:58:37Z) - ECLIPSE: Efficient Continual Learning in Panoptic Segmentation with Visual Prompt Tuning [54.68180752416519]
Panoptic segmentation is a cutting-edge computer vision task.
We introduce a novel and efficient method for continual panoptic segmentation based on Visual Prompt Tuning, dubbed ECLIPSE.
Our approach involves freezing the base model parameters and fine-tuning only a small set of prompt embeddings, addressing both catastrophic forgetting and plasticity.
arXiv Detail & Related papers (2024-03-29T11:31:12Z) - Stragglers-Aware Low-Latency Synchronous Federated Learning via Layer-Wise Model Updates [71.81037644563217]
Synchronous federated learning (FL) is a popular paradigm for collaborative edge learning.
As some of the devices may have limited computational resources and varying availability, FL latency is highly sensitive to stragglers.
We propose straggler-aware layer-wise federated learning (SALF) that leverages the optimization procedure of NNs via backpropagation to update the global model in a layer-wise fashion.
arXiv Detail & Related papers (2024-03-27T09:14:36Z) - Federated Continual Novel Class Learning [68.05835753892907]
We propose a Global Alignment Learning framework that can accurately estimate the global novel class number.
Gal achieves significant improvements in novel-class performance, increasing the accuracy by 5.1% to 10.6%.
Gal is shown to be effective in equipping a variety of different mainstream Federated Learning algorithms with novel class discovery and learning capability.
arXiv Detail & Related papers (2023-12-21T00:31:54Z) - Continual Learners are Incremental Model Generalizers [70.34479702177988]
This paper extensively studies the impact of Continual Learning (CL) models as pre-trainers.
We find that the transfer quality of the representation often increases gradually without noticeable degradation in fine-tuning performance.
We propose a new fine-tuning scheme, GLobal Attention Discretization (GLAD), that preserves rich task-generic representation during solving downstream tasks.
arXiv Detail & Related papers (2023-06-21T05:26:28Z) - A Neural Span-Based Continual Named Entity Recognition Model [13.982996312057207]
We propose SpanKL, a Span-based model with Knowledge distillation (KD) to preserve memories and multi-Label prediction to prevent conflicts in CL-NER.
Experiments on synthetic CL datasets derived from OntoNotes and Few-NERD show that SpanKL significantly outperforms previous SoTA in many aspects.
arXiv Detail & Related papers (2023-02-23T17:51:29Z) - Neural Collapse Inspired Feature-Classifier Alignment for Few-Shot Class Incremental Learning [120.53458753007851]
Few-shot class-incremental learning (FSCIL) has been a challenging problem as only a few training samples are accessible for each novel class in the new sessions.
We deal with this misalignment dilemma in FSCIL inspired by the recently discovered phenomenon named neural collapse.
We propose a neural collapse inspired framework for FSCIL. Experiments on the miniImageNet, CUB-200, and CIFAR-100 datasets demonstrate that our proposed framework outperforms the state-of-the-art performances.
arXiv Detail & Related papers (2023-02-06T18:39:40Z) - Mitigating Forgetting in Online Continual Learning via Contrasting Semantically Distinct Augmentations [22.289830907729705]
Online continual learning (OCL) aims to enable model learning from a non-stationary data stream to continuously acquire new knowledge as well as retain the learnt one.
The main challenge comes from the "catastrophic forgetting" issue: the inability to retain previously learnt knowledge while learning new knowledge.
arXiv Detail & Related papers (2022-11-10T05:29:43Z) - Online Continual Learning with Contrastive Vision Transformer [67.72251876181497]
This paper proposes a framework Contrastive Vision Transformer (CVT) to achieve a better stability-plasticity trade-off for online CL.
Specifically, we design a new external attention mechanism for online CL that implicitly captures previous tasks' information.
Based on the learnable focuses, we design a focal contrastive loss to rebalance contrastive learning between new and past classes and consolidate previously learned representations.
arXiv Detail & Related papers (2022-07-24T08:51:02Z) - Contextual Classification Using Self-Supervised Auxiliary Models for Deep Neural Networks [6.585049648605185]
We introduce the notion of Self-Supervised Autogenous Learning (SSAL) models.
A SSAL objective is realized through one or more additional targets that are derived from the original supervised classification task.
We show that SSAL models consistently outperform the state-of-the-art while also providing structured predictions that are more interpretable.
arXiv Detail & Related papers (2021-01-07T18:41:16Z) - Prevalence of Neural Collapse during the terminal phase of deep learning training [7.031848258307718]
Modern practice for training classification deepnets involves a Terminal Phase of Training (TPT).
During TPT, the training error stays effectively zero while training loss is pushed towards zero.
The symmetric and very simple geometry induced by the TPT confers important benefits, including better performance, better generalization, and better interpretability.
arXiv Detail & Related papers (2020-08-18T23:12:54Z)