Why Do Neural Networks Forget: A Study of Collapse in Continual Learning
- URL: http://arxiv.org/abs/2603.04580v1
- Date: Wed, 04 Mar 2026 20:19:00 GMT
- Title: Why Do Neural Networks Forget: A Study of Collapse in Continual Learning
- Authors: Yunqin Zhu, Jun Jin
- Abstract summary: Catastrophic forgetting is a major problem in continual learning, and many approaches have been proposed to mitigate it. Recent research suggests that structural collapse leads to loss of plasticity, as evidenced by changes in effective rank (eRank). In this study, we investigate the correlation between forgetting and collapse through the measurement of both weight and activation eRank.
- Score: 1.9345014784026022
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Catastrophic forgetting is a major problem in continual learning, and many approaches have been proposed to mitigate it. However, most of them are evaluated through task accuracy, which ignores the internal model structure. Recent research suggests that structural collapse leads to loss of plasticity, as evidenced by changes in effective rank (eRank). This indicates a link to forgetting: when networks lose the ability to expand their feature space to learn new tasks, they are forced to overwrite existing representations. Therefore, in this study, we investigate the correlation between forgetting and collapse by measuring both weight and activation eRank. Specifically, we evaluate four architectures (MLP, ConvGRU, ResNet-18, and Bi-ConvGRU) on the Split MNIST and Split CIFAR-100 benchmarks. Each model is trained with plain SGD, Learning without Forgetting (LwF), and Experience Replay (ER) separately. The results demonstrate that forgetting and collapse are strongly related, and that different continual learning strategies preserve capacity and performance with differing efficiency.
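The paper's code is not included here, but the collapse metric it relies on, effective rank (eRank), is commonly defined as the exponential of the Shannon entropy of the normalized singular-value spectrum of a weight matrix or an activation batch. The sketch below is a minimal illustration under that assumption; the function name and example matrices are hypothetical and not taken from the paper.

```python
# Minimal eRank sketch (illustrative; not the authors' code): the exponential of
# the entropy of the normalized singular values of a matrix, applied either to a
# weight matrix or to a (batch x features) activation matrix.
import numpy as np

def effective_rank(matrix: np.ndarray, eps: float = 1e-12) -> float:
    """Return exp(entropy) of the normalized singular-value distribution."""
    singular_values = np.linalg.svd(matrix, compute_uv=False)
    p = singular_values / (singular_values.sum() + eps)   # normalize to a distribution
    entropy = -np.sum(p * np.log(p + eps))                 # Shannon entropy
    return float(np.exp(entropy))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    healthy = rng.standard_normal((256, 128))                                   # well-conditioned
    collapsed = rng.standard_normal((256, 4)) @ rng.standard_normal((4, 128))   # rank-4 "collapse"
    print(f"healthy eRank:   {effective_rank(healthy):.1f}")   # large, near min(n, d)
    print(f"collapsed eRank: {effective_rank(collapsed):.1f}") # at most 4
```

Tracking this quantity for layer weights and hidden activations across sequential tasks is the kind of measurement the abstract describes: a shrinking eRank during continual training would signal the structural collapse the paper links to forgetting.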
Related papers
- Catastrophic Forgetting in Kolmogorov-Arnold Networks [27.683054983159835]
Catastrophic forgetting is a longstanding challenge in continual learning. Recent architectural advances like Kolmogorov-Arnold Networks (KANs) have been suggested to offer intrinsic resistance to forgetting. We present a comprehensive study of catastrophic forgetting in KANs and develop a theoretical framework that links forgetting to activation support overlap and intrinsic data dimension.
arXiv Detail & Related papers (2025-11-16T23:22:50Z) - Rethinking Hebbian Principle: Low-Dimensional Structural Projection for Unsupervised Learning [17.299267108673277]
Hebbian learning is a biological principle that intuitively describes how neurons adapt their connections through repeated stimuli. We introduce the Structural Projection Hebbian Representation (SPHeRe), a novel unsupervised learning method. Experimental results show that SPHeRe achieves SOTA performance among unsupervised synaptic plasticity approaches.
arXiv Detail & Related papers (2025-10-16T15:47:29Z) - The Importance of Being Lazy: Scaling Limits of Continual Learning [60.97756735877614]
We show that increasing model width is only beneficial when it reduces the amount of feature learning, yielding more laziness. We study the intricate relationship between feature learning, task non-stationarity, and forgetting, finding that high feature learning is only beneficial with highly similar tasks.
arXiv Detail & Related papers (2025-06-20T10:12:38Z) - The Other Side of the Coin: Unveiling the Downsides of Model Aggregation in Federated Learning from a Layer-peeled Perspective [12.916988821333124]
In federated learning (FL), model aggregation is a critical step by which multiple clients share their knowledge with one another. However, aggregation can cause a temporary performance drop that can potentially slow down the convergence of the FL model. We propose several simple yet effective strategies to mitigate the negative impacts of model aggregation.
arXiv Detail & Related papers (2025-02-05T14:45:56Z) - Towards Robust Out-of-Distribution Generalization: Data Augmentation and Neural Architecture Search Approaches [4.577842191730992]
We study ways toward robust OoD generalization for deep learning.
We first propose a novel and effective approach to disentangle the spurious correlation between features that are not essential for recognition.
We then study the problem of strengthening neural architecture search in OoD scenarios.
arXiv Detail & Related papers (2024-10-25T20:50:32Z) - Order parameters and phase transitions of continual learning in deep neural networks [6.349503549199403]
Continual learning (CL) enables animals to learn new tasks without erasing prior knowledge. CL in artificial neural networks (NNs) is challenging due to catastrophic forgetting, where new learning degrades performance on older tasks. We present a statistical-mechanics theory of CL in deep, wide NNs, which characterizes the network's input-output mapping as it learns a sequence of tasks.
arXiv Detail & Related papers (2024-07-14T20:22:36Z) - Lightweight Diffusion Models with Distillation-Based Block Neural Architecture Search [55.41583104734349]
We propose to automatically remove structural redundancy in diffusion models with Diffusion Distillation-based Block-wise Neural Architecture Search (DiffNAS).
Given a larger pretrained teacher, we leverage DiffNAS to search for the smallest architecture which can achieve on-par or even better performance than the teacher.
Different from previous block-wise NAS methods, DiffNAS contains a block-wise local search strategy and a retraining strategy with a joint dynamic loss.
arXiv Detail & Related papers (2023-11-08T12:56:59Z) - Class-Incremental Learning: A Survey [84.30083092434938]
Class-Incremental Learning (CIL) enables the learner to incorporate the knowledge of new classes incrementally.
In CIL, the model tends to catastrophically forget the characteristics of former classes, and its performance drastically degrades.
We provide a rigorous and unified evaluation of 17 methods in benchmark image classification tasks to find out the characteristics of different algorithms.
arXiv Detail & Related papers (2023-02-07T17:59:05Z) - Improved Convergence Guarantees for Shallow Neural Networks [91.3755431537592]
We prove convergence of depth 2 neural networks, trained via gradient descent, to a global minimum.
Our model has the following features: regression with a quadratic loss function, a fully connected feedforward architecture, ReLU activations, Gaussian data instances, and adversarial labels.
These results strongly suggest that, at least in our model, the convergence phenomenon extends well beyond the NTK regime.
arXiv Detail & Related papers (2022-12-05T14:47:52Z) - FOSTER: Feature Boosting and Compression for Class-Incremental Learning [52.603520403933985]
Deep neural networks suffer from catastrophic forgetting when learning new categories.
We propose a novel two-stage learning paradigm FOSTER, empowering the model to learn new categories adaptively.
arXiv Detail & Related papers (2022-04-10T11:38:33Z) - Understanding Self-supervised Learning with Dual Deep Networks [74.92916579635336]
We propose a novel framework to understand contrastive self-supervised learning (SSL) methods that employ dual pairs of deep ReLU networks.
We prove that in each SGD update of SimCLR with various loss functions, the weights at each layer are updated by a covariance operator.
To further study what role the covariance operator plays and which features are learned in such a process, we model data generation and augmentation processes through a hierarchical latent tree model (HLTM).
arXiv Detail & Related papers (2020-10-01T17:51:49Z)