Related papers: A study on the plasticity of neural networks

A study on the plasticity of neural networks

URL: http://arxiv.org/abs/2106.00042v2
Date: Sat, 14 Oct 2023 11:58:06 GMT
Title: A study on the plasticity of neural networks
Authors: Tudor Berariu, Wojciech Czarnecki, Soham De, Jorg Bornschein, Samuel Smith, Razvan Pascanu and Claudia Clopath
Abstract summary: We discuss the implication of losing plasticity for continual learning. We show that a pretrained model on data from the same distribution as the one it is fine-tuned on might not reach the same generalisation as a freshly initialised model.
Score: 21.43675319928863
License: http://creativecommons.org/licenses/by/4.0/
Abstract: One aim shared by multiple settings, such as continual learning or transfer learning, is to leverage previously acquired knowledge to converge faster on the current task. Usually this is done through fine-tuning, where an implicit assumption is that the network maintains its plasticity, meaning that the performance it can reach on any given task is not affected negatively by previously seen tasks. It has been observed recently that a pretrained model on data from the same distribution as the one it is fine-tuned on might not reach the same generalisation as a freshly initialised one. We build and extend this observation, providing a hypothesis for the mechanics behind it. We discuss the implication of losing plasticity for continual learning which heavily relies on optimising pretrained models.

Related papers

How Weight Resampling and Optimizers Shape the Dynamics of Continual Learning and Forgetting in Neural Networks [2.270857464465579]
Recent work in continual learning has highlighted the beneficial effect of resampling weights in the last layer of a neural network (zapping)<n>We investigate in detail the pattern of learning and forgetting that take place inside a convolutional neural network when trained in challenging settings.
arXiv Detail & Related papers (2025-07-02T10:18:35Z)
Continual Learning in Vision-Language Models via Aligned Model Merging [84.47520899851557]
We present a new perspective based on model merging to maintain stability while still retaining plasticity.<n>To maximize the effectiveness of the merging process, we propose a simple mechanism that promotes learning aligned weights with previous ones.
arXiv Detail & Related papers (2025-05-30T20:52:21Z)
Spurious Forgetting in Continual Learning of Language Models [20.0936011355535]
Recent advancements in large language models (LLMs) reveal a perplexing phenomenon in continual learning. Despite extensive training, models experience significant performance declines. This study proposes that such performance drops often reflect a decline in task alignment rather than true knowledge loss.
arXiv Detail & Related papers (2025-01-23T08:09:54Z)
Why pre-training is beneficial for downstream classification tasks? [32.331679393303446]
We propose to quantitatively and explicitly explain effects of pre-training on the downstream task from a novel game-theoretic view. Specifically, we extract and quantify the knowledge encoded by the pre-trained model. We discover that only a small amount of pre-trained model's knowledge is preserved for the inference of downstream tasks.
arXiv Detail & Related papers (2024-10-11T02:13:16Z)
Disentangling the Causes of Plasticity Loss in Neural Networks [55.23250269007988]
We show that loss of plasticity can be decomposed into multiple independent mechanisms. We show that a combination of layer normalization and weight decay is highly effective at maintaining plasticity in a variety of synthetic nonstationary learning tasks.
arXiv Detail & Related papers (2024-02-29T00:02:33Z)
IF2Net: Innately Forgetting-Free Networks for Continual Learning [49.57495829364827]
Continual learning can incrementally absorb new concepts without interfering with previously learned knowledge. Motivated by the characteristics of neural networks, we investigated how to design an Innately Forgetting-Free Network (IF2Net) IF2Net allows a single network to inherently learn unlimited mapping rules without telling task identities at test time.
arXiv Detail & Related papers (2023-06-18T05:26:49Z)
Theoretical Characterization of How Neural Network Pruning Affects its Generalization [131.1347309639727]
This work makes the first attempt to study how different pruning fractions affect the model's gradient descent dynamics and generalization. It is shown that as long as the pruning fraction is below a certain threshold, gradient descent can drive the training loss toward zero. More surprisingly, the generalization bound gets better as the pruning fraction gets larger.
arXiv Detail & Related papers (2023-01-01T03:10:45Z)
Probing Representation Forgetting in Supervised and Unsupervised Continual Learning [14.462797749666992]
Catastrophic forgetting is associated with an abrupt loss of knowledge previously learned by a model. We show that representation forgetting can lead to new insights on the effect of model capacity and loss function used in continual learning.
arXiv Detail & Related papers (2022-03-24T23:06:08Z)
An Empirical Investigation of the Role of Pre-training in Lifelong Learning [21.995593026269578]
We show that generic pre-training implicitly alleviates the effects of catastrophic forgetting when learning multiple tasks sequentially. We study this phenomenon by analyzing the loss landscape, finding that pre-trained weights appear to ease forgetting by leading to wider minima.
arXiv Detail & Related papers (2021-12-16T19:00:55Z)
Learning Curves for Sequential Training of Neural Networks: Self-Knowledge Transfer and Forgetting [9.734033555407406]
We consider neural networks in the neural tangent kernel regime that continually learn target functions from task to task. We investigate a variant of continual learning where the model learns the same target function in multiple tasks. Even for the same target, the trained model shows some transfer and forgetting depending on the sample size of each task.
arXiv Detail & Related papers (2021-12-03T00:25:01Z)
Reducing Representation Drift in Online Continual Learning [87.71558506591937]
We study the online continual learning paradigm, where agents must learn from a changing distribution with constrained memory and compute. In this work we instead focus on the change in representations of previously observed data due to the introduction of previously unobserved class samples in the incoming data stream.
arXiv Detail & Related papers (2021-04-11T15:19:30Z)
Unsupervised Transfer Learning for Spatiotemporal Predictive Networks [90.67309545798224]
We study how to transfer knowledge from a zoo of unsupervisedly learned models towards another network. Our motivation is that models are expected to understand complex dynamics from different sources. Our approach yields significant improvements on three benchmarks fortemporal prediction, and benefits the target even from less relevant ones.
arXiv Detail & Related papers (2020-09-24T15:40:55Z)
Learning from Failure: Training Debiased Classifier from Biased Classifier [76.52804102765931]
We show that neural networks learn to rely on spurious correlation only when it is "easier" to learn than the desired knowledge. We propose a failure-based debiasing scheme by training a pair of neural networks simultaneously. Our method significantly improves the training of the network against various types of biases in both synthetic and real-world datasets.
arXiv Detail & Related papers (2020-07-06T07:20:29Z)

This list is automatically generated from the titles and abstracts of the papers in this site.