Related papers: Eidetic Learning: an Efficient and Provable Solution to Catastrophic Forgetting

Eidetic Learning: an Efficient and Provable Solution to Catastrophic Forgetting

URL: http://arxiv.org/abs/2502.09500v2
Date: Fri, 14 Feb 2025 14:39:22 GMT
Title: Eidetic Learning: an Efficient and Provable Solution to Catastrophic Forgetting
Authors: Nicholas Dronen, Randall Balestriero,
Abstract summary: We present a method, Eidetic Learning, that provably solves catastrophic forgetting.<n>A network trained with Eidetic Learning -- here, an EideticNet -- requires no rehearsal or replay.<n>An EideticNet is easy to implement and train, is efficient, and has time and space complexity linear in the number of parameters.
Score: 13.412573082645096
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Catastrophic forgetting -- the phenomenon of a neural network learning a task t1 and losing the ability to perform it after being trained on some other task t2 -- is a long-standing problem for neural networks [McCloskey and Cohen, 1989]. We present a method, Eidetic Learning, that provably solves catastrophic forgetting. A network trained with Eidetic Learning -- here, an EideticNet -- requires no rehearsal or replay. We consider successive discrete tasks and show how at inference time an EideticNet automatically routes new instances without auxiliary task information. An EideticNet bears a family resemblance to the sparsely-gated Mixture-of-Experts layer Shazeer et al. [2016] in that network capacity is partitioned across tasks and the network itself performs data-conditional routing. An EideticNet is easy to implement and train, is efficient, and has time and space complexity linear in the number of parameters. The guarantee of our method holds for normalization layers of modern neural networks during both pre-training and fine-tuning. We show with a variety of network architectures and sets of tasks that EideticNets are immune to forgetting. While the practical benefits of EideticNets are substantial, we believe they can be benefit practitioners and theorists alike. The code for training EideticNets is available at https://github.com/amazon-science/eideticnet-training.

Related papers

Negotiated Representations to Prevent Forgetting in Machine Learning Applications [0.0]
Catastrophic forgetting is a significant challenge in the field of machine learning. We propose a novel method for preventing catastrophic forgetting in machine learning applications.
arXiv Detail & Related papers (2023-11-30T22:43:50Z)
Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks [69.38572074372392]
We present the first results proving that feature learning occurs during training with a nonlinear model on multiple tasks. Our key insight is that multi-task pretraining induces a pseudo-contrastive loss that favors representations that align points that typically have the same label across tasks.
arXiv Detail & Related papers (2023-07-13T16:39:08Z)
Diffused Redundancy in Pre-trained Representations [98.55546694886819]
We take a closer look at how features are encoded in pre-trained representations. We find that learned representations in a given layer exhibit a degree of diffuse redundancy. Our findings shed light on the nature of representations learned by pre-trained deep neural networks.
arXiv Detail & Related papers (2023-05-31T21:00:50Z)
On the Soft-Subnetwork for Few-shot Class Incremental Learning [67.0373924836107]
We propose a few-shot class incremental learning (FSCIL) method referred to as emphSoft-SubNetworks (SoftNet). Our objective is to learn a sequence of sessions incrementally, where each session only includes a few training instances per class while preserving the knowledge of the previously learned ones. We provide comprehensive empirical validations demonstrating that our SoftNet effectively tackles the few-shot incremental learning problem by surpassing the performance of state-of-the-art baselines over benchmark datasets.
arXiv Detail & Related papers (2022-09-15T04:54:02Z)
How much pre-training is enough to discover a good subnetwork? [10.699603774240853]
We mathematically analyze the amount of dense network pre-training needed for a pruned network to perform well. We find a simple theoretical bound in the number of gradient descent pre-training iterations on a two-layer, fully-connected network. Experiments with larger datasets require more pre-training forworks obtained via pruning to perform well.
arXiv Detail & Related papers (2021-07-31T15:08:36Z)
FreeTickets: Accurate, Robust and Efficient Deep Ensemble by Training with Dynamic Sparsity [74.58777701536668]
We introduce the FreeTickets concept, which can boost the performance of sparse convolutional neural networks over their dense network equivalents by a large margin. We propose two novel efficient ensemble methods with dynamic sparsity, which yield in one shot many diverse and accurate tickets "for free" during the sparse training process.
arXiv Detail & Related papers (2021-06-28T10:48:20Z)
Learning Neural Network Subspaces [74.44457651546728]
Recent observations have advanced our understanding of the neural network optimization landscape. With a similar computational cost as training one model, we learn lines, curves, and simplexes of high-accuracy neural networks. With a similar computational cost as training one model, we learn lines, curves, and simplexes of high-accuracy neural networks.
arXiv Detail & Related papers (2021-02-20T23:26:58Z)
Truly Sparse Neural Networks at Scale [2.2860412844991655]
We train the largest neural network ever trained in terms of representational power -- reaching the bat brain size. Our approach has state-of-the-art performance while opening the path for an environmentally friendly artificial intelligence era.
arXiv Detail & Related papers (2021-02-02T20:06:47Z)
It's Hard for Neural Networks To Learn the Game of Life [4.061135251278187]
Recent findings suggest that neural networks rely on lucky random initial weights of "lottery tickets" that converge quickly to a solution. We examine small convolutional networks that are trained to predict n steps of the two-dimensional cellular automaton Conway's Game of Life. We find that networks of this architecture trained on this task rarely converge.
arXiv Detail & Related papers (2020-09-03T00:47:08Z)
HALO: Learning to Prune Neural Networks with Shrinkage [5.283963846188862]
Deep neural networks achieve state-of-the-art performance in a variety of tasks by extracting a rich set of features from unstructured data. Modern techniques for inducing sparsity and reducing model size are (1) network pruning, (2) training with a sparsity inducing penalty, and (3) training a binary mask jointly with the weights of the network. We present a novel penalty called Hierarchical Adaptive Lasso which learns to adaptively sparsify weights of a given network via trainable parameters.
arXiv Detail & Related papers (2020-08-24T04:08:48Z)
Side-Tuning: A Baseline for Network Adaptation via Additive Side Networks [95.51368472949308]
Adaptation can be useful in cases when training data is scarce, or when one wishes to encode priors in the network. In this paper, we propose a straightforward alternative: side-tuning.
arXiv Detail & Related papers (2019-12-31T18:52:32Z)

This list is automatically generated from the titles and abstracts of the papers in this site.