Diagnosing Catastrophe: Large parts of accuracy loss in continual
learning can be accounted for by readout misalignment
- URL: http://arxiv.org/abs/2310.05644v1
- Date: Mon, 9 Oct 2023 11:57:46 GMT
- Title: Diagnosing Catastrophe: Large parts of accuracy loss in continual
learning can be accounted for by readout misalignment
- Authors: Daniel Anthes and Sushrut Thorat and Peter König and Tim C.
Kietzmann
- Abstract summary: Training artificial neural networks on changing data distributions leads to a rapid decrease in performance on old tasks.
We investigate the representational changes that underlie this performance decrease and identify three distinct processes that together account for the phenomenon.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Unlike in primates, training artificial neural networks on changing data
distributions leads to a rapid decrease in performance on old tasks. This
phenomenon is commonly referred to as catastrophic forgetting. In this paper,
we investigate the representational changes that underlie this performance
decrease and identify three distinct processes that together account for the
phenomenon. The largest component is a misalignment between hidden
representations and readout layers: learning on additional tasks causes
internal representations to shift, leaving the readout misaligned with them. Representational
geometry is partially conserved under this misalignment and only a small part
of the information is irrecoverably lost. All types of representational changes
scale with the dimensionality of hidden representations. These insights have
implications for deep learning applications that need to be continuously
updated, but may also aid in aligning ANN models with the comparatively robust
biological visual system.
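As a concrete illustration of this diagnostic, here is a minimal sketch (not the authors' code; the synthetic data, the logistic-regression readout, and all names are our assumptions). It simulates a largely geometry-preserving shift of hidden features and separates the accuracy lost to readout misalignment from the accuracy that is irrecoverably lost:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d, k = 2000, 50, 5          # samples, hidden dimension, classes

# Old-task data: class-dependent Gaussian features.
y = rng.integers(0, k, size=n)
means = rng.normal(size=(k, d))
H_old = means[y] + 0.5 * rng.normal(size=(n, d))

# Linear readout trained on the original representation.
readout = LogisticRegression(max_iter=2000).fit(H_old, y)
acc_before = readout.score(H_old, y)

# Continual learning shifts representations; here, a random rotation
# (geometry-preserving) plus mild noise stands in for that shift.
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))   # random orthogonal matrix
H_new = H_old @ Q + 0.1 * rng.normal(size=(n, d))

acc_stale = readout.score(H_new, y)            # old readout, shifted features
refit = LogisticRegression(max_iter=2000).fit(H_new, y)
acc_refit = refit.score(H_new, y)              # realigned readout

print(f"original readout, original features: {acc_before:.2f}")
print(f"original readout, shifted features:  {acc_stale:.2f}")
print(f"refit readout,    shifted features:  {acc_refit:.2f}")
```

The gap between the refit and the stale readout is the share of forgetting attributable to misalignment; whatever the refit readout cannot recover corresponds to genuinely lost information.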
Related papers
- SINDER: Repairing the Singular Defects of DINOv2
Vision Transformer models trained on large-scale datasets often exhibit artifacts in the patch tokens they extract.
We propose a novel smoothness regularization, applied during fine-tuning, that rectifies these structural deficiencies using only a small dataset.
arXiv Detail & Related papers (2024-07-23T20:34:23Z)
- Mitigating the Effect of Incidental Correlations on Part-based Learning
Part-based representations could be more interpretable and generalize better with limited data.
We present two innovative regularization methods for part-based representations.
We exhibit state-of-the-art (SoTA) performance on few-shot learning tasks on benchmark datasets.
arXiv Detail & Related papers (2023-09-30T13:44:48Z)
- Unsupervised Learning of Invariance Transformations
We develop an algorithmic framework for finding approximate graph automorphisms.
We discuss how this framework can be used to find approximate automorphisms in weighted graphs in general.
arXiv Detail & Related papers (2023-07-24T17:03:28Z)
- Memorization-Dilation: Modeling Neural Collapse Under Label Noise
During the terminal phase of training a deep neural network, the feature embeddings of all examples of the same class tend to collapse to a single representation.
Empirical evidence suggests that the memorization of noisy data points leads to a degradation (dilation) of the neural collapse.
Our proofs reveal why label smoothing, a modification of cross-entropy empirically observed to produce a regularization effect, leads to improved generalization in classification tasks.
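For orientation, neural collapse is commonly quantified by comparing within-class to between-class scatter of penultimate-layer features; under label noise, memorization should show up as a larger ratio (dilation). A minimal sketch of such a metric (the function name and normalization are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def within_between_ratio(features, labels):
    """NC1-style collapse metric: trace(Sw) / trace(Sb).
    Values near zero indicate features have collapsed to class means."""
    mu_g = features.mean(axis=0)                  # global mean
    d = features.shape[1]
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for c in np.unique(labels):
        Xc = features[labels == c]
        mu_c = Xc.mean(axis=0)
        diff = Xc - mu_c
        Sw += diff.T @ diff / len(features)       # within-class scatter
        v = (mu_c - mu_g)[:, None]
        Sb += (len(Xc) / len(features)) * (v @ v.T)  # between-class scatter
    return np.trace(Sw) / np.trace(Sb)
```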
arXiv Detail & Related papers (2022-06-11T13:40:37Z)
- Improving Transferability of Representations via Augmentation-Aware Self-Supervision
AugSelf is an auxiliary self-supervised loss that learns the difference of augmentation parameters between two randomly augmented samples.
Our intuition is that AugSelf encourages the preservation of augmentation-aware information in learned representations, which could be beneficial for their transferability.
AugSelf can easily be incorporated into recent state-of-the-art representation learning methods with a negligible additional training cost.
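A minimal sketch of such an auxiliary objective, assuming a two-view setup where each view's augmentation parameters (e.g. crop coordinates) are recorded; the MLP head, dimensions, and MSE objective here are illustrative assumptions, not the paper's released implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AugDiffHead(nn.Module):
    """Predicts the difference in augmentation parameters between
    two augmented views from their concatenated features."""
    def __init__(self, feat_dim: int, n_params: int = 4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * feat_dim, 256),
            nn.ReLU(),
            nn.Linear(256, n_params),
        )

    def forward(self, z1, z2):
        return self.mlp(torch.cat([z1, z2], dim=-1))

def augself_style_loss(head, z1, z2, params1, params2):
    # Auxiliary regression target: the difference of the two views'
    # augmentation parameters (added to the main SSL loss with a weight).
    return F.mse_loss(head(z1, z2), params1 - params2)

# Example usage with dummy features and crop parameters:
head = AugDiffHead(feat_dim=128)
z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
p1, p2 = torch.rand(8, 4), torch.rand(8, 4)
loss = augself_style_loss(head, z1, z2, p1, p2)
```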
arXiv Detail & Related papers (2021-11-18T10:43:50Z)
- Reducing Representation Drift in Online Continual Learning
We study the online continual learning paradigm, where agents must learn from a changing distribution with constrained memory and compute.
In this work we instead focus on the change in representations of previously observed data due to the introduction of previously unobserved class samples in the incoming data stream.
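One simple way to measure such drift is to compare features of previously observed data before and after an update step; a minimal sketch, assuming `model_before` and `model_after` are feature extractors (hypothetical names, not from the paper):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def representation_drift(model_before, model_after, old_batch):
    # Mean cosine similarity between features of previously observed
    # data before and after an update; lower values indicate more drift.
    z0 = model_before(old_batch)
    z1 = model_after(old_batch)
    return F.cosine_similarity(z0, z1, dim=-1).mean()
```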
arXiv Detail & Related papers (2021-04-11T15:19:30Z)
- Analyzing Overfitting under Class Imbalance in Neural Networks for Image Segmentation
In image segmentation, neural networks may overfit to foreground samples from small structures.
In this study, we provide new insights on the problem of overfitting under class imbalance by inspecting the network behavior.
arXiv Detail & Related papers (2021-02-20T14:57:58Z)
- Essentials for Class Incremental Learning
Class-incremental learning results on CIFAR-100 and ImageNet improve over the state-of-the-art by a large margin, while keeping the approach simple.
arXiv Detail & Related papers (2021-02-18T18:01:06Z)
- Evaluating Robustness to Context-Sensitive Feature Perturbations of Different Granularities
We introduce a new method that identifies context-sensitive feature perturbations to the inputs of image classifiers.
We produce these changes by performing small adjustments to the activation values of different layers of a trained generative neural network.
Unsurprisingly, we find that state-of-the-art classifiers are not robust to such changes.
arXiv Detail & Related papers (2020-01-29T19:20:01Z)
- Learning and Memorizing Representative Prototypes for 3D Point Cloud Semantic and Instance Segmentation
3D point cloud semantic and instance segmentation is crucial and fundamental for 3D scene understanding.
Deep networks can easily forget the non-dominant cases during the learning process, resulting in unsatisfactory performance.
We propose a memory-augmented network to learn and memorize the representative prototypes that cover diverse samples universally.
arXiv Detail & Related papers (2020-01-06T01:07:46Z)