Reawakening knowledge: Anticipatory recovery from catastrophic interference via structured training
- URL: http://arxiv.org/abs/2403.09613v2
- Date: Sun, 24 Nov 2024 03:37:38 GMT
- Title: Reawakening knowledge: Anticipatory recovery from catastrophic interference via structured training
- Authors: Yanlai Yang, Matt Jones, Michael C. Mozer, Mengye Ren
- Abstract summary: We explore the training dynamics of neural networks in a structured non-IID setting where documents are presented cyclically in a fixed, repeated sequence.
We find that over-parametrized neural networks can recover from catastrophic interference.
- Score: 24.719121340143978
- License:
- Abstract: We explore the training dynamics of neural networks in a structured non-IID setting where documents are presented cyclically in a fixed, repeated sequence. Typically, networks suffer from catastrophic interference when training on a sequence of documents; however, we discover a curious and remarkable property of LLMs finetuned sequentially in this setting: they exhibit anticipatory behavior, recovering from the forgetting on documents before encountering them again. This behavior occurs even though the documents are never presented in context together. The behavior emerges and becomes more robust as the architecture scales up its number of parameters. Through comprehensive experiments and visualizations, we demonstrate a new mechanism by which over-parametrized neural networks can recover from catastrophic interference and uncover new insights into training over-parameterized networks in cyclically structured environments.
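To make the setup concrete, below is a minimal sketch (not the authors' code) of the cyclic fine-tuning protocol the abstract describes: documents are presented one at a time in a fixed, repeated order, and the loss on every document is recorded after each update so that forgetting and anticipatory recovery can be traced. The model name, optimizer settings, and placeholder documents are illustrative assumptions, not the paper's configuration.

```python
# Sketch of cyclic, sequential fine-tuning with per-document loss tracking.
# Assumptions: a Hugging Face causal LM ("gpt2" as a stand-in), Adam, and
# three toy documents; the paper's exact hyperparameters are not reproduced.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; the paper studies a range of model sizes
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)

docs = ["document 0 text ...", "document 1 text ...", "document 2 text ..."]
batches = [tokenizer(d, return_tensors="pt", truncation=True, max_length=512)
           for d in docs]

def doc_loss(batch):
    """Causal-LM loss of the current model on one document (no gradients)."""
    with torch.no_grad():
        out = model(**batch, labels=batch["input_ids"])
    return out.loss.item()

num_cycles = 5
loss_history = []  # loss_history[t][j] = loss on document j after update t
for cycle in range(num_cycles):
    for batch in batches:  # fixed, repeated document order
        model.train()
        out = model(**batch, labels=batch["input_ids"])
        out.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        # Evaluate every document after this single-document update.
        model.eval()
        loss_history.append([doc_loss(b) for b in batches])
```

Plotting `loss_history` column by column would show document j's loss rising while other documents are trained and, per the paper's finding, beginning to fall again shortly before document j's next presentation in the cycle.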
Related papers
- Continually Learning Structured Visual Representations via Network Refinement with Rerelation [15.376349115976534]
The current machine learning paradigm relies on continuous representations such as neural networks, which iteratively adjust parameters to approximate outcomes.
We propose a method that learns visual space in a structured, continual manner.
arXiv Detail & Related papers (2025-02-19T18:18:27Z)
- Discovering Chunks in Neural Embeddings for Interpretability [53.80157905839065]
We propose leveraging the principle of chunking to interpret artificial neural population activities.
We first demonstrate this concept in recurrent neural networks (RNNs) trained on artificial sequences with imposed regularities.
We identify similar recurring embedding states corresponding to concepts in the input, with perturbations to these states activating or inhibiting the associated concepts.
arXiv Detail & Related papers (2025-02-03T20:30:46Z)
- Disentangling the Causes of Plasticity Loss in Neural Networks [55.23250269007988]
We show that loss of plasticity can be decomposed into multiple independent mechanisms.
We show that a combination of layer normalization and weight decay is highly effective at maintaining plasticity in a variety of synthetic nonstationary learning tasks.
arXiv Detail & Related papers (2024-02-29T00:02:33Z)
- Deconstructing Data Reconstruction: Multiclass, Weight Decay and General Losses [28.203535970330343]
Haim et al. (2022) proposed a scheme to reconstruct training samples from multilayer perceptron binary classifiers.
We extend their findings in several directions, including reconstruction from multiclass and convolutional neural networks.
We study the various factors that contribute to networks' susceptibility to such reconstruction schemes.
arXiv Detail & Related papers (2023-07-04T17:09:49Z)
- Critical Learning Periods for Multisensory Integration in Deep Networks [112.40005682521638]
We show that the ability of a neural network to integrate information from diverse sources hinges critically on being exposed to properly correlated signals during the early phases of training.
We show that critical periods arise from complex and unstable early transient dynamics, which are decisive for the final performance of the trained system and its learned representations.
arXiv Detail & Related papers (2022-10-06T23:50:38Z)
- Classification of network topology and dynamics via sequence characterization [0.1611401281366893]
We investigate whether reconstructing the network via the co-occurrence method can recover both the network topology and the agent dynamics that generate the sequences (a minimal illustrative sketch of co-occurrence reconstruction appears after this list).
We found that the characterization of reconstructed networks provides valuable information regarding the process and topology used to create the sequences.
arXiv Detail & Related papers (2022-06-30T11:05:39Z)
- The learning phases in NN: From Fitting the Majority to Fitting a Few [2.5991265608180396]
We analyze a layer's ability to reconstruct the input, and its prediction performance, based on the evolution of parameters during training.
We also assess the behavior using common datasets and architectures from computer vision such as ResNet and VGG.
arXiv Detail & Related papers (2022-02-16T19:11:42Z)
- Explainable Adversarial Attacks in Deep Neural Networks Using Activation Profiles [69.9674326582747]
This paper presents a visual framework to investigate neural network models subjected to adversarial examples.
We show how observing these elements can quickly pinpoint exploited areas in a model.
arXiv Detail & Related papers (2021-03-18T13:04:21Z)
- PredRNN: A Recurrent Neural Network for Spatiotemporal Predictive Learning [109.84770951839289]
We present PredRNN, a new recurrent network for learning visual dynamics from historical context.
We show that our approach obtains highly competitive results on three standard datasets.
arXiv Detail & Related papers (2021-03-17T08:28:30Z)
- Gradient Starvation: A Learning Proclivity in Neural Networks [97.02382916372594]
Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task.
This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks.
arXiv Detail & Related papers (2020-11-18T18:52:08Z)
- Detecting structural perturbations from time series with deep learning [0.0]
We present a graph neural network approach to infer structural perturbations from functional time series.
We show our data-driven approach outperforms typical reconstruction methods.
This work uncovers a practical avenue to study the resilience of real-world complex systems.
arXiv Detail & Related papers (2020-06-09T13:08:40Z)
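The following is a minimal, hypothetical illustration (not taken from any of the papers above) of the co-occurrence reconstruction mentioned in the "Classification of network topology and dynamics via sequence characterization" entry: symbols that appear within a short sliding window of a sequence are linked, and their counts serve as edge weights of the reconstructed network.

```python
# Toy co-occurrence reconstruction: build weighted edges from a symbol sequence.
# The sequence, window size, and symbol names are illustrative assumptions.
from collections import Counter
from itertools import combinations

def cooccurrence_network(sequence, window=2):
    """Return {frozenset({a, b}): count} edge weights from a symbol sequence."""
    edges = Counter()
    for start in range(len(sequence) - window + 1):
        window_symbols = sequence[start:start + window]
        for a, b in combinations(window_symbols, 2):
            if a != b:
                edges[frozenset((a, b))] += 1
    return dict(edges)

# Example: a short sequence produced by some walk over an unknown network.
seq = ["A", "B", "C", "B", "A", "D", "A", "B"]
print(cooccurrence_network(seq, window=2))
# Edge weights such as {frozenset({'A', 'B'}): 3, ...} approximate the
# topology of the network that generated the sequence.
```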