Reawakening knowledge: Anticipatory recovery from catastrophic interference via structured training
- URL: http://arxiv.org/abs/2403.09613v1
- Date: Thu, 14 Mar 2024 17:51:54 GMT
- Title: Reawakening knowledge: Anticipatory recovery from catastrophic interference via structured training
- Authors: Yanlai Yang, Matt Jones, Michael C. Mozer, Mengye Ren
- Abstract summary: We explore the training dynamics of neural networks in a structured non-IID setting where documents are presented cyclically in a fixed, repeated sequence.
We discover a curious and remarkable property of LLMs fine-tuned sequentially in this setting: they exhibit anticipatory behavior, recovering from forgetting on documents before encountering them again.
- Score: 24.719121340143978
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We explore the training dynamics of neural networks in a structured non-IID setting where documents are presented cyclically in a fixed, repeated sequence. Typically, networks suffer from catastrophic interference when training on a sequence of documents; however, we discover a curious and remarkable property of LLMs fine-tuned sequentially in this setting: they exhibit anticipatory behavior, recovering from forgetting on documents before encountering them again. The behavior emerges and becomes more robust as the architecture scales up its number of parameters. Through comprehensive experiments and visualizations, we uncover new insights into training over-parameterized networks in structured environments.
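To make the cyclic training protocol concrete, here is a minimal sketch of sequential fine-tuning on a fixed, repeated document order, with per-document loss tracked after every update; the model (`gpt2`), the placeholder documents, and all hyperparameters are illustrative assumptions, not the paper's actual configuration.

```python
# Minimal sketch of cyclic (non-IID) fine-tuning of a causal LM.
# Model name, documents, and hyperparameters are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; the paper studies larger LLMs
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

documents = ["first document text ...", "second document text ...",
             "third document text ..."]  # fixed, repeated presentation order
batches = [tokenizer(d, return_tensors="pt", truncation=True, max_length=512)
           for d in documents]

def doc_loss(batch):
    """Cross-entropy of the model on one document (no gradients)."""
    with torch.no_grad():
        out = model(**batch, labels=batch["input_ids"])
    return out.loss.item()

num_cycles = 5  # number of passes through the fixed document sequence
for cycle in range(num_cycles):
    for i, batch in enumerate(batches):
        # Gradient step on the current document only.
        model.train()
        out = model(**batch, labels=batch["input_ids"])
        out.loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        # Track loss on every document after each update; anticipatory
        # recovery would show a document's loss falling *before* its turn.
        model.eval()
        losses = [doc_loss(b) for b in batches]
        print(f"cycle {cycle} step {i}: " +
              " ".join(f"doc{j}={l:.3f}" for j, l in enumerate(losses)))
```

Plotting each document's tracked loss over steps would reproduce the kind of visualization the paper uses to reveal the anticipatory recovery pattern.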
Related papers
- Learning Object-Centric Representation via Reverse Hierarchy Guidance [73.05170419085796]
Object-Centric Learning (OCL) seeks to enable neural networks to identify individual objects in visual scenes.
RHGNet introduces a top-down pathway that works in different ways in the training and inference processes.
Our model achieves SOTA performance on several commonly used datasets.
arXiv Detail & Related papers (2024-05-17T07:48:27Z)
- Understanding and Leveraging the Learning Phases of Neural Networks [7.1169582271841625]
The learning dynamics of deep neural networks are not well understood.
We comprehensively analyze the learning dynamics by investigating a layer's reconstruction ability of the input and prediction performance.
We show the existence of three phases using common datasets and architectures such as ResNet and VGG.
arXiv Detail & Related papers (2023-12-11T23:20:58Z)
- Deconstructing Data Reconstruction: Multiclass, Weight Decay and General Losses [28.203535970330343]
Haim et al. (2022) proposed a scheme to reconstruct training samples from multilayer perceptron binary classifiers.
We extend their findings in several directions, including reconstruction from multiclass and convolutional neural networks.
We study the various factors that contribute to networks' susceptibility to such reconstruction schemes.
arXiv Detail & Related papers (2023-07-04T17:09:49Z)
- Critical Learning Periods for Multisensory Integration in Deep Networks [112.40005682521638]
We show that the ability of a neural network to integrate information from diverse sources hinges critically on being exposed to properly correlated signals during the early phases of training.
We show that critical periods arise from the complex and unstable early transient dynamics, which are decisive of final performance of the trained system and their learned representations.
arXiv Detail & Related papers (2022-10-06T23:50:38Z)
- Learning Fast and Slow for Online Time Series Forecasting [76.50127663309604]
Fast and Slow learning Networks (FSNet) is a holistic framework for online time-series forecasting.
FSNet balances fast adaptation to recent changes and retrieving similar old knowledge.
Our code will be made publicly available.
arXiv Detail & Related papers (2022-02-23T18:23:07Z)
- The learning phases in NN: From Fitting the Majority to Fitting a Few [2.5991265608180396]
We analyze a layer's reconstruction ability of the input and prediction performance based on the evolution of parameters during training.
We also assess the behavior using common datasets and architectures from computer vision such as ResNet and VGG.
arXiv Detail & Related papers (2022-02-16T19:11:42Z)
- PredRNN: A Recurrent Neural Network for Spatiotemporal Predictive Learning [109.84770951839289]
We present PredRNN, a new recurrent network for learning visual dynamics from historical context.
We show that our approach obtains highly competitive results on three standard datasets.
arXiv Detail & Related papers (2021-03-17T08:28:30Z)
- Understanding Self-supervised Learning with Dual Deep Networks [74.92916579635336]
We propose a novel framework to understand contrastive self-supervised learning (SSL) methods that employ dual pairs of deep ReLU networks.
We prove that in each SGD update of SimCLR with various loss functions, the weights at each layer are updated by a covariance operator.
To further study what role the covariance operator plays and which features are learned in such a process, we model data generation and augmentation processes through a hierarchical latent tree model (HLTM); a minimal sketch of the SimCLR objective appears after this list.
arXiv Detail & Related papers (2020-10-01T17:51:49Z)
- Supporting Optimal Phase Space Reconstructions Using Neural Network Architecture for Time Series Modeling [68.8204255655161]
We propose an artificial neural network with a mechanism to implicitly learn the properties of the phase space.
Our approach is either as competitive as or better than most state-of-the-art strategies.
arXiv Detail & Related papers (2020-06-19T21:04:47Z)
- Detecting structural perturbations from time series with deep learning [0.0]
We present a graph neural network approach to infer structural perturbations from functional time series.
We show our data-driven approach outperforms typical reconstruction methods.
This work uncovers a practical avenue to study the resilience of real-world complex systems.
arXiv Detail & Related papers (2020-06-09T13:08:40Z)
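As referenced from the "Understanding Self-supervised Learning with Dual Deep Networks" entry above, here is a minimal, self-contained sketch of the SimCLR NT-Xent objective whose per-layer SGD updates that paper characterizes as a covariance operator; the batch size, embedding dimension, and temperature are illustrative assumptions, not values from that paper.

```python
# Minimal sketch of the SimCLR NT-Xent (contrastive) loss; purely
# illustrative dimensions and temperature, not that paper's settings.
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent loss over paired augmented views z1, z2 of shape (N, D)."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)     # (2N, D) stacked embeddings
    sim = z @ z.t() / temperature      # pairwise cosine similarities
    n = z1.shape[0]
    # Mask self-similarities so a view is never its own positive.
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float("-inf"))
    # The positive for row i is its augmented partner at (i + n) mod 2N.
    targets = torch.arange(2 * n, device=z.device).roll(n)
    return F.cross_entropy(sim, targets)

# Toy usage: random "embeddings" from two augmented views of 8 inputs.
z1, z2 = torch.randn(8, 16), torch.randn(8, 16)
print(nt_xent_loss(z1, z2).item())
```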