Related papers: Dynamically Learning to Integrate in Recurrent Neural Networks

URL: http://arxiv.org/abs/2503.18754v1
Date: Mon, 24 Mar 2025 15:03:23 GMT
Title: Dynamically Learning to Integrate in Recurrent Neural Networks
Authors: Blake Bordelon, Jordan Cotler, Cengiz Pehlevan, Jacob A. Zavatone-Veth
Abstract summary: Learning to remember over long timescales is challenging for recurrent neural networks (RNNs). We build a mathematical theory of the learning dynamics of linear RNNs trained to integrate white noise.
Score: 35.911170144151825
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Learning to remember over long timescales is fundamentally challenging for recurrent neural networks (RNNs). While much prior work has explored why RNNs struggle to learn long timescales and how to mitigate this, we still lack a clear understanding of the dynamics involved when RNNs learn long timescales via gradient descent. Here we build a mathematical theory of the learning dynamics of linear RNNs trained to integrate white noise. We show that when the initial recurrent weights are small, the dynamics of learning are described by a low-dimensional system that tracks a single outlier eigenvalue of the recurrent weights. This reveals the precise manner in which the long timescale associated with white noise integration is learned. We extend our analyses to RNNs learning a damped oscillatory filter, and find rich dynamical equations for the evolution of a conjugate pair of outlier eigenvalues. Taken together, our analyses build a rich mathematical framework for studying dynamical learning problems salient for both machine learning and neuroscience.
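As a rough illustration of the link the abstract draws between a single outlier eigenvalue and the integration timescale, the following toy sketch may help. It is not the paper's model or training procedure: the discrete-time update, the rank-one weight matrix, and the names `integration_error` and `memory_timescale` are all assumptions made for this example, and no gradient descent is performed. The sketch only measures how well a linear RNN with one nonzero eigenvalue `lam` integrates white noise.

```python
import numpy as np


def integration_error(lam, n_units=16, n_steps=200, seed=0):
    """Mean squared error of a toy linear RNN asked to integrate white noise.

    Assumed (hypothetical) setup: W = lam * m m^T, so W has a single
    nonzero eigenvalue lam; the input and readout weights both equal m.
    """
    rng = np.random.default_rng(seed)
    m = rng.standard_normal(n_units)
    m /= np.linalg.norm(m)            # unit vector defining the outlier mode
    W = lam * np.outer(m, m)          # rank-one recurrent weights

    x = rng.standard_normal(n_steps)  # white-noise input
    target = np.cumsum(x)             # perfect integrator output

    h = np.zeros(n_units)
    y = np.empty(n_steps)
    for t in range(n_steps):
        h = W @ h + m * x[t]          # linear recurrent update
        y[t] = m @ h                  # scalar readout
    return float(np.mean((y - target) ** 2))


def memory_timescale(lam):
    """Decay time (in steps) of the mode with eigenvalue 0 < lam < 1."""
    return -1.0 / np.log(lam)


if __name__ == "__main__":
    for lam in (0.5, 0.9, 0.99, 1.0):
        print(f"lam={lam}: mse={integration_error(lam):.3g}")
```

In this toy setting the readout obeys y_t = lam * y_{t-1} + x_t, so the network is a leaky integrator whose memory lengthens as `lam` approaches 1 and which integrates exactly at `lam = 1`.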

Related papers

Reactivation: Empirical NTK Dynamics Under Task Shifts [12.32540447018987]
The Neural Tangent Kernel (NTK) offers a powerful tool to study the functional dynamics of neural networks. In the so-called lazy, or kernel regime, the NTK remains static during training and the network function is linear in the static neural tangents feature space. The study of the NTK dynamics has led to several critical discoveries in recent years, in generalization and scaling behaviours.
arXiv Detail & Related papers (2025-07-21T20:13:02Z)
How Graph Neural Networks Learn: Lessons from Training Dynamics [80.41778059014393]
We study the training dynamics in function space of graph neural networks (GNNs). We find that the gradient descent optimization of GNNs implicitly leverages the graph structure to update the learned function. This finding offers new interpretable insights into when and why the learned GNN functions generalize.
arXiv Detail & Related papers (2023-10-08T10:19:56Z)
Low Tensor Rank Learning of Neural Dynamics [0.0]
We show that low-tensor-rank weights emerge naturally in RNNs trained to solve low-dimensional tasks. Our findings provide insight on the evolution of population connectivity over learning in both biological and artificial neural networks.
arXiv Detail & Related papers (2023-08-22T17:08:47Z)
Dynamic Analysis and an Eigen Initializer for Recurrent Neural Networks [0.0]
We study the dynamics of the hidden state in recurrent neural networks. We propose a new perspective to analyze the hidden state space based on an eigen decomposition of the weight matrix. We provide an explanation for long-term dependency based on the eigen analysis.
arXiv Detail & Related papers (2023-07-28T17:14:58Z)
On the Dynamics of Learning Time-Aware Behavior with Recurrent Neural Networks [2.294014185517203]
We introduce a family of supervised learning tasks dependent on hidden temporal variables. We train RNNs to emulate temporal flipflops that emphasize the need for time-awareness over long-term memory. We show that these RNNs learn to switch between periodic orbits that encode time modulo the period of the transition rules.
arXiv Detail & Related papers (2023-06-12T14:01:30Z)
How neural networks learn to classify chaotic time series [77.34726150561087]
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series. We find that the relation between input periodicity and activation periodicity is key for the performance of LKCNN models.
arXiv Detail & Related papers (2023-06-04T08:53:27Z)
Dynamics-Aware Loss for Learning with Label Noise [73.75129479936302]
Label noise poses a serious threat to deep neural networks (DNNs). We propose a dynamics-aware loss (DAL) to solve this problem. Both the detailed theoretical analyses and extensive experimental results demonstrate the superiority of our method.
arXiv Detail & Related papers (2023-03-21T03:05:21Z)
Critical Learning Periods for Multisensory Integration in Deep Networks [112.40005682521638]
We show that the ability of a neural network to integrate information from diverse sources hinges critically on being exposed to properly correlated signals during the early phases of training. We show that critical periods arise from the complex and unstable early transient dynamics, which are decisive of final performance of the trained system and their learned representations.
arXiv Detail & Related papers (2022-10-06T23:50:38Z)
Reverse engineering recurrent neural networks with Jacobian switching linear dynamical systems [24.0378100479104]
Recurrent neural networks (RNNs) are powerful models for processing time-series data. The framework of reverse engineering a trained RNN by linearizing around its fixed points has provided insight, but the approach has significant challenges. We present a new model that overcomes these limitations by co-training an RNN with a novel switching linear dynamical system (SLDS) formulation.
arXiv Detail & Related papers (2021-11-01T20:49:30Z)
A self consistent theory of Gaussian Processes captures feature learning effects in finite CNNs [2.28438857884398]
Deep neural networks (DNNs) in the infinite width/channel limit have received much attention recently. Despite their theoretical appeal, this viewpoint lacks a crucial ingredient of deep learning in finite DNNs, lying at the heart of their success -- feature learning. Here we consider DNNs trained with noisy gradient descent on a large training set and derive a self consistent Gaussian Process theory accounting for strong finite-DNN and feature learning effects.
arXiv Detail & Related papers (2021-06-08T05:20:00Z)
Artificial Neural Variability for Deep Learning: On Overfitting, Noise Memorization, and Catastrophic Forgetting [135.0863818867184]
Artificial neural variability (ANV) helps artificial neural networks learn some advantages from "natural" neural networks. ANV acts as an implicit regularizer of the mutual information between the training data and the learned model. It can effectively relieve overfitting, label noise memorization, and catastrophic forgetting at negligible costs.
arXiv Detail & Related papers (2020-11-12T06:06:33Z)
Recurrent Neural Network Learning of Performance and Intrinsic Population Dynamics from Sparse Neural Data [77.92736596690297]
We introduce a novel training strategy that allows learning not only the input-output behavior of an RNN but also its internal network dynamics. We test the proposed method by training an RNN to simultaneously reproduce internal dynamics and output signals of a physiologically-inspired neural model. Remarkably, we show that the reproduction of the internal dynamics is successful even when the training algorithm relies on the activities of a small subset of neurons.
arXiv Detail & Related papers (2020-05-05T14:16:54Z)

This list is automatically generated from the titles and abstracts of the papers on this site.