On the Curse of Memory in Recurrent Neural Networks: Approximation and Optimization Analysis
- URL: http://arxiv.org/abs/2009.07799v3
- Date: Fri, 30 Aug 2024 14:12:30 GMT
- Title: On the Curse of Memory in Recurrent Neural Networks: Approximation and Optimization Analysis
- Authors: Zhong Li, Jiequn Han, Weinan E, Qianxiao Li
- Abstract summary: We consider the simple but representative setting of using continuous-time linear RNNs to learn from data generated by linear relationships.
We prove a universal approximation theorem of such linear functionals, and characterize the approximation rate and its relation with memory.
A unifying theme uncovered is the non-trivial effect of memory, a notion that can be made precise in our framework.
- Score: 30.75240284934018
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the approximation properties and optimization dynamics of recurrent neural networks (RNNs) when applied to learn input-output relationships in temporal data. We consider the simple but representative setting of using continuous-time linear RNNs to learn from data generated by linear relationships. Mathematically, the latter can be understood as a sequence of linear functionals. We prove a universal approximation theorem of such linear functionals, and characterize the approximation rate and its relation with memory. Moreover, we perform a fine-grained dynamical analysis of training linear RNNs, which further reveals the intricate interactions between memory and learning. A unifying theme uncovered is the non-trivial effect of memory, a notion that can be made precise in our framework, on approximation and optimization: when there is long-term memory in the target, it takes a large number of neurons to approximate it. Moreover, the training process will suffer from slowdowns. In particular, both of these effects become exponentially more pronounced with memory, a phenomenon we call the "curse of memory". These analyses represent a basic step towards a concrete mathematical understanding of new phenomena that may arise in learning temporal relationships using recurrent architectures.
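To make the setting above concrete, here is a minimal sketch of the objects involved; the notation (hidden state $h_t$, matrices $W$, $U$, readout $c$, and memory kernel $\rho$) is a plausible reconstruction for illustration rather than a quotation from the paper:
\[
\frac{d h_t}{dt} = W h_t + U x_t, \qquad \hat{y}_t = c^\top h_t \qquad \text{(continuous-time linear RNN, } h_t \in \mathbb{R}^m\text{)}
\]
\[
y_t = H_t(x) = \int_0^\infty \rho(s)^\top x_{t-s}\, ds \qquad \text{(target: a sequence of linear functionals with memory kernel } \rho\text{)}
\]
Under this reading, the RNN induces a kernel of the form $\hat{\rho}(s) = (c^\top e^{sW} U)^\top$, i.e. a combination of decaying exponentials. Approximating a target whose kernel $\rho$ decays slowly (long memory) then plausibly requires many hidden units, and training the recurrent weights slows down accordingly; this is the "curse of memory" referred to in the abstract.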
Related papers
- Demolition and Reinforcement of Memories in Spin-Glass-like Neural Networks [0.0]
The aim of this thesis is to understand the effectiveness of Unlearning in both associative memory models and generative models.
The selection of structured data enables an associative memory model to retrieve concepts as attractors of the neural dynamics, with considerable basins of attraction.
A novel regularization technique for Boltzmann Machines is presented, which outperforms previously developed methods in learning hidden probability distributions from datasets.
arXiv Detail & Related papers (2024-03-04T23:12:42Z)
- PRES: Toward Scalable Memory-Based Dynamic Graph Neural Networks [22.47336262812308]
Memory-based Dynamic Graph Neural Networks (MDGNNs) are a family of dynamic graph neural networks that leverage a memory module to extract, distill, and memorize long-term temporal dependencies.
This paper studies the efficient training of MDGNNs at scale, focusing on the temporal discontinuity in training MDGNNs with large temporal batch sizes.
arXiv Detail & Related papers (2024-02-06T01:34:56Z)
- The Expressive Leaky Memory Neuron: an Efficient and Expressive Phenomenological Neuron Model Can Solve Long-Horizon Tasks [64.08042492426992]
We introduce the Expressive Leaky Memory (ELM) neuron model, a biologically inspired model of a cortical neuron.
Our ELM neuron can accurately match the aforementioned input-output relationship with under ten thousand trainable parameters.
We evaluate it on various tasks with demanding temporal structures, including the Long Range Arena (LRA) datasets.
arXiv Detail & Related papers (2023-06-14T13:34:13Z)
- Inverse Approximation Theory for Nonlinear Recurrent Neural Networks [28.840757822712195]
We prove an inverse approximation theorem for the approximation of nonlinear sequence-to-sequence relationships using recurrent neural networks (RNNs).
We show that nonlinear sequence relationships that can be stably approximated by nonlinear RNNs must have an exponentially decaying memory structure.
This extends the previously identified curse of memory in linear RNNs into the general nonlinear setting.
arXiv Detail & Related papers (2023-05-30T16:34:28Z)
- Neuronal architecture extracts statistical temporal patterns [1.9662978733004601]
We show how higher-order temporal (co-)fluctuations can be employed to represent and process information.
A simple biologically inspired feedforward neuronal model is able to extract information up to the third-order cumulant to perform time-series classification.
arXiv Detail & Related papers (2023-01-24T18:21:33Z)
- Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z)
- What can linearized neural networks actually say about generalization? [67.83999394554621]
In certain infinitely-wide neural networks, the neural tangent kernel (NTK) theory fully characterizes generalization.
We show that the linear approximations can indeed rank the learning complexity of certain tasks for neural networks.
Our work provides concrete examples of novel deep learning phenomena which can inspire future theoretical research.
arXiv Detail & Related papers (2021-06-12T13:05:11Z)
- PredRNN: A Recurrent Neural Network for Spatiotemporal Predictive Learning [109.84770951839289]
We present PredRNN, a new recurrent network for learning visual dynamics from historical context.
We show that our approach obtains highly competitive results on three standard datasets.
arXiv Detail & Related papers (2021-03-17T08:28:30Z)
- Slow manifolds in recurrent networks encode working memory efficiently and robustly [0.0]
Working memory is a cognitive function involving the storage and manipulation of latent information over brief intervals of time.
We use a top-down modeling approach to examine network-level mechanisms of working memory.
arXiv Detail & Related papers (2021-01-08T18:47:02Z)
- Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these neural networks using gradient descent.
For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z)
- Understanding the Effects of Data Parallelism and Sparsity on Neural Network Training [126.49572353148262]
We study two factors in neural network training: data parallelism and sparsity.
Despite their promising benefits, an understanding of their effects on neural network training remains elusive.
arXiv Detail & Related papers (2020-03-25T10:49:22Z)