Learning and Generalization in RNNs
- URL: http://arxiv.org/abs/2106.00047v1
- Date: Mon, 31 May 2021 18:27:51 GMT
- Title: Learning and Generalization in RNNs
- Authors: Abhishek Panigrahi, Navin Goyal
- Abstract summary: We prove that simple recurrent neural networks can learn functions of sequences.
Our proofs rest on new ideas for extracting information from the RNN's hidden state.
- Score: 11.107204912245841
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Simple recurrent neural networks (RNNs) and their more advanced cousins LSTMs
etc. have been very successful in sequence modeling. Their theoretical
understanding, however, is lacking and has not kept pace with the progress for
feedforward networks, where a reasonably complete understanding in the special
case of highly overparametrized one-hidden-layer networks has emerged. In this
paper, we make progress towards remedying this situation by proving that RNNs
can learn functions of sequences. In contrast to the previous work that could
only deal with functions of sequences that are sums of functions of individual
tokens in the sequence, we allow general functions. Conceptually and
technically, we introduce new ideas which enable us to extract information from
the hidden state of the RNN in our proofs -- addressing a crucial weakness in
previous work. We illustrate our results on some regular language recognition
problems.
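As a concrete illustration of the regular-language setting (a minimal sketch under arbitrary hyperparameters, not the paper's proof construction), the following trains a vanilla Elman RNN to recognize PARITY: accept a binary string iff it contains an odd number of 1s.
```python
# Illustrative sketch: train a simple Elman RNN on PARITY, a canonical
# regular language (label = 1 iff the string has an odd number of 1s).
# Hyperparameters are arbitrary choices for this sketch.
import torch
import torch.nn as nn

torch.manual_seed(0)
SEQ_LEN, HIDDEN, STEPS = 12, 32, 2000

class SimpleRNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = nn.RNN(input_size=1, hidden_size=HIDDEN, batch_first=True)
        self.head = nn.Linear(HIDDEN, 1)  # read the label off the final hidden state

    def forward(self, x):
        _, h = self.rnn(x)              # h: (num_layers, batch, HIDDEN)
        return self.head(h[-1]).squeeze(-1)

def batch(n=64):
    x = torch.randint(0, 2, (n, SEQ_LEN, 1)).float()
    y = x.sum(dim=(1, 2)) % 2           # 1 iff an odd number of ones
    return x, y

model = SimpleRNN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(STEPS):
    x, y = batch()
    loss = loss_fn(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

with torch.no_grad():
    x, y = batch(1000)
    acc = ((model(x) > 0) == y.bool()).float().mean().item()
print(f"held-out parity accuracy: {acc:.3f}")
```
Reading the label off the final hidden state mirrors the setup the abstract describes, where the proofs must extract information from the RNN's hidden state.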
Related papers
- On Logical Extrapolation for Mazes with Recurrent and Implicit Networks [2.0037131645168396]
We show that the capacity for extrapolation is less robust than previously suggested.
While implicit neural networks (INNs) can generalize to larger maze instances, they fail to generalize along axes of difficulty other than maze size.
arXiv Detail & Related papers (2024-10-03T22:07:51Z)
- Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations [54.17275171325324]
We present a counterexample to the Linear Representation Hypothesis (LRH).
When trained to repeat an input token sequence, neural networks learn to represent the token at each position with a particular order of magnitude, rather than a direction (a toy probe of this distinction is sketched below).
These findings strongly indicate that interpretability research should not be confined to the LRH.
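To make the magnitude-versus-direction distinction concrete, here is a hypothetical probe sketch; the toy hidden states and the helper below are illustrative assumptions, not the paper's analysis.
```python
# Hypothetical probe: given hidden states h[t] collected from a trained
# network on a repeat task, compare how position is encoded in the norm
# (order of magnitude) versus the direction of each state.
import torch

def position_codes(hidden):               # hidden: (seq_len, dim)
    norms = hidden.norm(dim=-1)           # one scalar magnitude per position
    dirs = hidden / norms.unsqueeze(-1)   # unit-norm directions
    cos = dirs @ dirs.T                   # pairwise directional similarity
    return norms, cos

# Toy states mimicking the reported finding: the direction is shared
# across positions while the magnitude changes by an order of magnitude.
v = torch.randn(16)
hidden = torch.stack([(10.0 ** t) * v for t in range(5)])
norms, cos = position_codes(hidden)
print(norms)  # magnitudes separate the positions...
print(cos)    # ...while the directions are (here) identical
```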
arXiv Detail & Related papers (2024-08-20T15:04:37Z)
- How Graph Neural Networks Learn: Lessons from Training Dynamics [80.41778059014393]
We study the training dynamics of graph neural networks (GNNs) in function space.
We find that the gradient descent optimization of GNNs implicitly leverages the graph structure to update the learned function.
This finding offers new interpretable insights into when and why the learned GNN functions generalize.
arXiv Detail & Related papers (2023-10-08T10:19:56Z)
- Episodic Memory Theory for the Mechanistic Interpretation of Recurrent Neural Networks [3.683202928838613]
We propose the Episodic Memory Theory (EMT), illustrating that RNNs can be conceptualized as discrete-time analogs of the recently proposed General Sequential Episodic Memory Model.
We introduce a novel set of algorithmic tasks tailored to probe the variable binding behavior in RNNs.
Our empirical investigations reveal that trained RNNs consistently converge to the variable binding circuit, thus indicating universality in the dynamics of RNNs.
arXiv Detail & Related papers (2023-10-03T20:52:37Z)
- How neural networks learn to classify chaotic time series [77.34726150561087]
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series.
We find that the relation between input periodicity and activation periodicity is key to the performance of large-kernel convolutional neural network (LKCNN) models.
arXiv Detail & Related papers (2023-06-04T08:53:27Z)
- Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK [86.45209429863858]
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime.
We show that these networks possess a different limiting kernel, which we call the bias-generalized NTK (a background sketch of the standard NTK follows below).
We also study various properties of neural networks with this new kernel.
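For background only (a hedged sketch, not the paper's derivation): the NTK is the kernel of parameter-gradient inner products, and "sparse activation induced by large bias" refers to few ReLU units being active when the biases are large.
```latex
% Background: one-hidden-layer ReLU network with biases, and the
% standard NTK; the paper's bias-generalized NTK is a modified
% limiting kernel of this object (per the summary above).
\[
  f(x;\theta) \;=\; \frac{1}{\sqrt{m}} \sum_{r=1}^{m} a_r\,
      \sigma\!\left(w_r^\top x + b_r\right),
  \qquad \sigma(z) = \max(z, 0),
\]
\[
  K(x, x') \;=\; \left\langle \nabla_\theta f(x;\theta),\,
      \nabla_\theta f(x';\theta) \right\rangle .
\]
% When the biases b_r are large and negative, w_r^T x + b_r > 0 holds
% for few units, so the activation pattern is sparse.
```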
arXiv Detail & Related papers (2023-01-01T02:11:39Z)
- Neural networks trained with SGD learn distributions of increasing complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics, and exploit higher-order statistics only later during training.
We discuss the relation of this distributional simplicity bias (DSB) to other simplicity biases and consider its implications for the principle of universality in learning.
arXiv Detail & Related papers (2022-11-21T15:27:22Z)
- Implicit N-grams Induced by Recurrence [10.053475465955794]
We present a study showing that explainable components do in fact reside within the hidden states of RNNs.
We evaluated the explainable features extracted from trained RNNs on downstream sentiment analysis tasks and found they could be used to model interesting linguistic phenomena (a sketch of the generic feature-extraction pipeline follows below).
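As context for the pipeline shape only, a minimal hypothetical sketch: freeze a trained RNN, read off its final hidden state as a feature vector, and fit a small sentiment head on top. All sizes and names here are illustrative assumptions; the paper's extraction of implicit n-gram components is more specific.
```python
# Hypothetical pipeline: frozen RNN hidden states as features for a
# downstream sentiment classifier. (The paper extracts n-gram-like
# components from the hidden state; this only shows the pipeline.)
import torch
import torch.nn as nn

vocab, dim = 1000, 64
emb = nn.Embedding(vocab, dim)
rnn = nn.RNN(input_size=dim, hidden_size=dim, batch_first=True)
clf = nn.Linear(dim, 2)                   # downstream sentiment head

tokens = torch.randint(0, vocab, (8, 20))  # a toy batch of token sequences
with torch.no_grad():                      # the RNN stays frozen
    _, h = rnn(emb(tokens))                # h: (1, batch, dim)
features = h[-1]                           # one feature vector per sequence
logits = clf(features)                     # only `clf` is trained downstream
print(logits.shape)                        # torch.Size([8, 2])
```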
arXiv Detail & Related papers (2022-05-05T15:53:46Z)
- How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks [80.55378250013496]
We study how neural networks trained by gradient descent extrapolate, i.e., what they learn outside the support of the training distribution.
Graph Neural Networks (GNNs) have shown some success in more complex tasks.
arXiv Detail & Related papers (2020-09-24T17:48:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.