Learning and Generalization in RNNs
- URL: http://arxiv.org/abs/2106.00047v1
- Date: Mon, 31 May 2021 18:27:51 GMT
- Title: Learning and Generalization in RNNs
- Authors: Abhishek Panigrahi, Navin Goyal
- Abstract summary: We prove that simple recurrent neural networks can learn functions of sequences.
New ideas enable us to extract information from the hidden state of the RNN in our proofs.
- Score: 11.107204912245841
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Simple recurrent neural networks (RNNs) and their more advanced cousins LSTMs
etc. have been very successful in sequence modeling. Their theoretical
understanding, however, is lacking and has not kept pace with the progress for
feedforward networks, where a reasonably complete understanding in the special
case of highly overparametrized one-hidden-layer networks has emerged. In this
paper, we make progress towards remedying this situation by proving that RNNs
can learn functions of sequences. In contrast to the previous work that could
only deal with functions of sequences that are sums of functions of individual
tokens in the sequence, we allow general functions. Conceptually and
technically, we introduce new ideas which enable us to extract information from
the hidden state of the RNN in our proofs -- addressing a crucial weakness in
previous work. We illustrate our results on some regular language recognition
problems.
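To make the abstract's distinction concrete, here is a minimal sketch that is not taken from the paper. It contrasts a function that is a sum of per-token functions with a general function of the sequence. The regular language used (binary strings containing "11"), the position-independent per-token score, and the names `additive_target` and `contains_11` are all illustrative assumptions.

```python
def additive_target(tokens, per_token_score):
    """Sum of a score applied to each token independently (order is ignored)."""
    return sum(per_token_score(t) for t in tokens)


def contains_11(tokens):
    """Tiny DFA for the regular language of binary strings containing '11'."""
    state = 0  # 0: no trailing 1, 1: one trailing 1, 2: '11' seen (accepting)
    for t in tokens:
        if state == 2:
            break  # accepting state is absorbing
        state = state + 1 if t == 1 else 0
    return state == 2


# Two sequences with the same tokens in a different order.
a, b = (1, 1, 0), (1, 0, 1)
score = lambda t: 2.0 * t - 0.5  # hypothetical per-token score

# A position-independent sum over tokens cannot tell a and b apart ...
same_additive = additive_target(a, score) == additive_target(b, score)
# ... but membership in the regular language differs between them.
different_membership = contains_11(a) != contains_11(b)
```

Under these assumptions, the additive target gives the same value for both sequences while the language membership differs, which is the kind of order-dependent behavior a general function of the sequence has to capture.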
Related papers
- Towards Scalable and Versatile Weight Space Learning [51.78426981947659]
This paper introduces the SANE approach to weight-space learning.
Our method extends the idea of hyper-representations towards sequential processing of subsets of neural network weights.
arXiv Detail & Related papers (2024-06-14T13:12:07Z)
- ReLUs Are Sufficient for Learning Implicit Neural Representations [17.786058035763254]
We revisit the use of ReLU activation functions for learning implicit neural representations.
Inspired by second-order B-spline wavelets, we incorporate a set of simple constraints to the ReLU neurons in each layer of a deep neural network (DNN).
We demonstrate that, contrary to popular belief, one can learn state-of-the-art INRs based on a DNN composed of only ReLU neurons.
arXiv Detail & Related papers (2024-06-04T17:51:08Z)
- How Graph Neural Networks Learn: Lessons from Training Dynamics [80.41778059014393]
We study the training dynamics in function space of graph neural networks (GNNs).
We find that the gradient descent optimization of GNNs implicitly leverages the graph structure to update the learned function.
This finding offers new interpretable insights into when and why the learned GNN functions generalize.
arXiv Detail & Related papers (2023-10-08T10:19:56Z)
- Episodic Memory Theory for the Mechanistic Interpretation of Recurrent Neural Networks [3.683202928838613]
We propose the Episodic Memory Theory (EMT), illustrating that RNNs can be conceptualized as discrete-time analogs of the recently proposed General Sequential Episodic Memory Model.
We introduce a novel set of algorithmic tasks tailored to probe the variable binding behavior in RNNs.
Our empirical investigations reveal that trained RNNs consistently converge to the variable binding circuit, thus indicating universality in the dynamics of RNNs.
arXiv Detail & Related papers (2023-10-03T20:52:37Z)
- How neural networks learn to classify chaotic time series [77.34726150561087]
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series.
We find that the relation between input periodicity and activation periodicity is key for the performance of Large Kernel Convolutional Neural Network (LKCNN) models.
arXiv Detail & Related papers (2023-06-04T08:53:27Z)
- Neural networks trained with SGD learn distributions of increasing complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics.
They exploit higher-order statistics only later during training.
We discuss the relation of this distributional simplicity bias (DSB) to other simplicity biases and consider its implications for the principle of universality in learning.
arXiv Detail & Related papers (2022-11-21T15:27:22Z)
- Implicit N-grams Induced by Recurrence [10.053475465955794]
We present a study that shows there actually exist some explainable components that reside within the hidden states.
We evaluated such extracted explainable features from trained RNNs on downstream sentiment analysis tasks and found they could be used to model interesting linguistic phenomena.
arXiv Detail & Related papers (2022-05-05T15:53:46Z)
- Reinforcement Learning with External Knowledge by using Logical Neural Networks [67.46162586940905]
A recent neuro-symbolic framework called Logical Neural Networks (LNNs) can simultaneously provide key properties of both neural networks and symbolic logic.
We propose an integrated method that enables model-free reinforcement learning from external knowledge sources.
arXiv Detail & Related papers (2021-03-03T12:34:59Z)
- How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks [80.55378250013496]
We study how neural networks trained by gradient descent extrapolate what they learn outside the support of the training distribution.
Graph Neural Networks (GNNs) have shown some success in more complex tasks.
arXiv Detail & Related papers (2020-09-24T17:48:59Z)
- Internal representation dynamics and geometry in recurrent neural networks [10.016265742591674]
We show how a vanilla RNN implements a simple classification task by analysing the dynamics of the network.
We find that early internal representations are evocative of the real labels of the data but this information is not directly accessible to the output layer.
arXiv Detail & Related papers (2020-01-09T23:19:21Z)