Internal representation dynamics and geometry in recurrent neural
networks
- URL: http://arxiv.org/abs/2001.03255v2
- Date: Tue, 14 Jan 2020 14:23:02 GMT
- Title: Internal representation dynamics and geometry in recurrent neural
networks
- Authors: Stefan Horoi, Guillaume Lajoie and Guy Wolf
- Abstract summary: We show how a vanilla RNN implements a simple classification task by analysing the dynamics of the network.
We find that early internal representations are evocative of the real labels of the data but this information is not directly accessible to the output layer.
- Score: 10.016265742591674
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The efficiency of recurrent neural networks (RNNs) in dealing with sequential
data has long been established. However, unlike deep and convolutional networks,
where we can attribute the recognition of a certain feature to every layer, it
is unclear what "sub-task" a single recurrent step or layer accomplishes. Our
work seeks to shed light onto how a vanilla RNN implements a simple
classification task by analysing the dynamics of the network and the geometric
properties of its hidden states. We find that early internal representations
are evocative of the real labels of the data but this information is not
directly accessible to the output layer. Furthermore, the network's dynamics and
the sequence length are both critical to correct classification, even when
no additional task-relevant information is provided.
Related papers
- Coding schemes in neural networks learning classification tasks [52.22978725954347]
We investigate fully-connected, wide neural networks learning classification tasks.
We show that the networks acquire strong, data-dependent features.
Surprisingly, the nature of the internal representations depends crucially on the neuronal nonlinearity.
arXiv Detail & Related papers (2024-06-24T14:50:05Z)
- Task structure and nonlinearity jointly determine learned representational geometry [0.0]
We show that Tanh networks tend to learn representations that reflect the structure of the target outputs, while ReLU networks retain more information about the structure of the raw inputs.
Our findings shed light on the interplay between input-output geometry, nonlinearity, and learned representations in neural networks.
arXiv Detail & Related papers (2024-01-24T16:14:38Z) - Addressing caveats of neural persistence with deep graph persistence [54.424983583720675]
We find that the variance of network weights and spatial concentration of large weights are the main factors that impact neural persistence.
We propose an extension of the filtration underlying neural persistence to the whole neural network instead of single layers.
This yields our deep graph persistence measure, which implicitly incorporates persistent paths through the network and alleviates variance-related issues.
arXiv Detail & Related papers (2023-07-20T13:34:11Z) - Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks [69.38572074372392]
We present the first results proving that feature learning occurs during training with a nonlinear model on multiple tasks.
Our key insight is that multi-task pretraining induces a pseudo-contrastive loss that favors representations that align points that typically have the same label across tasks.
arXiv Detail & Related papers (2023-07-13T16:39:08Z) - How neural networks learn to classify chaotic time series [77.34726150561087]
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series.
We find that the relation between input periodicity and activation periodicity is key for the performance of LKCNN models.
arXiv Detail & Related papers (2023-06-04T08:53:27Z) - Redundant representations help generalization in wide neural networks [71.38860635025907]
We study the last hidden layer representations of various state-of-the-art convolutional neural networks.
We find that if the last hidden representation is wide enough, its neurons tend to split into groups that carry identical information, and differ from each other only by statistically independent noise.
arXiv Detail & Related papers (2021-06-07T10:18:54Z) - Topological Uncertainty: Monitoring trained neural networks through
persistence of activation graphs [0.9786690381850356]
In industrial applications, data coming from an open-world setting might widely differ from the benchmark datasets on which a network was trained.
We develop a method to monitor trained neural networks based on the topological properties of their activation graphs.
arXiv Detail & Related papers (2021-05-07T14:16:03Z) - Neural networks adapting to datasets: learning network size and topology [77.34726150561087]
We introduce a flexible setup allowing for a neural network to learn both its size and topology during the course of a gradient-based training.
The resulting network has the structure of a graph tailored to the particular learning task and dataset.
arXiv Detail & Related papers (2020-06-22T12:46:44Z) - Finding trainable sparse networks through Neural Tangent Transfer [16.092248433189816]
In deep learning, trainable sparse networks that perform well on a specific task are usually constructed using label-dependent pruning criteria.
In this article, we introduce Neural Tangent Transfer, a method that instead finds trainable sparse networks in a label-free manner.
arXiv Detail & Related papers (2020-06-15T08:58:01Z) - Modeling Dynamic Heterogeneous Network for Link Prediction using
Hierarchical Attention with Temporal RNN [16.362525151483084]
We propose a novel dynamic heterogeneous network embedding method, termed DyHATR.
It uses hierarchical attention to learn heterogeneous information and incorporates recurrent neural networks with temporal attention to capture evolutionary patterns.
We benchmark our method on four real-world datasets for the task of link prediction.
arXiv Detail & Related papers (2020-04-01T17:16:47Z)