Depth Enables Long-Term Memory for Recurrent Neural Networks
- URL: http://arxiv.org/abs/2003.10163v1
- Date: Mon, 23 Mar 2020 10:29:14 GMT
- Title: Depth Enables Long-Term Memory for Recurrent Neural Networks
- Authors: Alon Ziv
- Abstract summary: We introduce a measure of the network's ability to support information flow across time, referred to as the Start-End separation rank.
We prove that deep recurrent networks support Start-End separation ranks which are higher than those supported by their shallow counterparts.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A key attribute that drives the unprecedented success of modern Recurrent
Neural Networks (RNNs) on learning tasks that involve sequential data is
their ability to model intricate long-term temporal dependencies. However, a
well-established measure of RNNs' long-term memory capacity is lacking, and thus
formal understanding of the effect of depth on their ability to correlate data
throughout time is limited. Specifically, existing depth efficiency results on
convolutional networks do not suffice to account for the success of
deep RNNs on data of varying lengths. In order to address this, we introduce a
measure of the network's ability to support information flow across time,
referred to as the Start-End separation rank, which reflects the distance of
the function realized by the recurrent network from modeling no dependency
between the beginning and end of the input sequence. We prove that deep
recurrent networks support Start-End separation ranks which are combinatorially
higher than those supported by their shallow counterparts. Thus, we establish
that depth brings forth an overwhelming advantage in the ability of recurrent
networks to model long-term dependencies, and provide an exemplar of
quantifying this key attribute. We empirically demonstrate the discussed
phenomena on common RNNs through extensive experimental evaluation using the
optimization technique of restricting the hidden-to-hidden matrix to being
orthogonal. Finally, we employ the tool of quantum Tensor Networks to gain
additional graphic insights regarding the complexity brought forth by depth in
recurrent networks.
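As a reading aid, the Start-End separation rank referred to above can be written out explicitly. The formulation below is a sketch reconstructed from the abstract's description; the partition of the T time steps into a "Start" half S and an "End" half E and the symbols g_r, h_r are assumed notation, not quoted from the paper.

```latex
% Sketch of the Start-End separation rank of a function y over a length-T sequence.
% Assumed partition: S = {1, ..., T/2} (start half), E = {T/2 + 1, ..., T} (end half).
\[
\operatorname{sep}_{S,E}(y) \;=\; \min\Bigl\{ R \in \mathbb{N} \;:\;
    y(\mathbf{x}_1,\dots,\mathbf{x}_T)
    \;=\; \sum_{r=1}^{R} g_r(\mathbf{x}_S)\, h_r(\mathbf{x}_E) \Bigr\}
\]
% sep = 1 means y factorizes over the two halves, i.e. it models no dependency
% between the beginning and the end of the input sequence; larger values mean
% stronger Start-End dependencies can be represented.
```

The optimization technique named at the end of the abstract, restricting the hidden-to-hidden matrix to be orthogonal, can be prototyped with PyTorch's built-in orthogonal parametrization. The sketch below is illustrative only: the layer sizes, the tanh nonlinearity, and the way the layers are stacked are assumptions, not the authors' implementation.

```python
# Hedged sketch: a deep (stacked) vanilla RNN whose hidden-to-hidden matrices are
# kept orthogonal during training via PyTorch's orthogonal parametrization.
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import orthogonal

class OrthogonalRNNCell(nn.Module):
    def __init__(self, input_dim: int, hidden_dim: int):
        super().__init__()
        self.in_proj = nn.Linear(input_dim, hidden_dim)
        # Constrain W_hh to the orthogonal manifold throughout training.
        self.hidden_proj = orthogonal(nn.Linear(hidden_dim, hidden_dim, bias=False))

    def forward(self, x_t: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        return torch.tanh(self.in_proj(x_t) + self.hidden_proj(h))

class DeepOrthogonalRNN(nn.Module):
    """Depth-L stack of cells; layer l consumes the states of layer l-1."""
    def __init__(self, input_dim: int, hidden_dim: int, depth: int):
        super().__init__()
        dims = [input_dim] + [hidden_dim] * depth
        self.cells = nn.ModuleList(
            OrthogonalRNNCell(dims[l], hidden_dim) for l in range(depth)
        )
        self.hidden_dim = hidden_dim

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, input_dim); returns the top layer's final hidden state.
        batch, time, _ = x.shape
        states = [x.new_zeros(batch, self.hidden_dim) for _ in self.cells]
        for t in range(time):
            inp = x[:, t]
            for l, cell in enumerate(self.cells):
                states[l] = cell(inp, states[l])
                inp = states[l]
        return states[-1]

rnn = DeepOrthogonalRNN(input_dim=16, hidden_dim=64, depth=3)
out = rnn(torch.randn(4, 50, 16))  # -> shape (4, 64)
```

Keeping the hidden-to-hidden matrix orthogonal preserves the norm of the hidden state across time steps, which is the usual motivation for this constraint when training RNNs on long sequences.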
Related papers
- Analyzing the Neural Tangent Kernel of Periodically Activated Coordinate
Networks [30.92757082348805]
We provide a theoretical understanding of periodically activated networks through an analysis of their Neural Tangent Kernel (NTK)
Our findings indicate that periodically activated networks are notably more well-behaved, from the NTK perspective, than ReLU-activated networks.
arXiv Detail & Related papers (2024-02-07T12:06:52Z) - DON-LSTM: Multi-Resolution Learning with DeepONets and Long Short-Term
Memory Neural Networks [1.8434042562191815]
Deep operator networks (DeepONets, DONs) offer a distinct advantage over traditional neural networks in their ability to be trained on multi-resolution data.
We propose a novel architecture, named DON-LSTM, which extends the DeepONet with a long short-term memory network (LSTM)
We show that the proposed multi-resolution DON-LSTM achieves significantly lower generalization error and requires fewer high-resolution samples compared to its vanilla counterparts.
arXiv Detail & Related papers (2023-10-03T23:43:16Z) - Addressing caveats of neural persistence with deep graph persistence [54.424983583720675]
We find that the variance of network weights and spatial concentration of large weights are the main factors that impact neural persistence.
We propose an extension of the filtration underlying neural persistence to the whole neural network instead of single layers.
This yields our deep graph persistence measure, which implicitly incorporates persistent paths through the network and alleviates variance-related issues.
arXiv Detail & Related papers (2023-07-20T13:34:11Z) - Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK [86.45209429863858]
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime.
We show that the neural networks possess a different limiting kernel, which we call the bias-generalized NTK.
We also study various properties of the neural networks with this new kernel.
arXiv Detail & Related papers (2023-01-01T02:11:39Z) - Bayesian Interpolation with Deep Linear Networks [92.1721532941863]
Characterizing how neural network depth, width, and dataset size jointly impact model quality is a central problem in deep learning theory.
We show that linear networks make provably optimal predictions at infinite depth.
We also show that with data-agnostic priors, Bayesian model evidence in wide linear networks is maximized at infinite depth.
arXiv Detail & Related papers (2022-12-29T20:57:46Z) - Spiking Neural Networks for event-based action recognition: A new task to understand their advantage [1.4348901037145936]
Spiking Neural Networks (SNNs) are characterised by their unique temporal dynamics.
We show how spiking neurons can enable temporal feature extraction in feed-forward neural networks.
We also show how recurrent SNNs can achieve comparable results to LSTM with a smaller number of parameters.
arXiv Detail & Related papers (2022-09-29T16:22:46Z) - Scalable Spatiotemporal Graph Neural Networks [14.415967477487692]
Graph neural networks (GNNs) are often the core component of the forecasting architecture.
In most spatiotemporal GNNs, the computational complexity scales up to a quadratic factor with the length of the sequence times the number of links in the graph.
We propose a scalable architecture that exploits an efficient encoding of both temporal and spatial dynamics.
arXiv Detail & Related papers (2022-09-14T09:47:38Z) - Semi-supervised Network Embedding with Differentiable Deep Quantisation [81.49184987430333]
We develop d-SNEQ, a differentiable quantisation method for network embedding.
d-SNEQ incorporates a rank loss to equip the learned quantisation codes with rich high-order information.
It is able to substantially compress the size of trained embeddings, thus reducing storage footprint and accelerating retrieval speed.
arXiv Detail & Related papers (2021-08-20T11:53:05Z) - Deep Cellular Recurrent Network for Efficient Analysis of Time-Series
Data with Spatial Information [52.635997570873194]
This work proposes a novel deep cellular recurrent neural network (DCRNN) architecture to process complex multi-dimensional time series data with spatial information.
The proposed architecture achieves state-of-the-art performance while using substantially fewer trainable parameters than comparable methods in the literature.
arXiv Detail & Related papers (2021-01-12T20:08:18Z) - Hybrid Backpropagation Parallel Reservoir Networks [8.944918753413827]
We propose a novel hybrid network, which combines the effectiveness of learning random temporal features of reservoirs with the readout power of a deep neural network with batch normalization.
We demonstrate that our new network outperforms LSTMs and GRUs, including multi-layer "deep" versions of these networks.
We also show that the inclusion of a novel meta-ring structure, which we call HBP-ESN M-Ring, achieves performance similar to a single large reservoir while decreasing the required memory by an order of magnitude.
arXiv Detail & Related papers (2020-10-27T21:03:35Z) - Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective that represents a network as a complete graph for analysis.
By assigning learnable parameters to the edges, which reflect the magnitude of connections, the learning process can be performed in a differentiable manner (a hedged sketch of this edge-weighting idea is given after this list).
This learning process is compatible with existing networks and adapts to larger search spaces and different tasks.
arXiv Detail & Related papers (2020-08-19T04:53:31Z)
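The edge-weighting idea summarized in the last entry can be illustrated in a few lines. The sketch below is a hedged approximation rather than the paper's implementation: the per-node operation, the sigmoid gating of edges, and the weighted-sum aggregation over a complete DAG are assumptions made for illustration.

```python
# Hedged sketch: learnable edge weights over a complete DAG of feature nodes,
# so the connectivity pattern is trained by gradient descent along with the ops.
import torch
import torch.nn as nn

class DifferentiableConnectivity(nn.Module):
    def __init__(self, num_nodes: int, dim: int):
        super().__init__()
        self.num_nodes = num_nodes
        # One simple operation per node; any layer type could stand in here.
        self.ops = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(num_nodes)
        )
        # Learnable logits gating every edge from the input or an earlier node
        # into node i (only the entries edge_logits[i, :i+1] are actually used).
        self.edge_logits = nn.Parameter(torch.zeros(num_nodes, num_nodes))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        outputs = [x]  # the network input acts as the first node in the DAG
        for i in range(self.num_nodes):
            # Sigmoid gates reflect the learned magnitude of each incoming edge.
            gates = torch.sigmoid(self.edge_logits[i, : len(outputs)])
            aggregated = sum(g * h for g, h in zip(gates, outputs))
            outputs.append(self.ops[i](aggregated))
        return outputs[-1]

model = DifferentiableConnectivity(num_nodes=4, dim=32)
y = model(torch.randn(8, 32))
y.sum().backward()  # gradients flow into edge_logits as well as the node ops
```

Because the gates are ordinary parameters, the connectivity is learned by the same optimizer that trains the node operations, which is what makes the search differentiable.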