Do RNN and LSTM have Long Memory?
- URL: http://arxiv.org/abs/2006.03860v2
- Date: Wed, 10 Jun 2020 07:28:18 GMT
- Title: Do RNN and LSTM have Long Memory?
- Authors: Jingyu Zhao, Feiqing Huang, Jia Lv, Yanjie Duan, Zhen Qin, Guodong Li, Guangjian Tian
- Abstract summary: We prove that RNN and LSTM do not have long memory from a statistical perspective.
A new definition for long memory networks is introduced, and it requires the model weights to decay at a polynomial rate.
To verify our theory, we convert RNN and LSTM into long memory networks by making a minimal modification, and their superiority is illustrated in modeling long-term dependence of various datasets.
- Score: 15.072891084847647
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The LSTM network was proposed to overcome the difficulty in learning
long-term dependence, and has made significant advancements in applications.
With its success and drawbacks in mind, this paper raises the question - do RNN
and LSTM have long memory? We answer it partially by proving that RNN and LSTM
do not have long memory from a statistical perspective. A new definition for
long memory networks is further introduced, and it requires the model weights
to decay at a polynomial rate. To verify our theory, we convert RNN and LSTM
into long memory networks by making a minimal modification, and their
superiority is illustrated in modeling long-term dependence of various
datasets.
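To make the polynomial-decay requirement concrete, here is a minimal Python sketch. It is not the paper's exact construction: the filter form, the normalization, and the names `d` and `K` are assumptions. It contrasts the exponential (short-memory) decay of a stable linear RNN with a polynomially decaying filter of the kind the definition asks for.

```python
import numpy as np

def geometric_weights(rho, K):
    """Influence of a state k steps back in a stable linear RNN:
    roughly rho**k, i.e. exponential (short-memory) decay."""
    k = np.arange(1, K + 1)
    return rho ** k

def polynomial_weights(d, K):
    """Polynomially decaying filter weights w_k ~ k**(d - 1), 0 < d < 0.5:
    the slow decay that a long-memory criterion of this kind requires."""
    k = np.arange(1, K + 1)
    w = k ** (d - 1.0)
    return w / w.sum()              # normalize so the filter sums to one

def filtered_memory(history, weights):
    """Aggregate past states with a given decay profile.
    history: array of shape (K, dim), most recent step in the last row."""
    past = history[::-1]            # past[k] is the state k steps into the past
    return (weights[:, None] * past).sum(axis=0)

if __name__ == "__main__":
    K, dim = 200, 4
    history = np.random.randn(K, dim)
    short = filtered_memory(history, geometric_weights(rho=0.9, K=K))
    long_ = filtered_memory(history, polynomial_weights(d=0.3, K=K))
    # At lag 100 the geometric weight is about 3e-5, while the normalized
    # polynomial weight is still about 3e-3: distant steps keep real influence.
    print(short.shape, long_.shape)
```

The contrast is the point of the definition: geometrically decaying weights forget a step a few dozen lags back almost entirely, whereas weights decaying like k^(d-1) keep contributions from hundreds of steps in the past, which is the behavior the paper's minimally modified RNN and LSTM variants aim to capture.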
Related papers
- Stuffed Mamba: State Collapse and State Capacity of RNN-Based Long-Context Modeling [69.36377985746878]
We study the causes of RNNs' inability to process long contexts and suggest critical mitigations.
We first investigate *state collapse* (SC), a phenomenon that causes severe performance degradation on sequence lengths not encountered during training.
We train a series of Mamba-2 models on long documents to empirically estimate the recurrent state capacity in language modeling and passkey retrieval.
arXiv Detail & Related papers (2024-10-09T17:54:28Z)
- Were RNNs All We Needed? [53.393497486332]
We revisit traditional recurrent neural networks (RNNs) from over a decade ago.
We show that by removing the hidden-state dependencies from their input, forget, and update gates, LSTMs and GRUs no longer require backpropagation through time (BPTT) and can be trained efficiently in parallel (see the sketch after this list).
arXiv Detail & Related papers (2024-10-02T03:06:49Z)
- DON-LSTM: Multi-Resolution Learning with DeepONets and Long Short-Term Memory Neural Networks [1.8434042562191815]
Deep operator networks (DeepONets, DONs) offer a distinct advantage over traditional neural networks in their ability to be trained on multi-resolution data.
We propose a novel architecture, named DON-LSTM, which extends the DeepONet with a long short-term memory network (LSTM).
We show that the proposed multi-resolution DON-LSTM achieves significantly lower generalization error and requires fewer high-resolution samples compared to its vanilla counterparts.
arXiv Detail & Related papers (2023-10-03T23:43:16Z)
- Recurrent Neural Networks and Long Short-Term Memory Networks: Tutorial and Survey [9.092591746522483]
This tutorial paper is on Recurrent Neural Network (RNN), Long Short-Term Memory Network (LSTM), and their variants.
We start with a dynamical system and backpropagation through time for RNN.
We discuss the problems of gradient vanishing and explosion in long-term dependencies.
Then, we introduce LSTM gates and cells, the history and variants of LSTM, and Gated Recurrent Units (GRU).
arXiv Detail & Related papers (2023-04-22T18:22:10Z)
- Working Memory Connections for LSTM [51.742526187978726]
We show that Working Memory Connections consistently improve the performance of LSTMs on a variety of tasks.
Numerical results suggest that the cell state contains useful information that is worth including in the gate structure.
arXiv Detail & Related papers (2021-08-31T18:01:30Z)
- Object Tracking through Residual and Dense LSTMs [67.98948222599849]
Deep learning-based trackers built on LSTM (Long Short-Term Memory) recurrent neural networks have emerged as a powerful alternative.
DenseLSTMs outperform Residual and regular LSTM, and offer a higher resilience to nuisances.
Our case study supports the adoption of residual-based RNNs for enhancing the robustness of other trackers.
arXiv Detail & Related papers (2020-06-22T08:20:17Z)
- Tensor train decompositions on recurrent networks [60.334946204107446]
Matrix product state (MPS) tensor trains have more attractive features than matrix product operators (MPOs) in terms of storage reduction and computing time at inference.
We show that MPS tensor trains should be at the forefront of LSTM network compression through a theoretical analysis and practical experiments on NLP tasks.
arXiv Detail & Related papers (2020-06-09T18:25:39Z)
- Learning Various Length Dependence by Dual Recurrent Neural Networks [0.0]
We propose a new model named Dual Recurrent Neural Networks (DuRNN).
DuRNN consists of two parts: one learns the short-term dependence, and the other progressively learns the long-term dependence.
Our contributions are: 1) a new recurrent model developed based on the divide-and-conquer strategy to learn long and short-term dependence separately, and 2) a selection mechanism to enhance the separating and learning of different temporal scales of dependence.
arXiv Detail & Related papers (2020-05-28T09:30:01Z)
- Achieving Online Regression Performance of LSTMs with Simple RNNs [0.0]
We introduce a first-order training algorithm with a linear time complexity in the number of parameters.
We show that when SRNNs are trained with our algorithm, they provide regression performance very similar to that of LSTMs, with two to three times shorter training time.
arXiv Detail & Related papers (2020-05-16T11:41:13Z)
- Sentiment Analysis Using Simplified Long Short-term Memory Recurrent Neural Networks [1.5146765382501612]
We perform sentiment analysis on a GOP Debate Twitter dataset.
To speed up training and reduce the computational cost and time, six different parameter-reduced slim versions of the LSTM model are proposed.
arXiv Detail & Related papers (2020-05-08T12:50:10Z)
- Recognizing Long Grammatical Sequences Using Recurrent Networks Augmented With An External Differentiable Stack [73.48927855855219]
Recurrent neural networks (RNNs) are a widely used deep architecture for sequence modeling, generation, and prediction.
RNNs generalize poorly over very long sequences, which limits their applicability to many important temporal processing and time series forecasting problems.
One way to address these shortcomings is to couple an RNN with an external, differentiable memory structure, such as a stack.
In this paper, we improve the memory-augmented RNN with important architectural and state updating mechanisms.
arXiv Detail & Related papers (2020-04-04T14:19:15Z)
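Relating to the "Were RNNs All We Needed?" entry above, here is a minimal Python sketch of the idea; the function name `min_gru_scan` and the weight matrices `Wz`, `Wh` are assumptions, not the paper's formulation. Once the gate and the candidate state depend only on the current input, the recurrence h_t = (1 - z_t) * h_{t-1} + z_t * h_tilde_t has coefficients that are known for every timestep up front, so it can be evaluated with a parallel (associative) scan rather than a sequential loop trained by backpropagation through time.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def min_gru_scan(x, Wz, Wh):
    """Sketch of a minGRU-style cell: the update gate z_t and the candidate
    state depend only on the input x_t, so
        h_t = (1 - z_t) * h_{t-1} + z_t * h_tilde_t
    is a linear recurrence whose coefficients are all known up front."""
    z = sigmoid(x @ Wz)           # gates computed from inputs only (no h_{t-1})
    h_tilde = np.tanh(x @ Wh)     # candidate states from inputs only
    a, b = 1.0 - z, z * h_tilde   # h_t = a_t * h_{t-1} + b_t

    # Sequential reference implementation of the same recurrence.
    h = np.zeros_like(b)
    prev = np.zeros(b.shape[-1])
    for t in range(b.shape[0]):
        prev = a[t] * prev + b[t]
        h[t] = prev
    return h

if __name__ == "__main__":
    T, d_in, d_h = 16, 8, 4
    rng = np.random.default_rng(0)
    x = rng.standard_normal((T, d_in))
    h = min_gru_scan(x, rng.standard_normal((d_in, d_h)),
                     rng.standard_normal((d_in, d_h)))
    print(h.shape)                # (16, 4)
```

The loop is only a sequential reference: because `a` and `b` are available for every timestep before the recurrence runs, the same computation can be expressed as an associative scan and parallelized over the sequence.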
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.