Do RNN and LSTM have Long Memory?
- URL: http://arxiv.org/abs/2006.03860v2
- Date: Wed, 10 Jun 2020 07:28:18 GMT
- Title: Do RNN and LSTM have Long Memory?
- Authors: Jingyu Zhao, Feiqing Huang, Jia Lv, Yanjie Duan, Zhen Qin, Guodong Li, Guangjian Tian
- Abstract summary: We prove that RNN and LSTM do not have long memory from a statistical perspective.
A new definition for long memory networks is introduced, and it requires the model weights to decay at a polynomial rate.
To verify our theory, we convert RNN and LSTM into long memory networks by making a minimal modification, and their superiority is illustrated in modeling long-term dependence of various datasets.
- Score: 15.072891084847647
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The LSTM network was proposed to overcome the difficulty in learning
long-term dependence, and has made significant advancements in applications.
With its success and drawbacks in mind, this paper raises the question - do RNN
and LSTM have long memory? We answer it partially by proving that RNN and LSTM
do not have long memory from a statistical perspective. A new definition for
long memory networks is further introduced, and it requires the model weights
to decay at a polynomial rate. To verify our theory, we convert RNN and LSTM
into long memory networks by making a minimal modification, and their
superiority is illustrated in modeling long-term dependence of various
datasets.
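To make the polynomial-decay requirement concrete, here is a minimal Python sketch. It is not the paper's exact construction: the filter form, the normalization, and the names `d` and `K` are assumptions. It contrasts the exponential (short-memory) decay of a stable linear RNN with a polynomially decaying filter of the kind the definition asks for.

```python
import numpy as np

def geometric_weights(rho, K):
    """Influence of a state k steps back in a stable linear RNN:
    roughly rho**k, i.e. exponential (short-memory) decay."""
    k = np.arange(1, K + 1)
    return rho ** k

def polynomial_weights(d, K):
    """Polynomially decaying filter weights w_k ~ k**(d - 1), 0 < d < 0.5:
    the slow decay that a long-memory criterion of this kind requires."""
    k = np.arange(1, K + 1)
    w = k ** (d - 1.0)
    return w / w.sum()              # normalize so the filter sums to one

def filtered_memory(history, weights):
    """Aggregate past states with a given decay profile.
    history: array of shape (K, dim), most recent step in the last row."""
    past = history[::-1]            # past[k] is the state k steps into the past
    return (weights[:, None] * past).sum(axis=0)

if __name__ == "__main__":
    K, dim = 200, 4
    history = np.random.randn(K, dim)
    short = filtered_memory(history, geometric_weights(rho=0.9, K=K))
    long_ = filtered_memory(history, polynomial_weights(d=0.3, K=K))
    # At lag 100 the geometric weight is about 3e-5, while the normalized
    # polynomial weight is still about 3e-3: distant steps keep real influence.
    print(short.shape, long_.shape)
```

The contrast is the point of the definition: geometrically decaying weights forget a step a few dozen lags back almost entirely, whereas weights decaying like k^(d-1) keep contributions from hundreds of steps in the past, which is the behavior the paper's minimally modified RNN and LSTM variants aim to capture.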
Related papers
- Stuffed Mamba: State Collapse and State Capacity of RNN-Based Long-Context Modeling [69.36377985746878]
We study the causes of RNNs' inability to process long contexts and suggest critical mitigations.
We first investigate *state collapse* (SC), a phenomenon that causes severe performance degradation on sequence lengths not encountered during training.
We train a series of Mamba-2 models on long documents to empirically estimate the recurrent state capacity in language modeling and passkey retrieval.
arXiv Detail & Related papers (2024-10-09T17:54:28Z)
- Were RNNs All We Needed? [53.393497486332]
We revisit traditional recurrent neural networks (RNNs) from over a decade ago.
We show that by removing the hidden-state dependencies from their input, forget, and update gates, LSTMs and GRUs no longer require backpropagation through time (BPTT) and can be trained efficiently in parallel (see the sketch after this list).
arXiv Detail & Related papers (2024-10-02T03:06:49Z)
- DON-LSTM: Multi-Resolution Learning with DeepONets and Long Short-Term Memory Neural Networks [1.8434042562191815]
Deep operator networks (DeepONets, DONs) offer a distinct advantage over traditional neural networks in their ability to be trained on multi-resolution data.
We propose a novel architecture, named DON-LSTM, which extends the DeepONet with a long short-term memory network (LSTM).
We show that the proposed multi-resolution DON-LSTM achieves significantly lower generalization error and requires fewer high-resolution samples compared to its vanilla counterparts.
arXiv Detail & Related papers (2023-10-03T23:43:16Z)
- Recurrent Neural Networks and Long Short-Term Memory Networks: Tutorial and Survey [9.092591746522483]
This tutorial paper is on Recurrent Neural Network (RNN), Long Short-Term Memory Network (LSTM), and their variants.
We start with a dynamical system and backpropagation through time for RNN.
We discuss the problems of gradient vanishing and explosion in long-term dependencies.
Then, we introduce LSTM gates and cells, the history and variants of LSTM, and Gated Recurrent Units (GRU).
arXiv Detail & Related papers (2023-04-22T18:22:10Z)
- Working Memory Connections for LSTM [51.742526187978726]
We show that Working Memory Connections consistently improve the performance of LSTMs on a variety of tasks.
Numerical results suggest that the cell state contains useful information that is worth including in the gate structure.
arXiv Detail & Related papers (2021-08-31T18:01:30Z)
- Object Tracking through Residual and Dense LSTMs [67.98948222599849]
Deep learning-based trackers built on LSTM (Long Short-Term Memory) recurrent neural networks have emerged as a powerful alternative.
DenseLSTMs outperform Residual and regular LSTM, and offer a higher resilience to nuisances.
Our case study supports the adoption of residual-based RNNs for enhancing the robustness of other trackers.
arXiv Detail & Related papers (2020-06-22T08:20:17Z)
- Tensor train decompositions on recurrent networks [60.334946204107446]
Matrix product state (MPS) tensor trains have more attractive features than matrix product operators (MPOs) in terms of storage reduction and computing time at inference.
We show that MPS tensor trains should be at the forefront of LSTM network compression through a theoretical analysis and practical experiments on NLP tasks.
arXiv Detail & Related papers (2020-06-09T18:25:39Z)
- Learning Various Length Dependence by Dual Recurrent Neural Networks [0.0]
We propose a new model named Dual Recurrent Neural Networks (DuRNN).
DuRNN consists of two parts: one learns the short-term dependence, and the other progressively learns the long-term dependence.
Our contributions are: 1) a new recurrent model developed based on the divide-and-conquer strategy to learn long and short-term dependence separately, and 2) a selection mechanism to enhance the separating and learning of different temporal scales of dependence.
arXiv Detail & Related papers (2020-05-28T09:30:01Z)
- Achieving Online Regression Performance of LSTMs with Simple RNNs [0.0]
We introduce a first-order training algorithm with a linear time complexity in the number of parameters.
We show that when SRNNs are trained with our algorithm, they provide regression performance very similar to that of LSTMs, with two to three times shorter training time.
arXiv Detail & Related papers (2020-05-16T11:41:13Z)
- Sentiment Analysis Using Simplified Long Short-term Memory Recurrent Neural Networks [1.5146765382501612]
We perform sentiment analysis on a GOP Debate Twitter dataset.
To speed up training and reduce the computational cost and time, six different parameter-reduced slim versions of the LSTM model are proposed.
arXiv Detail & Related papers (2020-05-08T12:50:10Z)
- Recognizing Long Grammatical Sequences Using Recurrent Networks Augmented With An External Differentiable Stack [73.48927855855219]
Recurrent neural networks (RNNs) are a widely used deep architecture for sequence modeling, generation, and prediction.
RNNs generalize poorly over very long sequences, which limits their applicability to many important temporal processing and time series forecasting problems.
One way to address these shortcomings is to couple an RNN with an external, differentiable memory structure, such as a stack.
In this paper, we improve the memory-augmented RNN with important architectural and state updating mechanisms.
arXiv Detail & Related papers (2020-04-04T14:19:15Z)
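Relating to the "Were RNNs All We Needed?" entry above, here is a minimal Python sketch of the idea; the function name `min_gru_scan` and the weight matrices `Wz`, `Wh` are assumptions, not the paper's formulation. Once the gate and the candidate state depend only on the current input, the recurrence h_t = (1 - z_t) * h_{t-1} + z_t * h_tilde_t has coefficients that are known for every timestep up front, so it can be evaluated with a parallel (associative) scan rather than a sequential loop trained by backpropagation through time.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def min_gru_scan(x, Wz, Wh):
    """Sketch of a minGRU-style cell: the update gate z_t and the candidate
    state depend only on the input x_t, so
        h_t = (1 - z_t) * h_{t-1} + z_t * h_tilde_t
    is a linear recurrence whose coefficients are all known up front."""
    z = sigmoid(x @ Wz)           # gates computed from inputs only (no h_{t-1})
    h_tilde = np.tanh(x @ Wh)     # candidate states from inputs only
    a, b = 1.0 - z, z * h_tilde   # h_t = a_t * h_{t-1} + b_t

    # Sequential reference implementation of the same recurrence.
    h = np.zeros_like(b)
    prev = np.zeros(b.shape[-1])
    for t in range(b.shape[0]):
        prev = a[t] * prev + b[t]
        h[t] = prev
    return h

if __name__ == "__main__":
    T, d_in, d_h = 16, 8, 4
    rng = np.random.default_rng(0)
    x = rng.standard_normal((T, d_in))
    h = min_gru_scan(x, rng.standard_normal((d_in, d_h)),
                     rng.standard_normal((d_in, d_h)))
    print(h.shape)                # (16, 4)
```

The loop is only a sequential reference: because `a` and `b` are available for every timestep before the recurrence runs, the same computation can be expressed as an associative scan and parallelized over the sequence.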
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.