Slower is Better: Revisiting the Forgetting Mechanism in LSTM for Slower Information Decay
- URL: http://arxiv.org/abs/2105.05944v1
- Date: Wed, 12 May 2021 20:21:16 GMT
- Title: Slower is Better: Revisiting the Forgetting Mechanism in LSTM for Slower Information Decay
- Authors: Hsiang-Yun Sherry Chien, Javier S. Turek, Nicole Beckage, Vy A. Vo,
Christopher J. Honey, Ted L. Willke
- Abstract summary: We propose a power law forget gate, which learns to forget information along a slower power law decay function.
We show that LSTM with the proposed forget gate can learn long-term dependencies, outperforming other recurrent networks in multiple domains.
- Score: 4.414729427965163
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sequential information contains short- to long-range dependencies; however,
learning long-timescale information has been a challenge for recurrent neural
networks. Despite improvements in long short-term memory networks (LSTMs), the
forgetting mechanism results in the exponential decay of information, limiting
their capacity to capture long-timescale information. Here, we propose a power
law forget gate, which instead learns to forget information along a slower
power law decay function. Specifically, the new gate learns to control the
power law decay factor, p, allowing the network to adjust the information decay
rate according to task demands. Our experiments show that an LSTM with power
law forget gates (pLSTM) can effectively capture long-range dependencies beyond
hundreds of elements on image classification, language modeling, and
categorization tasks, improving performance over the vanilla LSTM. We also
inspected the revised forget gate by varying the initialization of p, setting p
to a fixed value, and ablating cells in the pLSTM network. The results show
that the information decay can be controlled by the learnable decay factor p,
which allows pLSTM to achieve its superior performance. Altogether, we found
that LSTM with the proposed forget gate can learn long-term dependencies,
outperforming other recurrent networks in multiple domains; such gating
mechanism can be integrated into other architectures for improving the learning
of long timescale information in recurrent neural networks.
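The abstract describes the power law forget gate only at a high level. Below is a minimal sketch of one way such a gate could be realized; it is not the authors' reference implementation. Assumptions made purely for illustration: the decay exponent p is produced per unit by an extra gate head squashed into (0, 1), each unit keeps an "age" counter, and the counter is reset by a simple heuristic when the input gate writes strongly. The paper's exact parametrisation and reset rule may differ.
```python
# Minimal sketch of an LSTM cell with power-law forgetting (illustrative only).
import torch
import torch.nn as nn

class PowerLawLSTMCell(nn.Module):
    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.hidden_size = hidden_size
        # input gate, output gate, candidate memory, and decay-exponent head
        self.gates = nn.Linear(input_size + hidden_size, 4 * hidden_size)

    def forward(self, x, state):
        h, c, k = state                          # hidden, cell, per-unit age counter
        z = self.gates(torch.cat([x, h], dim=-1))
        i, o, g, p = z.chunk(4, dim=-1)
        i = torch.sigmoid(i)                     # input gate
        o = torch.sigmoid(o)                     # output gate
        g = torch.tanh(g)                        # candidate memory
        p = torch.sigmoid(p)                     # decay exponent p in (0, 1)

        # Power-law forgetting: for a fixed p, multiplying by ((k+1)/k)^(-p) at
        # every step telescopes to a retained fraction of (1+k)^(-p) after k
        # steps -- much slower than the exponential decay f^k produced by a
        # standard sigmoid forget gate f < 1.
        k = k + 1.0
        f = ((k + 1.0) / k).pow(-p)
        c = f * c + i * g

        # Heuristic refresh (a simplification of the paper's mechanism): units
        # that just received a strong write restart their decay clock.
        k = torch.where(i > 0.5, torch.zeros_like(k), k)

        h = o * torch.tanh(c)
        return h, c, k

# Usage: unroll the cell over a batch of sequences.
cell = PowerLawLSTMCell(input_size=16, hidden_size=32)
h = c = k = torch.zeros(4, 32)                   # batch of 4 sequences
for x_t in torch.randn(300, 4, 16):              # (time, batch, features)
    h, c, k = cell(x_t, (h, c, k))
```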
Related papers
- Long Short-term Memory with Two-Compartment Spiking Neuron [64.02161577259426]
We propose a novel biologically inspired Long Short-Term Memory Leaky Integrate-and-Fire spiking neuron model, dubbed LSTM-LIF.
Our experimental results, on a diverse range of temporal classification tasks, demonstrate superior temporal classification capability, rapid training convergence, strong network generalizability, and high energy efficiency of the proposed LSTM-LIF model.
This work, therefore, opens up a myriad of opportunities for resolving challenging temporal processing tasks on emerging neuromorphic computing machines.
arXiv Detail & Related papers (2023-07-14T08:51:03Z)
- An Improved Time Feedforward Connections Recurrent Neural Networks [3.0965505512285967]
Recurrent Neural Networks (RNNs) have been widely applied to deal with temporal problems, such as flood forecasting and financial data processing.
Traditional RNN models amplify the gradient problem because of their strict serial dependency over time.
An improved Time Feedforward Connections Recurrent Neural Networks (TFC-RNNs) model is proposed to address this gradient issue.
A novel cell structure named Single Gate Recurrent Unit (SGRU) is presented to reduce the number of parameters in the RNN cell.
arXiv Detail & Related papers (2022-11-03T09:32:39Z)
- Reducing Catastrophic Forgetting in Self Organizing Maps with Internally-Induced Generative Replay [67.50637511633212]
A lifelong learning agent is able to continually learn from potentially infinite streams of pattern sensory data.
One major, long-standing difficulty in building agents that adapt is that neural systems struggle to retain previously acquired knowledge when learning from new samples.
This problem is known as catastrophic forgetting (interference) and remains an unsolved problem in the domain of machine learning to this day.
arXiv Detail & Related papers (2021-12-09T07:11:14Z)
- Working Memory Connections for LSTM [51.742526187978726]
We show that Working Memory Connections consistently improve the performance of LSTMs on a variety of tasks.
Numerical results suggest that the cell state contains useful information that is worth including in the gate structure.
arXiv Detail & Related papers (2021-08-31T18:01:30Z)
- RotLSTM: Rotating Memories in Recurrent Neural Networks [0.0]
We introduce the concept of modifying the cell state (memory) of LSTMs using rotation matrices parametrised by a new set of trainable weights.
This addition yields significant performance gains on some of the tasks from the bAbI dataset (see the rotation sketch after this list).
arXiv Detail & Related papers (2021-05-01T23:48:58Z)
- Time Series Forecasting with Stacked Long Short-Term Memory Networks [0.0]
This paper explores the effectiveness of applying stacked LSTM networks to time series prediction, specifically traffic volume forecasting.
Being able to predict traffic volume more accurately enables better planning, greatly reducing operating costs and improving overall efficiency.
arXiv Detail & Related papers (2020-11-02T03:09:23Z)
- HiPPO: Recurrent Memory with Optimal Polynomial Projections [93.3537706398653]
We introduce a general framework (HiPPO) for the online compression of continuous signals and discrete time series by projection onto polynomial bases.
Given a measure that specifies the importance of each time step in the past, HiPPO produces an optimal solution to a natural online function approximation problem.
This formal framework yields a new memory update mechanism (HiPPO-LegS) that scales through time to remember all history, avoiding priors on the timescale.
arXiv Detail & Related papers (2020-08-17T23:39:33Z)
- Incremental Training of a Recurrent Neural Network Exploiting a Multi-Scale Dynamic Memory [79.42778415729475]
We propose a novel incrementally trained recurrent architecture that explicitly targets multi-scale learning.
We show how to extend the architecture of a simple RNN by separating its hidden state into different modules.
We discuss a training algorithm where new modules are iteratively added to the model to learn progressively longer dependencies.
arXiv Detail & Related papers (2020-06-29T08:35:49Z)
- Object Tracking through Residual and Dense LSTMs [67.98948222599849]
Deep trackers based on LSTM (Long Short-Term Memory) recurrent neural networks have emerged as a powerful alternative.
DenseLSTMs outperform Residual and regular LSTMs, and offer higher resilience to nuisances.
Our case study supports the adoption of residual-based RNNs for enhancing the robustness of other trackers.
arXiv Detail & Related papers (2020-06-22T08:20:17Z)
- Do RNN and LSTM have Long Memory? [15.072891084847647]
We prove that RNN and LSTM do not have long memory from a statistical perspective.
A new definition of long memory networks is introduced; it requires the model weights to decay at a polynomial rate.
To verify our theory, we convert RNN and LSTM into long memory networks with a minimal modification, and illustrate their superiority in modeling the long-term dependence of various datasets.
arXiv Detail & Related papers (2020-06-06T13:30:03Z)
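The RotLSTM entry above describes modifying the LSTM cell state with trainable rotation matrices. The following is a hypothetical sketch of that idea, assuming independent pairwise (Givens-style) rotations whose angles are produced from the current input by an extra trainable linear layer; the RotLSTM paper's exact parametrisation of the rotations may differ.
```python
# Illustrative sketch: rotate the LSTM cell state with input-dependent,
# trainable pairwise rotations (not the RotLSTM reference implementation).
import math
import torch
import torch.nn as nn

class RotatedCellLSTMCell(nn.Module):
    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        assert hidden_size % 2 == 0, "pairwise rotations need an even state size"
        self.cell = nn.LSTMCell(input_size, hidden_size)
        self.angle = nn.Linear(input_size, hidden_size // 2)  # one angle per pair

    def forward(self, x, state):
        h, c = self.cell(x, state)
        theta = math.pi * torch.tanh(self.angle(x))      # angles in (-pi, pi)
        cos, sin = torch.cos(theta), torch.sin(theta)
        c1, c2 = c[..., 0::2], c[..., 1::2]              # split state into pairs
        # Apply the 2-D rotation [cos -sin; sin cos] to each pair, then re-interleave.
        c_rot = torch.stack([cos * c1 - sin * c2,
                             sin * c1 + cos * c2], dim=-1).flatten(-2)
        return h, c_rot

# Usage: unroll over a short batch of sequences.
cell = RotatedCellLSTMCell(input_size=16, hidden_size=32)
h = c = torch.zeros(4, 32)
for x_t in torch.randn(10, 4, 16):                       # (time, batch, features)
    h, c = cell(x_t, (h, c))
```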