Working Memory Connections for LSTM
- URL: http://arxiv.org/abs/2109.00020v1
- Date: Tue, 31 Aug 2021 18:01:30 GMT
- Title: Working Memory Connections for LSTM
- Authors: Federico Landi, Lorenzo Baraldi, Marcella Cornia, Rita Cucchiara
- Abstract summary: We show that Working Memory Connections consistently improve the performance of LSTMs on a variety of tasks.
Numerical results suggest that the cell state contains useful information that is worth including in the gate structure.
- Score: 51.742526187978726
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recurrent Neural Networks with Long Short-Term Memory (LSTM) make use of gating mechanisms to mitigate exploding and vanishing gradients when learning long-term dependencies. For this reason, LSTMs and other gated RNNs are widely adopted, making them the de facto standard for many sequence modeling tasks. Although the memory cell inside the LSTM contains essential information, it is not allowed to influence the gating mechanism directly. In this work, we improve the gate potentials by including information coming from the internal cell state. The proposed modification, named Working Memory Connection, consists of adding a learnable nonlinear projection of the cell content into the network gates. This modification fits into the classical LSTM gates without any assumption about the underlying task, and is particularly effective when dealing with longer sequences. Previous research efforts in this direction, dating back to the early 2000s, could not bring a consistent improvement over the vanilla LSTM. As part of this paper, we identify a key issue with these earlier connections that heavily limits their effectiveness and prevents a successful integration of the knowledge coming from the internal cell state. We show through extensive experimental evaluation that Working Memory Connections consistently improve the performance of LSTMs on a variety of tasks. Numerical results suggest that the cell state contains useful information that is worth including in the gate structure.
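To make the abstract's description more concrete, below is a minimal NumPy sketch of one LSTM step with a Working-Memory-style connection: each gate receives a learnable nonlinear projection of the cell content in addition to the usual input and hidden-state terms. This follows only the wording of the abstract; the choice of tanh as the nonlinearity, the use of the previous cell state for all three gates, and all parameter names are assumptions rather than the paper's exact equations.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step_wmc(x, h_prev, c_prev, p):
    """One LSTM step in which every gate also receives a learnable
    nonlinear projection of the previous cell state (a sketch of the
    Working Memory Connection described in the abstract)."""
    # Assumed parameter shapes (D = input size, H = hidden size):
    #   Wi, Wf, Wo, Wg : (H, D)  input weights
    #   Ui, Uf, Uo, Ug : (H, H)  recurrent weights
    #   Vi, Vf, Vo     : (H, H)  cell-to-gate projections (the new part)
    #   bi, bf, bo, bg : (H,)    biases
    cell_i = np.tanh(p["Vi"] @ c_prev)  # nonlinear projection of the cell content;
    cell_f = np.tanh(p["Vf"] @ c_prev)  # the early-2000s peephole-style connections
    cell_o = np.tanh(p["Vo"] @ c_prev)  # would add a per-unit linear term instead

    i = sigmoid(p["Wi"] @ x + p["Ui"] @ h_prev + cell_i + p["bi"])  # input gate
    f = sigmoid(p["Wf"] @ x + p["Uf"] @ h_prev + cell_f + p["bf"])  # forget gate
    o = sigmoid(p["Wo"] @ x + p["Uo"] @ h_prev + cell_o + p["bo"])  # output gate
    g = np.tanh(p["Wg"] @ x + p["Ug"] @ h_prev + p["bg"])           # candidate update

    c = f * c_prev + i * g       # new cell state
    h = o * np.tanh(c)           # new hidden state
    return h, c

# Tiny smoke test with random parameters.
rng = np.random.default_rng(0)
D, H = 8, 16
p = {k: 0.1 * rng.standard_normal((H, D)) for k in ("Wi", "Wf", "Wo", "Wg")}
p.update({k: 0.1 * rng.standard_normal((H, H))
          for k in ("Ui", "Uf", "Uo", "Ug", "Vi", "Vf", "Vo")})
p.update({k: np.zeros(H) for k in ("bi", "bf", "bo", "bg")})
h = c = np.zeros(H)
for x in rng.standard_normal((5, D)):  # run a short input sequence
    h, c = lstm_step_wmc(x, h, c, p)
print(h.shape, c.shape)  # -> (16,) (16,)
```

The only structural difference from a vanilla LSTM step is the cell_* terms added to the gate pre-activations; everything else is the standard cell and hidden-state update.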
Related papers
- xLSTM: Extended Long Short-Term Memory [26.607656211983155]
In the 1990s, the constant error carousel and gating were introduced as the central ideas of the Long Short-Term Memory (LSTM).
We introduce exponential gating with appropriate normalization and stabilization techniques.
We modify the LSTM memory structure, obtaining: (i) sLSTM with a scalar memory, a scalar update, and new memory mixing, (ii) mLSTM that is fully parallelizable with a matrix memory and a covariance update rule.
arXiv Detail & Related papers (2024-05-07T17:50:21Z)
- ARNN: Attentive Recurrent Neural Network for Multi-channel EEG Signals to Identify Epileptic Seizures [2.8244056068360095]
We propose an Attentive Recurrent Neural Network (ARNN), which recurrently applies attention layers along a sequence.
The proposed model operates on multi-channel EEG signals rather than single channel signals and leverages parallel computation.
arXiv Detail & Related papers (2024-03-05T19:15:17Z) - NAC-TCN: Temporal Convolutional Networks with Causal Dilated
Neighborhood Attention for Emotion Understanding [60.74434735079253]
We propose a method known as Neighborhood Attention with Convolutions TCN (NAC-TCN).
We accomplish this by introducing a causal version of Dilated Neighborhood Attention while incorporating it with convolutions.
Our model achieves comparable, better, or state-of-the-art performance over TCNs, TCAN, LSTMs, and GRUs while requiring fewer parameters on standard emotion recognition datasets.
arXiv Detail & Related papers (2023-12-12T18:41:30Z)
- RigLSTM: Recurrent Independent Grid LSTM for Generalizable Sequence Learning [75.61681328968714]
We propose recurrent independent Grid LSTM (RigLSTM) to exploit the underlying modular structure of the target task.
Our model adopts cell selection, input feature selection, hidden state selection, and soft state updating to achieve a better generalization ability.
arXiv Detail & Related papers (2023-11-03T07:40:06Z)
- Memory-Guided Semantic Learning Network for Temporal Sentence Grounding [55.31041933103645]
We propose a memory-augmented network that learns and memorizes the rarely appeared content in TSG tasks.
MGSL-Net consists of three main parts: a cross-modal interaction module, a memory augmentation module, and a heterogeneous attention module.
arXiv Detail & Related papers (2022-01-03T02:32:06Z)
- Slower is Better: Revisiting the Forgetting Mechanism in LSTM for Slower Information Decay [4.414729427965163]
We propose a power-law forget gate, which learns to forget information along a slower power-law decay function (a numerical illustration of this decay behavior follows the list below).
We show that LSTM with the proposed forget gate can learn long-term dependencies, outperforming other recurrent networks in multiple domains.
arXiv Detail & Related papers (2021-05-12T20:21:16Z)
- "Forget" the Forget Gate: Estimating Anomalies in Videos using Self-contained Long Short-Term Memory Networks [20.211951213040937]
We present an approach to detecting anomalies in videos by learning a novel LSTM-based self-contained network on normal dense optical flow.
We introduce a bi-gated, lightweight LSTM cell by discarding the forget gate and introducing sigmoid activation.
Removing the forget gate results in a simplified and undemanding LSTM cell with improved effectiveness and computational efficiency.
arXiv Detail & Related papers (2021-04-03T20:43:49Z)
- Object Tracking through Residual and Dense LSTMs [67.98948222599849]
Deep learning-based trackers built on LSTM (Long Short-Term Memory) recurrent neural networks have emerged as a powerful alternative.
DenseLSTMs outperform residual and regular LSTMs, and offer higher resilience to nuisances.
Our case study supports the adoption of residual-based RNNs for enhancing the robustness of other trackers.
arXiv Detail & Related papers (2020-06-22T08:20:17Z)
- Long short-term memory networks and laglasso for bond yield forecasting: Peeping inside the black box [10.412912723760172]
We conduct the first study of bond yield forecasting using long short-term memory (LSTM) networks.
We calculate the LSTM signals through time, at selected locations in the memory cell, using sequence-to-sequence architectures.
arXiv Detail & Related papers (2020-05-05T14:23:00Z)
- Refined Gate: A Simple and Effective Gating Mechanism for Recurrent Units [68.30422112784355]
We propose a new gating mechanism within general gated recurrent neural networks.
The proposed gates directly short connect the extracted input features to the outputs of vanilla gates.
We verify the proposed gating mechanism on three popular types of gated RNNs including LSTM, GRU and MGU.
arXiv Detail & Related papers (2020-02-26T07:51:38Z)
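For the "Slower is Better" entry above, here is a quick numerical illustration of the claimed behavior: with a constant forget gate the retained cell content decays exponentially, whereas a time-indexed gate of the form f_t = (t / (t + 1))^p makes the retained fraction decay as a power law, roughly t^(-p). The specific gate form used here is an illustrative choice, not necessarily the paper's exact formulation.

```python
import numpy as np

# Illustration only: how much of the cell content written at step 0 survives
# after t steps,
# (a) with a constant forget gate          -> exponential decay, f_const**t
# (b) with a power-law forget gate f_t = (t/(t+1))**p -> (1/(t+1))**p, i.e. ~t**(-p)
T, p, f_const = 100, 0.5, 0.9
steps = np.arange(1, T + 1)

exp_decay = f_const ** steps                           # product of a constant gate
pow_decay = np.cumprod((steps / (steps + 1.0)) ** p)   # telescopes to (1/(t+1))**p

for t in (1, 10, 100):
    print(f"t={t:3d}  exponential={exp_decay[t - 1]:.2e}  power-law={pow_decay[t - 1]:.2e}")
```

After a few dozen steps the exponential curve is orders of magnitude smaller than the power-law one, which is the sense in which a power-law forget gate "forgets more slowly".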
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.