Working Memory Connections for LSTM
- URL: http://arxiv.org/abs/2109.00020v1
- Date: Tue, 31 Aug 2021 18:01:30 GMT
- Title: Working Memory Connections for LSTM
- Authors: Federico Landi, Lorenzo Baraldi, Marcella Cornia, Rita Cucchiara
- Abstract summary: We show that Working Memory Connections consistently improve the performance of LSTMs on a variety of tasks.
Numerical results suggest that the cell state contains useful information that is worth including in the gate structure.
- Score: 51.742526187978726
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recurrent Neural Networks with Long Short-Term Memory (LSTM) make use of gating mechanisms to mitigate exploding and vanishing gradients when learning long-term dependencies. For this reason, LSTMs and other gated RNNs are widely adopted, making them the de facto standard for many sequence modeling tasks. Although the memory cell inside the LSTM contains essential information, it is not allowed to influence the gating mechanism directly. In this work, we improve the gate potentials by including information coming from the internal cell state. The proposed modification, named Working Memory Connection, consists of adding a learnable nonlinear projection of the cell content into the network gates. This modification fits into the classical LSTM gates without any assumption about the underlying task, and is particularly effective when dealing with longer sequences. Previous research efforts in this direction, dating back to the early 2000s, could not bring a consistent improvement over the vanilla LSTM. As part of this paper, we identify a key issue with these earlier connections that heavily limits their effectiveness and prevents a successful integration of the knowledge coming from the internal cell state. We show through extensive experimental evaluation that Working Memory Connections consistently improve the performance of LSTMs on a variety of tasks. Numerical results suggest that the cell state contains useful information that is worth including in the gate structure.
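To make the abstract's description more concrete, below is a minimal NumPy sketch of one LSTM step with a Working-Memory-style connection: each gate receives a learnable nonlinear projection of the cell content in addition to the usual input and hidden-state terms. This follows only the wording of the abstract; the choice of tanh as the nonlinearity, the use of the previous cell state for all three gates, and all parameter names are assumptions rather than the paper's exact equations.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step_wmc(x, h_prev, c_prev, p):
    """One LSTM step in which every gate also receives a learnable
    nonlinear projection of the previous cell state (a sketch of the
    Working Memory Connection described in the abstract)."""
    # Assumed parameter shapes (D = input size, H = hidden size):
    #   Wi, Wf, Wo, Wg : (H, D)  input weights
    #   Ui, Uf, Uo, Ug : (H, H)  recurrent weights
    #   Vi, Vf, Vo     : (H, H)  cell-to-gate projections (the new part)
    #   bi, bf, bo, bg : (H,)    biases
    cell_i = np.tanh(p["Vi"] @ c_prev)  # nonlinear projection of the cell content;
    cell_f = np.tanh(p["Vf"] @ c_prev)  # the early-2000s peephole-style connections
    cell_o = np.tanh(p["Vo"] @ c_prev)  # would add a per-unit linear term instead

    i = sigmoid(p["Wi"] @ x + p["Ui"] @ h_prev + cell_i + p["bi"])  # input gate
    f = sigmoid(p["Wf"] @ x + p["Uf"] @ h_prev + cell_f + p["bf"])  # forget gate
    o = sigmoid(p["Wo"] @ x + p["Uo"] @ h_prev + cell_o + p["bo"])  # output gate
    g = np.tanh(p["Wg"] @ x + p["Ug"] @ h_prev + p["bg"])           # candidate update

    c = f * c_prev + i * g       # new cell state
    h = o * np.tanh(c)           # new hidden state
    return h, c

# Tiny smoke test with random parameters.
rng = np.random.default_rng(0)
D, H = 8, 16
p = {k: 0.1 * rng.standard_normal((H, D)) for k in ("Wi", "Wf", "Wo", "Wg")}
p.update({k: 0.1 * rng.standard_normal((H, H))
          for k in ("Ui", "Uf", "Uo", "Ug", "Vi", "Vf", "Vo")})
p.update({k: np.zeros(H) for k in ("bi", "bf", "bo", "bg")})
h = c = np.zeros(H)
for x in rng.standard_normal((5, D)):  # run a short input sequence
    h, c = lstm_step_wmc(x, h, c, p)
print(h.shape, c.shape)  # -> (16,) (16,)
```

The only structural difference from a vanilla LSTM step is the cell_* terms added to the gate pre-activations; everything else is the standard cell and hidden-state update.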
Related papers
- xLSTM: Extended Long Short-Term Memory [26.607656211983155]
In the 1990s, the constant error carousel and gating were introduced as the central ideas of the Long Short-Term Memory (LSTM).
We introduce exponential gating with appropriate normalization and stabilization techniques.
We modify the LSTM memory structure, obtaining: (i) sLSTM with a scalar memory, a scalar update, and new memory mixing, (ii) mLSTM that is fully parallelizable with a matrix memory and a covariance update rule.
arXiv Detail & Related papers (2024-05-07T17:50:21Z)
- ARNN: Attentive Recurrent Neural Network for Multi-channel EEG Signals to Identify Epileptic Seizures [2.8244056068360095]
We propose an Attentive Recurrent Neural Network (ARNN), which recurrently applies attention layers along a sequence.
The proposed model operates on multi-channel EEG signals rather than single channel signals and leverages parallel computation.
arXiv Detail & Related papers (2024-03-05T19:15:17Z) - NAC-TCN: Temporal Convolutional Networks with Causal Dilated
Neighborhood Attention for Emotion Understanding [60.74434735079253]
We propose a method known as Neighborhood Attention with Convolutions TCN (NAC-TCN).
We accomplish this by introducing a causal version of Dilated Neighborhood Attention while incorporating it with convolutions.
Our model achieves comparable, better, or state-of-the-art performance over TCNs, TCAN, LSTMs, and GRUs while requiring fewer parameters on standard emotion recognition datasets.
arXiv Detail & Related papers (2023-12-12T18:41:30Z)
- RigLSTM: Recurrent Independent Grid LSTM for Generalizable Sequence Learning [75.61681328968714]
We propose recurrent independent Grid LSTM (RigLSTM) to exploit the underlying modular structure of the target task.
Our model adopts cell selection, input feature selection, hidden state selection, and soft state updating to achieve a better generalization ability.
arXiv Detail & Related papers (2023-11-03T07:40:06Z)
- Memory-Guided Semantic Learning Network for Temporal Sentence Grounding [55.31041933103645]
We propose a memory-augmented network that learns and memorizes the rarely appeared content in TSG tasks.
MGSL-Net consists of three main parts: a cross-modal interaction module, a memory augmentation module, and a heterogeneous attention module.
arXiv Detail & Related papers (2022-01-03T02:32:06Z)
- Slower is Better: Revisiting the Forgetting Mechanism in LSTM for Slower Information Decay [4.414729427965163]
We propose a power-law forget gate, which learns to forget information along a slower power-law decay function (a numerical illustration of this decay behavior follows the list below).
We show that LSTM with the proposed forget gate can learn long-term dependencies, outperforming other recurrent networks in multiple domains.
arXiv Detail & Related papers (2021-05-12T20:21:16Z)
- "Forget" the Forget Gate: Estimating Anomalies in Videos using Self-contained Long Short-Term Memory Networks [20.211951213040937]
We present an approach to detecting anomalies in videos by learning a novel LSTM-based self-contained network on normal dense optical flow.
We introduce a bi-gated, lightweight LSTM cell by discarding the forget gate and introducing sigmoid activation.
Removing the forget gate results in a simplified and undemanding LSTM cell with improved effectiveness and computational efficiency.
arXiv Detail & Related papers (2021-04-03T20:43:49Z)
- Object Tracking through Residual and Dense LSTMs [67.98948222599849]
Deep learning-based trackers built on LSTM (Long Short-Term Memory) recurrent neural networks have emerged as a powerful alternative.
DenseLSTMs outperform residual and regular LSTMs, and offer higher resilience to nuisances.
Our case study supports the adoption of residual-based RNNs for enhancing the robustness of other trackers.
arXiv Detail & Related papers (2020-06-22T08:20:17Z)
- Long short-term memory networks and laglasso for bond yield forecasting: Peeping inside the black box [10.412912723760172]
We conduct the first study of bond yield forecasting using long short-term memory (LSTM) networks.
We calculate the LSTM signals through time, at selected locations in the memory cell, using sequence-to-sequence architectures.
arXiv Detail & Related papers (2020-05-05T14:23:00Z)
- Refined Gate: A Simple and Effective Gating Mechanism for Recurrent Units [68.30422112784355]
We propose a new gating mechanism within general gated recurrent neural networks.
The proposed gates directly short connect the extracted input features to the outputs of vanilla gates.
We verify the proposed gating mechanism on three popular types of gated RNNs including LSTM, GRU and MGU.
arXiv Detail & Related papers (2020-02-26T07:51:38Z)
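For the "Slower is Better" entry above, here is a quick numerical illustration of the claimed behavior: with a constant forget gate the retained cell content decays exponentially, whereas a time-indexed gate of the form f_t = (t / (t + 1))^p makes the retained fraction decay as a power law, roughly t^(-p). The specific gate form used here is an illustrative choice, not necessarily the paper's exact formulation.

```python
import numpy as np

# Illustration only: how much of the cell content written at step 0 survives
# after t steps,
# (a) with a constant forget gate          -> exponential decay, f_const**t
# (b) with a power-law forget gate f_t = (t/(t+1))**p -> (1/(t+1))**p, i.e. ~t**(-p)
T, p, f_const = 100, 0.5, 0.9
steps = np.arange(1, T + 1)

exp_decay = f_const ** steps                           # product of a constant gate
pow_decay = np.cumprod((steps / (steps + 1.0)) ** p)   # telescopes to (1/(t+1))**p

for t in (1, 10, 100):
    print(f"t={t:3d}  exponential={exp_decay[t - 1]:.2e}  power-law={pow_decay[t - 1]:.2e}")
```

After a few dozen steps the exponential curve is orders of magnitude smaller than the power-law one, which is the sense in which a power-law forget gate "forgets more slowly".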
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.