RigLSTM: Recurrent Independent Grid LSTM for Generalizable Sequence
Learning
- URL: http://arxiv.org/abs/2311.02123v1
- Date: Fri, 3 Nov 2023 07:40:06 GMT
- Title: RigLSTM: Recurrent Independent Grid LSTM for Generalizable Sequence
Learning
- Authors: Ziyu Wang, Wenhao Jiang, Zixuan Zhang, Wei Tang, Junchi Yan
- Abstract summary: We propose recurrent independent Grid LSTM (RigLSTM) to exploit the underlying modular structure of the target task.
Our model adopts cell selection, input feature selection, hidden state selection, and soft state updating to achieve a better generalization ability.
- Score: 75.61681328968714
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sequential processes in the real world often carry a combination of simple
subsystems that interact with each other in certain forms. Learning such a
modular structure can often improve the robustness against environmental
changes. In this paper, we propose recurrent independent Grid LSTM (RigLSTM),
composed of a group of independent LSTM cells that cooperate with each other,
for exploiting the underlying modular structure of the target task. Our model
adopts cell selection, input feature selection, hidden state selection, and
soft state updating, building on the recent Grid LSTM, to achieve better
generalization on tasks where some factors differ between training
and evaluation. Specifically, at each time step, only a fraction of the cells are
activated, and the activated cells select relevant inputs and cells to
communicate with. At the end of one time step, the hidden states of the
activated cells are updated by considering the relevance between the inputs and
the hidden states from the last and current time steps. Extensive experiments
on diverse sequential modeling tasks are conducted to demonstrate the superior
generalization ability when the testing environment changes.
Source code is available at \url{https://github.com/ziyuwwang/rig-lstm}.
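The per-time-step selection-and-update mechanism described above can be made more concrete with a short sketch. The following is a minimal PyTorch-style illustration, not the authors' implementation (see the linked repository for that): it assumes top-k scoring for cell selection and a sigmoid relevance gate for soft state updating, and it omits the input feature selection and cell-to-cell hidden state selection steps for brevity. All module and variable names (RigLSTMSketch, cell_score, update_gate, k_active) are illustrative assumptions.

```python
import torch
import torch.nn as nn


class RigLSTMSketch(nn.Module):
    """Illustrative sketch of one RigLSTM-style step: a group of independent
    LSTM cells, of which only the top-k most relevant are activated per step."""

    def __init__(self, input_size, hidden_size, num_cells=6, k_active=4):
        super().__init__()
        self.num_cells, self.k_active = num_cells, k_active
        # One independent LSTMCell per module.
        self.cells = nn.ModuleList(
            nn.LSTMCell(input_size, hidden_size) for _ in range(num_cells)
        )
        # Scoring networks for cell selection and soft state updating
        # (hypothetical parameterization; the paper's exact design may differ).
        self.cell_score = nn.Linear(hidden_size + input_size, 1)
        self.update_gate = nn.Linear(hidden_size + input_size, 1)

    def forward(self, x, hs, cs):
        # x: (batch, input_size); hs, cs: lists of (batch, hidden_size) per cell.
        new_hs, new_cs = list(hs), list(cs)
        # Cell selection: score each cell's relevance to the current input
        # and activate only the top-k cells.
        scores = torch.cat(
            [self.cell_score(torch.cat([h, x], dim=-1)) for h in hs], dim=-1
        )  # (batch, num_cells)
        topk = scores.topk(self.k_active, dim=-1).indices
        active = torch.zeros_like(scores).scatter_(1, topk, 1.0)  # 0/1 mask
        for i, cell in enumerate(self.cells):
            h_new, c_new = cell(x, (hs[i], cs[i]))
            # Soft state updating: blend old and new hidden states with a
            # relevance gate; inactive cells keep their previous states.
            g = torch.sigmoid(self.update_gate(torch.cat([hs[i], x], dim=-1)))
            mask = active[:, i : i + 1]
            new_hs[i] = mask * (g * h_new + (1 - g) * hs[i]) + (1 - mask) * hs[i]
            new_cs[i] = mask * c_new + (1 - mask) * cs[i]
        return new_hs, new_cs
```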
Related papers
- packetLSTM: Dynamic LSTM Framework for Streaming Data with Varying Feature Space [44.62845936150961]
We study the online learning problem characterized by the varying input feature space of streaming data.
We propose a novel dynamic LSTM-based method, called packetLSTM, to model dimension-varying streams.
packetLSTM achieves state-of-the-art results on five datasets, and its underlying principle is extended to other RNN types, like GRU and vanilla RNN.
arXiv Detail & Related papers (2024-10-22T20:01:39Z)
- Mamba-FSCIL: Dynamic Adaptation with Selective State Space Model for Few-Shot Class-Incremental Learning [113.89327264634984]
Few-shot class-incremental learning (FSCIL) confronts the challenge of integrating new classes into a model with minimal training samples.
Traditional methods widely adopt static adaptation relying on a fixed parameter space to learn from data that arrive sequentially.
We propose a dual selective SSM projector that dynamically adjusts the projection parameters based on the intermediate features for dynamic adaptation.
arXiv Detail & Related papers (2024-07-08T17:09:39Z)
- B-LSTM-MIONet: Bayesian LSTM-based Neural Operators for Learning the Response of Complex Dynamical Systems to Length-Variant Multiple Input Functions [6.75867828529733]
Multiple-input deep neural operators (MIONet) extended DeepONet to allow multiple input functions in different Banach spaces.
MIONet offers flexibility in training dataset grid spacing, without constraints on output location.
This work redesigns MIONet, integrating Long Short Term Memory (LSTM) to learn neural operators from time-dependent data.
arXiv Detail & Related papers (2023-11-28T04:58:17Z)
- Sparse Modular Activation for Efficient Sequence Modeling [94.11125833685583]
Recent models combining Linear State Space Models with self-attention mechanisms have demonstrated impressive results across a range of sequence modeling tasks.
Current approaches apply attention modules statically and uniformly to all elements in the input sequences, leading to sub-optimal quality-efficiency trade-offs.
We introduce Sparse Modular Activation (SMA), a general mechanism enabling neural networks to sparsely activate sub-modules for sequence elements in a differentiable manner.
arXiv Detail & Related papers (2023-06-19T23:10:02Z)
- Image Classification using Sequence of Pixels [3.04585143845864]
This study compares sequential image classification methods based on recurrent neural networks.
We describe methods based on Long Short-Term Memory (LSTM) and bidirectional Long Short-Term Memory (BiLSTM) architectures, among others.
arXiv Detail & Related papers (2022-09-23T09:42:44Z)
- MM-TTA: Multi-Modal Test-Time Adaptation for 3D Semantic Segmentation [104.48766162008815]
We propose and explore a new multi-modal extension of test-time adaptation for 3D semantic segmentation.
In a framework designed to take full advantage of multi-modality, each modality provides regularized self-supervisory signals to the other modalities.
Our regularized pseudo labels produce stable self-learning signals in numerous multi-modal test-time adaptation scenarios.
arXiv Detail & Related papers (2022-04-27T02:28:12Z)
- Working Memory Connections for LSTM [51.742526187978726]
We show that Working Memory Connections consistently improve the performance of LSTMs on a variety of tasks.
Numerical results suggest that the cell state contains useful information that is worth including in the gate structure.
arXiv Detail & Related papers (2021-08-31T18:01:30Z)
- Comparisons among different stochastic selection of activation layers for convolutional neural networks for healthcare [77.99636165307996]
We classify biomedical images using ensembles of neural networks.
We select activations from the following: ReLU, leaky ReLU, Parametric ReLU, ELU, Adaptive Piecewise Linear Unit, S-Shaped ReLU, Swish, Mish, Mexican Linear Unit, Parametric Deformable Linear Unit, Soft Root Sign.
arXiv Detail & Related papers (2020-11-24T01:53:39Z)
- Alternating ConvLSTM: Learning Force Propagation with Alternate State Updates [29.011464047344614]
We introduce the alternating convolutional Long Short-Term Memory (Alt-ConvLSTM) that models the force propagation mechanisms in a deformable object with near-uniform material properties.
We demonstrate how this novel scheme imitates the alternate updates of the first and second-order terms in the forward method of numerical PDE solvers.
We validate our Alt-ConvLSTM on human soft tissue simulation with thousands of particles and consistent body pose changes.
arXiv Detail & Related papers (2020-06-14T06:43:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.