RigLSTM: Recurrent Independent Grid LSTM for Generalizable Sequence
Learning
- URL: http://arxiv.org/abs/2311.02123v1
- Date: Fri, 3 Nov 2023 07:40:06 GMT
- Title: RigLSTM: Recurrent Independent Grid LSTM for Generalizable Sequence
Learning
- Authors: Ziyu Wang, Wenhao Jiang, Zixuan Zhang, Wei Tang, Junchi Yan
- Abstract summary: We propose recurrent independent Grid LSTM (RigLSTM) to exploit the underlying modular structure of the target task.
Our model adopts cell selection, input feature selection, hidden state selection, and soft state updating to achieve a better generalization ability.
- Score: 75.61681328968714
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sequential processes in the real world often carry a combination of simple
subsystems that interact with each other in certain forms. Learning such a
modular structure can often improve the robustness against environmental
changes. In this paper, we propose recurrent independent Grid LSTM (RigLSTM),
composed of a group of independent LSTM cells that cooperate with each other,
for exploiting the underlying modular structure of the target task. Our model
adopts cell selection, input feature selection, hidden state selection, and
soft state updating, building on the recent Grid LSTM, to achieve better
generalization on tasks where some factors differ between training
and evaluation. Specifically, at each time step, only a fraction of the cells are
activated, and the activated cells select relevant inputs and cells to
communicate with. At the end of one time step, the hidden states of the
activated cells are updated by considering the relevance between the inputs and
the hidden states from the last and current time steps. Extensive experiments
on diverse sequential modeling tasks are conducted to demonstrate the superior
generalization ability when the testing environment changes.
Source code is available at \url{https://github.com/ziyuwwang/rig-lstm}.
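The per-time-step selection-and-update mechanism described above can be made more concrete with a short sketch. The following is a minimal PyTorch-style illustration, not the authors' implementation (see the linked repository for that): it assumes top-k scoring for cell selection and a sigmoid relevance gate for soft state updating, and it omits the input feature selection and cell-to-cell hidden state selection steps for brevity. All module and variable names (RigLSTMSketch, cell_score, update_gate, k_active) are illustrative assumptions.

```python
import torch
import torch.nn as nn


class RigLSTMSketch(nn.Module):
    """Illustrative sketch of one RigLSTM-style step: a group of independent
    LSTM cells, of which only the top-k most relevant are activated per step."""

    def __init__(self, input_size, hidden_size, num_cells=6, k_active=4):
        super().__init__()
        self.num_cells, self.k_active = num_cells, k_active
        # One independent LSTMCell per module.
        self.cells = nn.ModuleList(
            nn.LSTMCell(input_size, hidden_size) for _ in range(num_cells)
        )
        # Scoring networks for cell selection and soft state updating
        # (hypothetical parameterization; the paper's exact design may differ).
        self.cell_score = nn.Linear(hidden_size + input_size, 1)
        self.update_gate = nn.Linear(hidden_size + input_size, 1)

    def forward(self, x, hs, cs):
        # x: (batch, input_size); hs, cs: lists of (batch, hidden_size) per cell.
        new_hs, new_cs = list(hs), list(cs)
        # Cell selection: score each cell's relevance to the current input
        # and activate only the top-k cells.
        scores = torch.cat(
            [self.cell_score(torch.cat([h, x], dim=-1)) for h in hs], dim=-1
        )  # (batch, num_cells)
        topk = scores.topk(self.k_active, dim=-1).indices
        active = torch.zeros_like(scores).scatter_(1, topk, 1.0)  # 0/1 mask
        for i, cell in enumerate(self.cells):
            h_new, c_new = cell(x, (hs[i], cs[i]))
            # Soft state updating: blend old and new hidden states with a
            # relevance gate; inactive cells keep their previous states.
            g = torch.sigmoid(self.update_gate(torch.cat([hs[i], x], dim=-1)))
            mask = active[:, i : i + 1]
            new_hs[i] = mask * (g * h_new + (1 - g) * hs[i]) + (1 - mask) * hs[i]
            new_cs[i] = mask * c_new + (1 - mask) * cs[i]
        return new_hs, new_cs
```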
Related papers
- packetLSTM: Dynamic LSTM Framework for Streaming Data with Varying Feature Space [44.62845936150961]
We study the online learning problem characterized by the varying input feature space of streaming data.
We propose a novel dynamic LSTM-based method, called packetLSTM, to model dimension-varying streams.
packetLSTM achieves state-of-the-art results on five datasets, and its underlying principle is extended to other RNN types, like GRU and vanilla RNN.
arXiv Detail & Related papers (2024-10-22T20:01:39Z)
- Mamba-FSCIL: Dynamic Adaptation with Selective State Space Model for Few-Shot Class-Incremental Learning [113.89327264634984]
Few-shot class-incremental learning (FSCIL) confronts the challenge of integrating new classes into a model with minimal training samples.
Traditional methods widely adopt static adaptation relying on a fixed parameter space to learn from data that arrive sequentially.
We propose a dual selective SSM projector that dynamically adjusts the projection parameters based on the intermediate features for dynamic adaptation.
arXiv Detail & Related papers (2024-07-08T17:09:39Z)
- B-LSTM-MIONet: Bayesian LSTM-based Neural Operators for Learning the Response of Complex Dynamical Systems to Length-Variant Multiple Input Functions [6.75867828529733]
Multiple-input deep neural operators (MIONet) extended DeepONet to allow multiple input functions in different Banach spaces.
MIONet offers flexibility in training dataset grid spacing, without constraints on output location.
This work redesigns MIONet, integrating Long Short Term Memory (LSTM) to learn neural operators from time-dependent data.
arXiv Detail & Related papers (2023-11-28T04:58:17Z)
- Sparse Modular Activation for Efficient Sequence Modeling [94.11125833685583]
Recent models combining Linear State Space Models with self-attention mechanisms have demonstrated impressive results across a range of sequence modeling tasks.
Current approaches apply attention modules statically and uniformly to all elements in the input sequences, leading to sub-optimal quality-efficiency trade-offs.
We introduce Sparse Modular Activation (SMA), a general mechanism enabling neural networks to sparsely activate sub-modules for sequence elements in a differentiable manner.
arXiv Detail & Related papers (2023-06-19T23:10:02Z)
- Image Classification using Sequence of Pixels [3.04585143845864]
This study compares sequential image classification methods based on recurrent neural networks.
We describe methods based on Long Short-Term Memory (LSTM) and bidirectional Long Short-Term Memory (BiLSTM) architectures, among others.
arXiv Detail & Related papers (2022-09-23T09:42:44Z)
- MM-TTA: Multi-Modal Test-Time Adaptation for 3D Semantic Segmentation [104.48766162008815]
We propose and explore a new multi-modal extension of test-time adaptation for 3D semantic segmentation.
In a framework designed to take full advantage of multi-modality, each modality provides regularized self-supervisory signals to the other modalities.
Our regularized pseudo labels produce stable self-learning signals in numerous multi-modal test-time adaptation scenarios.
arXiv Detail & Related papers (2022-04-27T02:28:12Z)
- Working Memory Connections for LSTM [51.742526187978726]
We show that Working Memory Connections consistently improve the performance of LSTMs on a variety of tasks.
Numerical results suggest that the cell state contains useful information that is worth including in the gate structure.
arXiv Detail & Related papers (2021-08-31T18:01:30Z)
- Comparisons among different stochastic selection of activation layers for convolutional neural networks for healthcare [77.99636165307996]
We classify biomedical images using ensembles of neural networks.
We select activations from the following: ReLU, leaky ReLU, Parametric ReLU, ELU, Adaptive Piecewise Linear Unit, S-Shaped ReLU, Swish, Mish, Mexican Linear Unit, Parametric Deformable Linear Unit, Soft Root Sign.
arXiv Detail & Related papers (2020-11-24T01:53:39Z)
- Alternating ConvLSTM: Learning Force Propagation with Alternate State Updates [29.011464047344614]
We introduce the alternating convolutional Long Short-Term Memory (Alt-ConvLSTM) that models the force propagation mechanisms in a deformable object with near-uniform material properties.
We demonstrate how this novel scheme imitates the alternate updates of the first and second-order terms in the forward method of numerical PDE solvers.
We validate our Alt-ConvLSTM on human soft tissue simulation with thousands of particles and consistent body pose changes.
arXiv Detail & Related papers (2020-06-14T06:43:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.