Hierarchically Gated Recurrent Neural Network for Sequence Modeling
- URL: http://arxiv.org/abs/2311.04823v1
- Date: Wed, 8 Nov 2023 16:50:05 GMT
- Title: Hierarchically Gated Recurrent Neural Network for Sequence Modeling
- Authors: Zhen Qin, Songlin Yang, Yiran Zhong
- Abstract summary: We propose a gated linear RNN model dubbed Hierarchically Gated Recurrent Neural Network (HGRN).
Experiments on language modeling, image classification, and long-range arena benchmarks showcase the efficiency and effectiveness of our proposed model.
- Score: 36.14544998133578
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformers have surpassed RNNs in popularity due to their superior
abilities in parallel training and long-term dependency modeling. Recently,
there has been a renewed interest in using linear RNNs for efficient sequence
modeling. These linear RNNs often employ gating mechanisms in the output of the
linear recurrence layer while ignoring the significance of using forget gates
within the recurrence. In this paper, we propose a gated linear RNN model
dubbed Hierarchically Gated Recurrent Neural Network (HGRN), which includes
forget gates that are lower bounded by a learnable value. The lower bound
increases monotonically when moving up layers. This allows the upper layers to
model long-term dependencies and the lower layers to model more local,
short-term dependencies. Experiments on language modeling, image
classification, and long-range arena benchmarks showcase the efficiency and
effectiveness of our proposed model. The source code is available at
https://github.com/OpenNLPLab/HGRN.
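To make the gating idea concrete, below is a minimal, hedged sketch of a forget gate whose value is lower bounded by a learnable, layer-dependent constant. The class names, the input-dependent gate, and the cumulative-softmax trick used to make the bounds grow monotonically with depth are illustrative assumptions, not the authors' implementation (see the repository above for the real code).
```python
import torch
import torch.nn as nn

class LowerBoundedForgetGate(nn.Module):
    """Forget gate confined to (gamma_k, 1), where the lower bound gamma_k is
    learnable and grows monotonically with the layer index k (assumption:
    monotonicity enforced via a cumulative softmax over shared logits)."""

    def __init__(self, hidden_size: int, layer_idx: int, num_layers: int):
        super().__init__()
        self.proj = nn.Linear(hidden_size, hidden_size)
        self.layer_idx = layer_idx
        # num_layers + 1 logits so even the top layer's bound stays below 1.
        self.bound_logits = nn.Parameter(torch.zeros(num_layers + 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Cumulative sums of a softmax are non-decreasing and lie in (0, 1),
        # giving lower bounds that increase as we move up the stack.
        bounds = torch.cumsum(torch.softmax(self.bound_logits, dim=0), dim=0)
        gamma = bounds[self.layer_idx]
        f = torch.sigmoid(self.proj(x))       # raw gate in (0, 1)
        return gamma + (1.0 - gamma) * f      # gate now lives in (gamma, 1)


def gated_linear_recurrence(x: torch.Tensor, gate: LowerBoundedForgetGate) -> torch.Tensor:
    """Element-wise recurrence h_t = f_t * h_{t-1} + (1 - f_t) * x_t, written
    as a plain loop; a practical implementation would use a parallel scan."""
    h = torch.zeros_like(x[:, 0])
    outs = []
    for t in range(x.size(1)):
        f = gate(x[:, t])
        h = f * h + (1.0 - f) * x[:, t]
        outs.append(h)
    return torch.stack(outs, dim=1)
```
The point of the sketch is only that the effective forget gate of layer k can never drop below its lower bound, so deeper layers retain information longer while lower layers stay free to forget quickly and model local context.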
Related papers
- Were RNNs All We Needed? [53.393497486332]
We revisit traditional recurrent neural networks (RNNs) from over a decade ago.
We show that by removing the hidden-state dependencies from their input, forget, and update gates, LSTMs and GRUs no longer require backpropagation through time (BPTT) and can be trained efficiently in parallel.
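As a rough, hypothetical illustration of that simplification (not the paper's actual minLSTM/minGRU code), consider a GRU-style cell whose gate and candidate depend only on the current input; the recurrence is then linear in the hidden state and can be unrolled in closed form over the whole sequence:
```python
import torch
import torch.nn as nn

class InputOnlyGatedRNN(nn.Module):
    """Sketch of a GRU-style cell with input-only gates: the recurrence
    h_t = (1 - z_t) * h_{t-1} + z_t * htilde_t is linear in h, so the whole
    sequence can be computed without stepping through time one element at a
    time."""

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.gate = nn.Linear(input_size, hidden_size)       # z_t from x_t only
        self.candidate = nn.Linear(input_size, hidden_size)  # htilde_t from x_t only

    def forward(self, x: torch.Tensor) -> torch.Tensor:      # x: (batch, time, input)
        z = torch.sigmoid(self.gate(x))
        h_tilde = self.candidate(x)
        # Closed-form unrolling with h_0 = 0:
        #   h_t = decay_t * sum_{s <= t} z_s * htilde_s / decay_s,
        # where decay_t = prod_{r <= t} (1 - z_r). The clamp is a crude guard
        # against underflow; a careful implementation uses a log-space scan.
        decay = torch.cumprod(1.0 - z, dim=1)
        return decay * torch.cumsum(z * h_tilde / decay.clamp_min(1e-6), dim=1)
```
Because nothing feeding the gates depends nonlinearly on a previous hidden state, gradients flow through cumulative products and sums rather than through an unrolled step-by-step loop, which is what removes the need for BPTT.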
arXiv Detail & Related papers (2024-10-02T03:06:49Z)
- RotRNN: Modelling Long Sequences with Rotations [7.037239398244858]
Linear recurrent neural networks, such as State Space Models (SSMs) and Linear Recurrent Units (LRUs), have recently shown state-of-the-art performance on long sequence modelling benchmarks.
We propose RotRNN -- a linear recurrent model which utilises the convenient properties of rotation matrices.
We show that RotRNN provides a simple and efficient model with a robust normalisation procedure, and a practical implementation that remains faithful to its theoretical derivation.
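The summary above does not spell out the construction, but one standard way to obtain a learnable rotation is the matrix exponential of a skew-symmetric matrix; the hedged sketch below uses that device purely for illustration and should not be read as RotRNN's actual parameterisation or normalisation procedure.
```python
import torch
import torch.nn as nn

class RotationRecurrence(nn.Module):
    """Linear recurrence h_t = decay * R h_{t-1} + B x_t with R a rotation.
    exp(S) is orthogonal whenever S is skew-symmetric, so R preserves the
    norm of the state, and the scalar decay keeps the dynamics stable."""

    def __init__(self, input_size: int, state_size: int):
        super().__init__()
        self.skew_raw = nn.Parameter(0.01 * torch.randn(state_size, state_size))
        self.decay_logit = nn.Parameter(torch.zeros(1))
        self.input_proj = nn.Linear(input_size, state_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:      # x: (batch, time, input)
        rotation = torch.linalg.matrix_exp(self.skew_raw - self.skew_raw.T)
        decay = torch.sigmoid(self.decay_logit)              # in (0, 1)
        h = x.new_zeros(x.size(0), self.skew_raw.size(0))
        states = []
        for t in range(x.size(1)):
            h = decay * (h @ rotation.T) + self.input_proj(x[:, t])
            states.append(h)
        return torch.stack(states, dim=1)
```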
arXiv Detail & Related papers (2024-07-09T21:37:36Z)
- MGNNI: Multiscale Graph Neural Networks with Implicit Layers [53.75421430520501]
Implicit graph neural networks (GNNs) have been proposed to capture long-range dependencies in underlying graphs.
We identify and justify two weaknesses of implicit GNNs: constrained expressiveness due to their limited effective range for capturing long-range dependencies, and an inability to capture multiscale information on graphs at multiple resolutions.
We propose a multiscale graph neural network with implicit layers (MGNNI) which is able to model multiscale structures on graphs and has an expanded effective range for capturing long-range dependencies.
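For readers unfamiliar with the term, an implicit GNN layer defines its node representations as the fixed point of an equilibrium equation rather than as the output of a fixed number of message-passing steps. The generic sketch below (an assumption for illustration, not MGNNI's formulation) shows why a single such layer has, in principle, an unbounded effective range.
```python
import torch
import torch.nn as nn

class ImplicitGraphLayer(nn.Module):
    """Generic implicit layer: node states are the fixed point of
    Z = tanh(alpha * A Z W + X), found by simple iteration, so information
    can propagate arbitrarily far across the graph within one layer."""

    def __init__(self, dim: int, alpha: float = 0.5, iters: int = 50):
        super().__init__()
        self.weight = nn.Parameter(0.1 * torch.eye(dim))
        self.alpha = alpha      # damping that helps keep the iteration contractive
        self.iters = iters

    def forward(self, adj: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        # adj: (N, N) normalized adjacency, x: (N, dim) input node features.
        z = torch.zeros_like(x)
        for _ in range(self.iters):
            z = torch.tanh(self.alpha * adj @ z @ self.weight + x)
        return z
```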
arXiv Detail & Related papers (2022-10-15T18:18:55Z)
- EIGNN: Efficient Infinite-Depth Graph Neural Networks [51.97361378423152]
Graph neural networks (GNNs) are widely used for modelling graph-structured data in numerous applications.
Motivated by the limited ability of finite-depth GNNs to capture long-range dependencies, we propose a GNN model with infinite depth, which we call Efficient Infinite-Depth Graph Neural Networks (EIGNN).
We show that EIGNN has a better ability to capture long-range dependencies than recent baselines, and consistently achieves state-of-the-art performance.
arXiv Detail & Related papers (2022-02-22T08:16:58Z)
- Auditory Attention Decoding from EEG using Convolutional Recurrent Neural Network [20.37214453938965]
The auditory attention decoding (AAD) approach was proposed to determine the identity of the attended talker in a multi-talker scenario.
Recent models based on deep neural networks (DNNs) have been proposed to solve this problem.
In this paper, we propose novel convolutional recurrent neural network (CRNN) based regression and classification models.
arXiv Detail & Related papers (2021-03-03T05:09:40Z)
- Introducing the Hidden Neural Markov Chain framework [7.85426761612795]
This paper proposes the original Hidden Neural Markov Chain (HNMC) framework, a new family of sequential neural models.
We propose three different models: the classic HNMC, the HNMC2, and the HNMC-CN.
The results show this new sequential neural framework's potential; it can open the way to new models and might eventually compete with the prevalent BiLSTM and BiGRU.
arXiv Detail & Related papers (2021-02-17T20:13:45Z)
- Overcoming Catastrophic Forgetting in Graph Neural Networks [50.900153089330175]
Catastrophic forgetting refers to the tendency of a neural network to "forget" previously learned knowledge upon learning new tasks.
We propose a novel scheme dedicated to overcoming this problem and hence strengthening continual learning in graph neural networks (GNNs).
At the heart of our approach is a generic module, termed topology-aware weight preserving (TWP).
arXiv Detail & Related papers (2020-12-10T22:30:25Z)
- A Fully Tensorized Recurrent Neural Network [48.50376453324581]
We introduce a "fully tensorized" RNN architecture which jointly encodes the separate weight matrices within each recurrent cell.
This approach reduces model size by several orders of magnitude, while still maintaining similar or better performance compared to standard RNNs.
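As a hedged illustration of what jointly encoding a cell's weight matrices can look like (a tensor-train style factorisation is the usual choice in this line of work; the plain CP factorisation below is only for brevity and is not the paper's construction):
```python
import torch
import torch.nn as nn

class JointlyFactoredCellWeights(nn.Module):
    """The per-gate weight matrices of a recurrent cell stacked into one
    3-way tensor and stored as a rank-R CP factorisation, so the parameter
    count is R * (G + I + H) instead of G * I * H."""

    def __init__(self, num_gates: int, in_dim: int, hidden: int, rank: int = 8):
        super().__init__()
        self.gate_factor = nn.Parameter(0.1 * torch.randn(num_gates, rank))
        self.in_factor = nn.Parameter(0.1 * torch.randn(in_dim, rank))
        self.out_factor = nn.Parameter(0.1 * torch.randn(hidden, rank))

    def forward(self) -> torch.Tensor:
        # Reconstruct the (num_gates, in_dim, hidden) weight tensor on the fly.
        return torch.einsum('gr,ir,hr->gih',
                            self.gate_factor, self.in_factor, self.out_factor)
```
A recurrent cell would reconstruct (or contract against) this tensor at each step, trading a small amount of extra compute for the large reduction in stored parameters.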
arXiv Detail & Related papers (2020-10-08T18:24:12Z)
- Recurrent Graph Tensor Networks: A Low-Complexity Framework for Modelling High-Dimensional Multi-Way Sequence [24.594587557319837]
We develop a graph filter framework for approximating the modelling of hidden states in recurrent neural networks (RNNs).
The proposed framework is validated through several multi-way sequence modelling tasks and benchmarked against traditional RNNs.
We show that the proposed RGTN is capable of not only outperforming standard RNNs, but also mitigating the curse of dimensionality associated with traditional RNNs.
arXiv Detail & Related papers (2020-09-18T10:13:36Z)
- Binarized Graph Neural Network [65.20589262811677]
We develop a binarized graph neural network to learn the binary representations of the nodes with binary network parameters.
Our proposed method can be seamlessly integrated into the existing GNN-based embedding approaches.
Experiments indicate that the proposed binarized graph neural network, namely BGN, is orders of magnitude more efficient in terms of both time and space.
arXiv Detail & Related papers (2020-04-19T09:43:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.