Recognizing Long Grammatical Sequences Using Recurrent Networks
Augmented With An External Differentiable Stack
- URL: http://arxiv.org/abs/2004.07623v2
- Date: Wed, 22 Apr 2020 15:36:26 GMT
- Title: Recognizing Long Grammatical Sequences Using Recurrent Networks
Augmented With An External Differentiable Stack
- Authors: Ankur Mali, Alexander Ororbia, Daniel Kifer, Clyde Lee Giles
- Abstract summary: Recurrent neural networks (RNNs) are a widely used deep architecture for sequence modeling, generation, and prediction.
RNNs generalize poorly over very long sequences, which limits their applicability to many important temporal processing and time series forecasting problems.
One way to address these shortcomings is to couple an RNN with an external, differentiable memory structure, such as a stack.
In this paper, we improve the memory-augmented RNN with important architectural and state updating mechanisms.
- Score: 73.48927855855219
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recurrent neural networks (RNNs) are a widely used deep architecture for
sequence modeling, generation, and prediction. Despite success in applications
such as machine translation and voice recognition, these stateful models have
several critical shortcomings. Specifically, RNNs generalize poorly over very
long sequences, which limits their applicability to many important temporal
processing and time series forecasting problems. For example, RNNs struggle in
recognizing complex context free languages (CFLs), never reaching 100% accuracy
on training. One way to address these shortcomings is to couple an RNN with an
external, differentiable memory structure, such as a stack. However,
differentiable memories in prior work have neither been extensively studied on
CFLs nor tested on sequences longer than those seen in training. The few
efforts that have studied them have shown that continuous differentiable memory
structures yield poor generalization for complex CFLs, making the RNN less
interpretable. In this paper, we improve the memory-augmented RNN with
important architectural and state updating mechanisms that ensure that the
model learns to properly balance the use of its latent states with external
memory. Our improved RNN models exhibit better generalization performance and
are able to classify long strings generated by complex hierarchical context
free grammars (CFGs). We evaluate our models on CFGs, including the Dyck
languages, as well as on the Penn Treebank language modelling task, and achieve
stable, robust performance across these benchmarks. Furthermore, we show that
only our memory-augmented networks are capable of retaining memory over longer
durations, on strings of up to length 160.
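To make the idea of an external, differentiable stack concrete, here is a minimal sketch of a continuous stack update in the style of earlier stack-augmented RNNs. It is an illustration only, not the architecture or state-update mechanism proposed in this paper. The function name `soft_stack_update` and its parameters are hypothetical; in a real model, a controller RNN would produce the weights `push`, `pop`, and `noop` (for example through a softmax).

```python
def soft_stack_update(stack, value, push, pop, noop):
    """Blend the three possible next stacks, weighted by action probabilities.

    stack: list of cells, each a list of floats (index 0 is the top).
    value: the vector that a push would place on top.
    push, pop, noop: non-negative weights assumed to sum to 1.
    """
    depth = len(stack)
    zero = [0.0] * len(value)
    new_stack = []
    for i in range(depth):
        # After a push, cell i holds what used to sit one position higher.
        after_push = value if i == 0 else stack[i - 1]
        # After a pop, cell i holds what used to sit one position lower;
        # the bottom cell is refilled with zeros.
        after_pop = stack[i + 1] if i + 1 < depth else zero
        new_stack.append([
            push * a + pop * b + noop * c
            for a, b, c in zip(after_push, after_pop, stack[i])
        ])
    return new_stack
```

With one-hot weights this reduces to an ordinary discrete stack. With fractional weights every cell becomes a smooth function of the action probabilities, which is what allows gradients to flow through the memory.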
Related papers
- Training Neural Networks as Recognizers of Formal Languages [87.06906286950438]
Formal language theory pertains specifically to recognizers.
It is common to instead use proxy tasks that are similar in only an informal sense.
We correct this mismatch by training and evaluating neural networks directly as binary classifiers of strings.
arXiv Detail & Related papers (2024-11-11T16:33:25Z)
- Stuffed Mamba: State Collapse and State Capacity of RNN-Based Long-Context Modeling [69.36377985746878]
We study the cause of the inability to process long context for RNNs and suggest critical mitigations.
We first investigate *state collapse* (SC), a phenomenon that causes severe performance degradation on sequence lengths not encountered during training.
We train a series of Mamba-2 models on long documents to empirically estimate the recurrent state capacity in language modeling and passkey retrieval.
arXiv Detail & Related papers (2024-10-09T17:54:28Z)
- On the Computational Complexity and Formal Hierarchy of Second Order Recurrent Neural Networks [59.85314067235965]
We extend the theoretical foundation for the 2nd-order recurrent network (2nd-order RNN).
We prove there exists a class of 2nd-order RNNs that is Turing-complete with bounded time.
We also demonstrate that 2nd-order RNNs, without memory, outperform modern-day models such as vanilla RNNs and gated recurrent units in recognizing regular grammars.
arXiv Detail & Related papers (2023-09-26T06:06:47Z)
- SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks [21.616328837090396]
Spiking Neural Networks (SNNs) leverage sparse and event-driven activations to reduce the computational overhead associated with model inference.
We implement a generative language model with binary, event-driven spiking activation units.
SpikeGPT is the largest backpropagation-trained SNN model to date, rendering it suitable for both the generation and comprehension of natural language.
arXiv Detail & Related papers (2023-02-27T16:43:04Z)
- MS-RNN: A Flexible Multi-Scale Framework for Spatiotemporal Predictive Learning [7.311071760653835]
We propose a general framework named Multi-Scale RNN (MS-RNN) to boost recent RNN models for predictive learning.
We verify the MS-RNN framework by thorough theoretical analyses and exhaustive experiments.
Results show that RNN models incorporating our framework have much lower memory cost and better performance than before.
arXiv Detail & Related papers (2022-06-07T04:57:58Z)
- Learning Hierarchical Structures with Differentiable Nondeterministic Stacks [25.064819128982556]
We present a stack RNN model based on the recently proposed Nondeterministic Stack RNN (NS-RNN).
We show that the NS-RNN achieves lower cross-entropy than all previous stack RNNs on five context-free language modeling tasks.
We also propose a restricted version of the NS-RNN that makes it practical to use for language modeling on natural language.
arXiv Detail & Related papers (2021-09-05T03:25:23Z)
- Deep Time Delay Neural Network for Speech Enhancement with Full Data Learning [60.20150317299749]
This paper proposes a deep time delay neural network (TDNN) for speech enhancement with full data learning.
To make full use of the training data, we propose a full data learning method for speech enhancement.
arXiv Detail & Related papers (2020-11-11T06:32:37Z)
- On the Practical Ability of Recurrent Neural Networks to Recognize Hierarchical Languages [9.12267978757844]
We study the performance of recurrent models on Dyck-n languages.
We find that while recurrent models generalize nearly perfectly if the lengths of the training and test strings are from the same range, they perform poorly if the test strings are longer.
arXiv Detail & Related papers (2020-11-08T12:15:31Z)
- Learning Context-Free Languages with Nondeterministic Stack RNNs [20.996069249108224]
We present a differentiable stack data structure that simultaneously and tractably encodes an exponential number of stack configurations.
We call the combination of this data structure with a recurrent neural network (RNN) controller a Nondeterministic Stack RNN.
arXiv Detail & Related papers (2020-10-09T16:48:41Z)
- Incremental Training of a Recurrent Neural Network Exploiting a Multi-Scale Dynamic Memory [79.42778415729475]
We propose a novel incrementally trained recurrent architecture targeting explicitly multi-scale learning.
We show how to extend the architecture of a simple RNN by separating its hidden state into different modules.
We discuss a training algorithm where new modules are iteratively added to the model to learn progressively longer dependencies.
arXiv Detail & Related papers (2020-06-29T08:35:49Z)