Learning Hierarchical Structures with Differentiable Nondeterministic Stacks
- URL: http://arxiv.org/abs/2109.01982v1
- Date: Sun, 5 Sep 2021 03:25:23 GMT
- Title: Learning Hierarchical Structures with Differentiable Nondeterministic Stacks
- Authors: Brian DuSell and David Chiang
- Abstract summary: We present a stack RNN model based on the recently proposed Nondeterministic Stack RNN (NS-RNN)
We show that the NS-RNN achieves lower cross-entropy than all previous stack RNNs on five context-free language modeling tasks.
We also propose a restricted version of the NS-RNN that makes it practical to use for language modeling on natural language.
- Score: 25.064819128982556
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning hierarchical structures in sequential data -- from simple
algorithmic patterns to natural language -- in a reliable, generalizable way
remains a challenging problem for neural language models. Past work has shown
that recurrent neural networks (RNNs) struggle to generalize on held-out
algorithmic or syntactic patterns without supervision or some inductive bias.
To remedy this, many papers have explored augmenting RNNs with various
differentiable stacks, by analogy with finite automata and pushdown automata.
In this paper, we present a stack RNN model based on the recently proposed
Nondeterministic Stack RNN (NS-RNN) that achieves lower cross-entropy than all
previous stack RNNs on five context-free language modeling tasks (within 0.05
nats of the information-theoretic lower bound), including a task in which the
NS-RNN previously failed to outperform a deterministic stack RNN baseline. Our
model assigns arbitrary positive weights instead of probabilities to stack
actions, and we provide an analysis of why this improves training. We also
propose a restricted version of the NS-RNN that makes it practical to use for
language modeling on natural language and present results on the Penn Treebank
corpus.
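The abstract's key change is replacing per-step stack-action probabilities with arbitrary positive weights that are normalized only when the model's output distribution is read off. The toy sketch below (not the authors' implementation; the scores and action names are made up for illustration) shows that unnormalized positive weights still yield a valid distribution over action sequences once normalized at read time, while individual weights remain free to be larger than 1, which the paper argues improves training:

```python
import numpy as np

# Hypothetical scores for three stack actions (push, pop, no-op) at two
# timesteps; in the real model these come from the RNN controller.
scores = np.array([[2.0, -1.0, 0.5],
                   [0.3, 1.2, -0.7]])

# NS-RNN-style: normalize each timestep's actions to probabilities, so a
# path's probability is a product of per-step softmaxes (each row sums to 1).
per_step = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)

# Weighted variant: keep unnormalized positive weights exp(score) per action
# (these can exceed 1) and normalize once, over whole action sequences
# ("paths"), only at read time.
path_weights = np.einsum('i,j->ij', np.exp(scores[0]), np.exp(scores[1]))
path_probs = path_weights / path_weights.sum()

assert np.isclose(path_probs.sum(), 1.0)  # still a valid distribution
```

In the paper, this read-time normalization happens inside the dynamic-programming simulation of the stack rather than over an explicit path enumeration as here.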
Related papers
- Training Neural Networks as Recognizers of Formal Languages [87.06906286950438]
Formal language theory pertains specifically to recognizers.
It is common to instead use proxy tasks that are similar in only an informal sense.
We correct this mismatch by training and evaluating neural networks directly as binary classifiers of strings.
arXiv Detail & Related papers (2024-11-11T16:33:25Z)
- On Efficiently Representing Regular Languages as RNNs
We show that RNNs can efficiently represent bounded hierarchical structures that are prevalent in human language.
This suggests that RNNs' success might be linked to their ability to model hierarchy.
We show that RNNs can efficiently represent a larger class of LMs than previously claimed.
arXiv Detail & Related papers (2024-02-24T13:42:06Z)
- On the Computational Complexity and Formal Hierarchy of Second Order Recurrent Neural Networks [59.85314067235965]
We extend the theoretical foundation of second-order recurrent networks (2nd-order RNNs).
We prove that there exists a class of 2nd-order RNNs that is Turing-complete with bounded time.
We also demonstrate that 2nd-order RNNs without memory outperform modern models such as vanilla RNNs and gated recurrent units at recognizing regular grammars.
arXiv Detail & Related papers (2023-09-26T06:06:47Z)
- Advancing Regular Language Reasoning in Linear Recurrent Neural Networks
We study whether linear recurrent neural networks (LRNNs) can learn the hidden rules in training sequences.
We propose a new LRNN equipped with a block-diagonal and input-dependent transition matrix.
Experiments suggest that the proposed model is the only LRNN capable of performing length extrapolation on regular language tasks.
arXiv Detail & Related papers (2023-09-14T03:36:01Z)
- Nondeterministic Stacks in Neural Networks [0.456877715768796]
We develop a differentiable data structure that efficiently simulates a nondeterministic pushdown automaton.
We show that this raises their formal recognition power to arbitrary context-free languages.
We also show that an RNN augmented with a nondeterministic stack is capable of surprisingly powerful behavior.
arXiv Detail & Related papers (2023-04-25T16:00:40Z)
- Decomposing a Recurrent Neural Network into Modules for Enabling Reusability and Replacement [11.591247347259317]
We propose the first approach to decompose an RNN into modules.
We study different types of RNNs, i.e., Vanilla, LSTM, and GRU.
We show how such RNN modules can be reused and replaced in various scenarios.
arXiv Detail & Related papers (2022-12-09T03:29:38Z)
- Rethinking Nearest Neighbors for Visual Classification [56.00783095670361]
k-NN is a lazy learning method that aggregates the distances between the test image and its top-k neighbors in a training set.
We adopt k-NN with pre-trained visual representations produced by either supervised or self-supervised methods in two steps.
Via extensive experiments on a wide range of classification tasks, our study reveals the generality and flexibility of k-NN integration.
arXiv Detail & Related papers (2021-12-15T20:15:01Z)
- Can RNNs learn Recursive Nested Subject-Verb Agreements? [4.094098809740732]
Language processing requires the ability to extract nested tree structures.
Recent advances in Recurrent Neural Networks (RNNs) achieve near-human performance in some language tasks.
arXiv Detail & Related papers (2021-01-06T20:47:02Z)
- Learning Context-Free Languages with Nondeterministic Stack RNNs [20.996069249108224]
We present a differentiable stack data structure that simultaneously and tractably encodes an exponential number of stack configurations.
We call the combination of this data structure with a recurrent neural network (RNN) controller a Nondeterministic Stack RNN.
arXiv Detail & Related papers (2020-10-09T16:48:41Z)
- Recognizing Long Grammatical Sequences Using Recurrent Networks Augmented With An External Differentiable Stack [73.48927855855219]
Recurrent neural networks (RNNs) are a widely used deep architecture for sequence modeling, generation, and prediction.
RNNs generalize poorly over very long sequences, which limits their applicability to many important temporal processing and time series forecasting problems.
One way to address these shortcomings is to couple an RNN with an external, differentiable memory structure, such as a stack.
In this paper, we improve the memory-augmented RNN with important architectural and state updating mechanisms.
arXiv Detail & Related papers (2020-04-04T14:19:15Z)
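Several of the entries above couple an RNN with an external, differentiable stack. A minimal sketch of the basic idea, in the style of a "superposition" stack where each update is a convex combination of the discrete push, pop, and no-op moves (dimensions and the helper name are illustrative, not any cited architecture):

```python
import numpy as np

def stack_step(stack, push_val, a):
    """One differentiable stack update.

    stack: (depth, dim) array, row 0 is the top.
    push_val: (dim,) vector to push.
    a: (3,) convex weights for (push, pop, no-op), from the controller.
    Returns the new stack: a weighted mix of the three discrete moves.
    """
    depth, dim = stack.shape
    pushed = np.vstack([push_val, stack[:-1]])           # shift down, new top
    popped = np.vstack([stack[1:], np.zeros((1, dim))])  # shift up, drop top
    return a[0] * pushed + a[1] * popped + a[2] * stack

stack = np.zeros((4, 2))
stack = stack_step(stack, np.array([1.0, 2.0]), np.array([1.0, 0.0, 0.0]))
stack = stack_step(stack, np.array([3.0, 4.0]), np.array([1.0, 0.0, 0.0]))
# top of stack is now [3.0, 4.0]
stack = stack_step(stack, np.zeros(2), np.array([0.0, 1.0, 0.0]))  # pure pop
# top of stack is again [1.0, 2.0]
```

Because the update is a smooth function of the action weights, gradients flow through the stack; with soft (non-one-hot) weights the stack holds a superposition of configurations, which is what distinguishes these models from a discrete pushdown automaton.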
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.