The Surprising Computational Power of Nondeterministic Stack RNNs
- URL: http://arxiv.org/abs/2210.01343v1
- Date: Tue, 4 Oct 2022 03:18:19 GMT
- Title: The Surprising Computational Power of Nondeterministic Stack RNNs
- Authors: Brian DuSell, David Chiang
- Abstract summary: Traditional recurrent neural networks (RNNs) have a fixed, finite number of memory cells.
In this paper, we show that nondeterminism and the neural controller interact to produce two more unexpected abilities.
First, the nondeterministic stack RNN can recognize not only CFLs, but also many non-context-free languages.
Second, it can recognize languages with much larger alphabet sizes than one might expect given the size of its stack alphabet.
- Score: 20.996069249108224
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Traditional recurrent neural networks (RNNs) have a fixed, finite number of
memory cells. In theory (assuming bounded range and precision), this limits
their formal language recognition power to regular languages, and in practice,
RNNs have been shown to be unable to learn many context-free languages (CFLs).
In order to expand the class of languages RNNs recognize, prior work has
augmented RNNs with a nondeterministic stack data structure, putting them on
par with pushdown automata and increasing their language recognition power to
CFLs. Nondeterminism is needed for recognizing all CFLs (not just deterministic
CFLs), but in this paper, we show that nondeterminism and the neural controller
interact to produce two more unexpected abilities. First, the nondeterministic
stack RNN can recognize not only CFLs, but also many non-context-free
languages. Second, it can recognize languages with much larger alphabet sizes
than one might expect given the size of its stack alphabet. Finally, to
increase the information capacity in the stack and allow it to solve more
complicated tasks with large alphabet sizes, we propose a new version of the
nondeterministic stack that simulates stacks of vectors rather than discrete
symbols. We demonstrate perplexity improvements with this new model on the Penn
Treebank language modeling benchmark.
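To make the idea of a nondeterministic stack concrete, the toy sketch below keeps an explicit weighted superposition over stack configurations: at each step the controller's action weights are applied to every configuration at once, and the model reads back a marginal distribution over top-of-stack symbols. This is only an illustration in plain Python; the actual NS-RNN encodes exponentially many configurations tractably with a differentiable dynamic program (see the "Learning Context-Free Languages with Nondeterministic Stack RNNs" entry below) rather than enumerating them, and the stack alphabet, action schedule, and function names here are assumptions of the sketch.

```python
from collections import defaultdict

STACK_ALPHABET = ("0", "1")  # assumed two-symbol stack alphabet for this toy example

def step(configs, action_probs):
    """Advance the superposition by one time step.

    configs: dict mapping a stack (tuple of symbols, top at the end) to its weight.
    action_probs: weights for 'push_0', 'push_1', 'pop', 'noop' (controller output).
    """
    new_configs = defaultdict(float)
    for stack, weight in configs.items():
        for sym in STACK_ALPHABET:                      # nondeterministic pushes
            new_configs[stack + (sym,)] += weight * action_probs["push_" + sym]
        if stack:                                       # pop only on a non-empty stack
            new_configs[stack[:-1]] += weight * action_probs["pop"]
        new_configs[stack] += weight * action_probs["noop"]
    return dict(new_configs)

def top_symbol_distribution(configs):
    """Normalized marginal over the top stack symbol: the reading the
    controller would receive back at each step."""
    dist = defaultdict(float)
    for stack, weight in configs.items():
        dist[stack[-1] if stack else "<bottom>"] += weight
    total = sum(dist.values())
    return {sym: weight / total for sym, weight in dist.items()}

if __name__ == "__main__":
    configs = {(): 1.0}  # start in the single empty-stack configuration
    # Hard-coded controller outputs for three steps (an assumption of the sketch;
    # in the model these would come from the RNN controller at each time step).
    schedule = [
        {"push_0": 0.7, "push_1": 0.2, "pop": 0.0, "noop": 0.1},
        {"push_0": 0.1, "push_1": 0.6, "pop": 0.2, "noop": 0.1},
        {"push_0": 0.0, "push_1": 0.0, "pop": 0.9, "noop": 0.1},
    ]
    for probs in schedule:
        configs = step(configs, probs)
    print(top_symbol_distribution(configs))
```

The vector-stack extension proposed in the abstract would replace the discrete symbols tracked here with vector-valued entries to increase the stack's information capacity; that variant is not shown in this sketch.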
Related papers
- What Languages are Easy to Language-Model? A Perspective from Learning Probabilistic Regular Languages [78.1866280652834]
Large language models (LMs) are distributions over strings.
We investigate the learnability of regular LMs (RLMs) by RNN and Transformer LMs.
We find that the complexity of the RLM, as measured by its rank, is a strong and significant predictor of learnability for both RNNs and Transformers.
arXiv Detail & Related papers (2024-06-06T17:34:24Z) - In-Context Language Learning: Architectures and Algorithms [73.93205821154605]
We study in-context learning (ICL) through the lens of a new family of model problems we term in-context language learning (ICLL).
We evaluate a diverse set of neural sequence models on regular ICLL tasks.
arXiv Detail & Related papers (2024-01-23T18:59:21Z) - On the Computational Complexity and Formal Hierarchy of Second Order Recurrent Neural Networks [59.85314067235965]
We extend the theoretical foundation for the second-order recurrent network ($2^{nd}$ RNN).
We prove there exists a class of $2^{nd}$ RNNs that is Turing-complete with bounded time.
We also demonstrate that $2^{nd}$-order RNNs, without memory, outperform modern-day models such as vanilla RNNs and gated recurrent units in recognizing regular grammars.
arXiv Detail & Related papers (2023-09-26T06:06:47Z) - Advancing Regular Language Reasoning in Linear Recurrent Neural Networks [56.11830645258106]
We study whether linear recurrent neural networks (LRNNs) can learn the hidden rules in training sequences.
We propose a new LRNN equipped with a block-diagonal and input-dependent transition matrix (a generic sketch of this kind of transition appears after this list).
Experiments suggest that the proposed model is the only LRNN capable of performing length extrapolation on regular language tasks.
arXiv Detail & Related papers (2023-09-14T03:36:01Z) - Nondeterministic Stacks in Neural Networks [0.456877715768796]
We develop a differentiable data structure that efficiently simulates a nondeterministic pushdown automaton.
We show that this raises their formal recognition power to arbitrary context-free languages.
We also show that an RNN augmented with a nondeterministic stack is capable of surprisingly powerful behavior.
arXiv Detail & Related papers (2023-04-25T16:00:40Z) - Learning Hierarchical Structures with Differentiable Nondeterministic Stacks [25.064819128982556]
We present a stack RNN model based on the recently proposed Nondeterministic Stack RNN (NS-RNN).
We show that the NS-RNN achieves lower cross-entropy than all previous stack RNNs on five context-free language modeling tasks.
We also propose a restricted version of the NS-RNN that makes it practical to use for language modeling on natural language.
arXiv Detail & Related papers (2021-09-05T03:25:23Z) - RNNs can generate bounded hierarchical languages with optimal memory [113.73133308478612]
We show that RNNs can efficiently generate bounded hierarchical languages that reflect the scaffolding of natural language syntax.
We introduce Dyck-($k$,$m$), the language of well-nested brackets (of $k$ types) and $m$-bounded nesting depth.
We prove that an RNN with $O(m \log k)$ hidden units suffices, an exponential reduction in memory, by an explicit construction (a minimal reference recognizer for Dyck-($k$,$m$) is sketched after this list).
arXiv Detail & Related papers (2020-10-15T04:42:29Z) - Learning Context-Free Languages with Nondeterministic Stack RNNs [20.996069249108224]
We present a differentiable stack data structure that simultaneously and tractably encodes an exponential number of stack configurations.
We call the combination of this data structure with a recurrent neural network (RNN) controller a Nondeterministic Stack RNN.
arXiv Detail & Related papers (2020-10-09T16:48:41Z) - Recognizing Long Grammatical Sequences Using Recurrent Networks Augmented With An External Differentiable Stack [73.48927855855219]
Recurrent neural networks (RNNs) are a widely used deep architecture for sequence modeling, generation, and prediction.
RNNs generalize poorly over very long sequences, which limits their applicability to many important temporal processing and time series forecasting problems.
One way to address these shortcomings is to couple an RNN with an external, differentiable memory structure, such as a stack.
In this paper, we improve the memory-augmented RNN with important architectural and state updating mechanisms.
arXiv Detail & Related papers (2020-04-04T14:19:15Z)
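For the "Advancing Regular Language Reasoning in Linear Recurrent Neural Networks" entry, the sketch below shows generically what a block-diagonal, input-dependent transition looks like in a linear recurrence: the hidden state is split into blocks, and each block is updated by its own small transition matrix computed from the current input. This is not that paper's exact parameterization; the block sizes, the way transitions are generated from the input, and all names are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

D_INPUT, N_BLOCKS, BLOCK = 8, 4, 3                 # hidden size = N_BLOCKS * BLOCK
W_IN = rng.normal(0.0, 0.1, (N_BLOCKS * BLOCK, D_INPUT))            # input projection
W_TRANS = rng.normal(0.0, 0.1, (N_BLOCKS, BLOCK, BLOCK, D_INPUT))   # per-block transition generator

def transition_blocks(x):
    """Produce one BLOCK x BLOCK transition matrix per block from the input x."""
    return np.tanh(W_TRANS @ x)                    # (N_BLOCKS, BLOCK, BLOCK), bounded entries

def lrnn_step(h, x):
    """One linear recurrence step: each block of h is multiplied by its own
    input-dependent transition matrix, then the projected input is added."""
    A = transition_blocks(x)
    h_blocks = h.reshape(N_BLOCKS, BLOCK)
    new_blocks = np.einsum("bij,bj->bi", A, h_blocks)   # block-diagonal matrix-vector product
    return new_blocks.reshape(-1) + W_IN @ x

if __name__ == "__main__":
    h = np.zeros(N_BLOCKS * BLOCK)
    for _ in range(5):                             # run on a few random input vectors
        x = rng.normal(size=D_INPUT)
        h = lrnn_step(h, x)
    print(h.shape)                                 # (12,)
```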
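As referenced in the "RNNs can generate bounded hierarchical languages with optimal memory" entry, Dyck-($k$,$m$) is the language of well-nested brackets of $k$ types with nesting depth at most $m$. The minimal checker below is only meant to make that definition concrete; it is a plain stack-based recognizer, not the paper's $O(m \log k)$ RNN construction, and the token encoding and names are assumptions of this sketch.

```python
def is_dyck_k_m(tokens, k, m):
    """Recognize Dyck-(k, m).

    tokens: sequence of (bracket_type, is_open) pairs,
    with 0 <= bracket_type < k and is_open in {True, False}.
    """
    stack = []
    for bracket_type, is_open in tokens:
        if not 0 <= bracket_type < k:
            return False
        if is_open:
            stack.append(bracket_type)
            if len(stack) > m:                  # nesting depth exceeds the bound m
                return False
        else:
            if not stack or stack.pop() != bracket_type:
                return False                    # unmatched or mismatched closing bracket
    return not stack                            # every opener must be closed

if __name__ == "__main__":
    # "( [ ] )" with k=2 bracket types and depth bound m=2 -> accepted
    print(is_dyck_k_m([(0, True), (1, True), (1, False), (0, False)], k=2, m=2))
    # the same string with m=1 -> rejected, since its nesting depth is 2
    print(is_dyck_k_m([(0, True), (1, True), (1, False), (0, False)], k=2, m=1))
```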