RNNs can generate bounded hierarchical languages with optimal memory
- URL: http://arxiv.org/abs/2010.07515v1
- Date: Thu, 15 Oct 2020 04:42:29 GMT
- Title: RNNs can generate bounded hierarchical languages with optimal memory
- Authors: John Hewitt, Michael Hahn, Surya Ganguli, Percy Liang, Christopher D.
Manning
- Abstract summary: We show that RNNs can efficiently generate bounded hierarchical languages that reflect the scaffolding of natural language syntax.
We introduce Dyck-($k$,$m$), the language of well-nested brackets (of $k$ types) and $m$-bounded nesting depth.
We prove that an RNN with $O(m \log k)$ hidden units suffices, an exponential reduction in memory, by an explicit construction.
- Score: 113.73133308478612
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recurrent neural networks empirically generate natural language with high
syntactic fidelity. However, their success is not well-understood
theoretically. We provide theoretical insight into this success, proving in a
finite-precision setting that RNNs can efficiently generate bounded
hierarchical languages that reflect the scaffolding of natural language syntax.
We introduce Dyck-($k$,$m$), the language of well-nested brackets (of $k$
types) and $m$-bounded nesting depth, reflecting the bounded memory needs and
long-distance dependencies of natural language syntax. The best known results
use $O(k^{\frac{m}{2}})$ memory (hidden units) to generate these languages. We
prove that an RNN with $O(m \log k)$ hidden units suffices, an exponential
reduction in memory, by an explicit construction. Finally, we show that no
algorithm, even with unbounded computation, can suffice with $o(m \log k)$
hidden units.
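To make the setup concrete, the following is a minimal Python sketch (not the paper's RNN construction) of Dyck-($k$,$m$) membership and of the memory intuition behind the $O(m \log k)$ bound: generating a prefix only requires remembering the current stack of open brackets, i.e. at most $m$ slots of $\lceil \log_2 k \rceil$ bits each. The (kind, type) encoding of brackets is an assumption made for this illustration.
```python
from math import ceil, log2

def is_dyck_km(s, k, m):
    """Check membership in Dyck-(k, m): well-nested brackets of k types
    with nesting depth at most m. Each symbol is a (kind, type) pair,
    e.g. ('(', 2) opens a bracket of type 2 -- an encoding chosen here."""
    stack = []
    for kind, typ in s:
        if kind == '(':
            stack.append(typ)
            if len(stack) > m:                 # depth bound exceeded
                return False
        else:
            if not stack or stack[-1] != typ:  # unbalanced or mismatched close
                return False
            stack.pop()
    return len(stack) == 0

def bits_needed(k, m):
    """State needed to continue generating: the current stack, i.e. at most
    m slots, each naming one of k bracket types."""
    return m * ceil(log2(k))

if __name__ == "__main__":
    word = [('(', 0), ('(', 1), (')', 1), (')', 0)]
    print(is_dyck_km(word, k=2, m=2))  # True
    print(is_dyck_km(word, k=2, m=1))  # False: depth 2 exceeds m = 1
    print(bits_needed(k=64, m=10))     # 60 bits, versus k^(m/2) = 64^5 units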
Related papers
- Training Neural Networks as Recognizers of Formal Languages [87.06906286950438]
Formal language theory pertains specifically to recognizers.
It is common to instead use proxy tasks that are similar in only an informal sense.
We correct this mismatch by training and evaluating neural networks directly as binary classifiers of strings.
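As an illustration of this recognizer framing, here is a minimal PyTorch sketch in which a network is trained directly as a binary classifier of strings; the toy task ($a^n b^n$ versus near-misses), the architecture, and the hyperparameters are placeholders chosen for this sketch, not the paper's setup.
```python
import torch
import torch.nn as nn

VOCAB = {"a": 0, "b": 1}

def make_example(n, positive):
    # Positive strings are a^n b^n; negatives have one extra closing symbol.
    s = "a" * n + "b" * (n if positive else n + 1)
    return s, float(positive)

class StringClassifier(nn.Module):
    """Embed characters, run an RNN, and read a single accept/reject logit
    from the final hidden state -- the network is used as a recognizer."""
    def __init__(self, hidden=32):
        super().__init__()
        self.emb = nn.Embedding(len(VOCAB), hidden)
        self.rnn = nn.RNN(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, ids):
        _, h = self.rnn(self.emb(ids))
        return self.out(h[-1]).squeeze(-1)

model = StringClassifier()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(200):
    s, y = make_example(n=1 + step % 8, positive=bool(step % 2))
    ids = torch.tensor([[VOCAB[c] for c in s]])
    loss = loss_fn(model(ids), torch.tensor([y]))
    opt.zero_grad()
    loss.backward()
    opt.step()
```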
arXiv Detail & Related papers (2024-11-11T16:33:25Z) - On the Computational Complexity and Formal Hierarchy of Second Order
Recurrent Neural Networks [59.85314067235965]
We extend the theoretical foundation for the $2^{nd}$-order recurrent network ($2^{nd}$ RNN).
We prove there exists a class of $2^{nd}$ RNNs that is Turing-complete with bounded time.
We also demonstrate that $2^{nd}$-order RNNs, without memory, outperform modern-day models such as vanilla RNNs and gated recurrent units in recognizing regular grammars.
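For context, the classic second-order recurrence underlying these models combines the hidden state and the (one-hot) input symbol multiplicatively through a third-order weight tensor, so each input symbol effectively selects its own transition matrix. A minimal NumPy sketch of one step (the sizes are illustrative, and this is not the paper's extended construction):
```python
import numpy as np

def second_order_rnn_step(W, b, h_prev, x_onehot):
    """One step of a classic second-order (multiplicative) RNN cell:
    h_t[k] = sigmoid( sum_{i,j} W[k, i, j] * h_{t-1}[i] * x_t[j] + b[k] )."""
    pre = np.einsum("kij,i,j->k", W, h_prev, x_onehot) + b
    return 1.0 / (1.0 + np.exp(-pre))

# Hypothetical sizes: 4 hidden units, alphabet of 3 symbols.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4, 3))
b = np.zeros(4)
h = np.zeros(4)
for symbol in [0, 2, 1]:
    h = second_order_rnn_step(W, b, h, np.eye(3)[symbol])
```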
arXiv Detail & Related papers (2023-09-26T06:06:47Z) - Densely Connected $G$-invariant Deep Neural Networks with Signed
Permutation Representations [6.200483285433661]
We introduce and investigate, for finite groups $G$, $G$-invariant deep neural network ($G$-DNN) architectures with ReLU activation.
The preactivations of the $G$-DNNs transform according to signed permutation representations (signed perm-reps) of $G$.
We show that there are far more admissible $G$-DNN architectures than those accessible with the "concatenated ReLU" activation function from the literature.
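For reference, a minimal sketch of the concatenated ReLU activation mentioned above, which applies ReLU to a preactivation and to its negation and concatenates the results; the relation to signed perm-reps is the paper's contribution and is not reproduced here.
```python
import numpy as np

def crelu(z):
    """Concatenated ReLU: apply ReLU to the preactivation and to its
    negation, then concatenate, doubling the layer width."""
    return np.concatenate([np.maximum(z, 0.0), np.maximum(-z, 0.0)], axis=-1)

print(crelu(np.array([1.5, -0.3, 0.0])))  # [1.5 0.  0.  0.  0.3 0. ]
```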
arXiv Detail & Related papers (2023-03-08T14:35:03Z) - Residual Learning of Neural Text Generation with $n$-gram Language Model [41.26228768053928]
We learn a neural LM that fits the residual between an $n$-gram LM and the real-data distribution.
Our approach consistently attains additional performance gains over popular standalone neural models.
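A minimal sketch of the residual combination at prediction time, assuming the neural model outputs a log-space correction that is added to the $n$-gram log-probabilities and renormalized; how the residual is trained follows the paper and is omitted here.
```python
import numpy as np

def combined_next_token_probs(ngram_log_probs, neural_residual_logits):
    """Add the neural correction to the fixed n-gram distribution in log
    space, then renormalize with a softmax."""
    logits = ngram_log_probs + neural_residual_logits
    logits -= logits.max()                 # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

# Hypothetical 4-word vocabulary; both inputs are placeholders.
ngram = np.log(np.array([0.5, 0.3, 0.15, 0.05]))
residual = np.array([-0.2, 0.4, 0.0, 0.1])   # illustrative learned correction
print(combined_next_token_probs(ngram, residual))
```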
arXiv Detail & Related papers (2022-10-26T02:42:53Z) - The Surprising Computational Power of Nondeterministic Stack RNNs [20.996069249108224]
Traditional recurrent neural networks (RNNs) have a fixed, finite number of memory cells.
In this paper, we show that nondeterminism and the neural controller interact to produce two more unexpected abilities.
First, the nondeterministic stack RNN can recognize not only CFLs, but also many non-context-free languages.
Second, it can recognize languages with much larger alphabet sizes than one might expect given the size of its stack alphabet.
arXiv Detail & Related papers (2022-10-04T03:18:19Z) - Quantum Resources Required to Block-Encode a Matrix of Classical Data [56.508135743727934]
We provide circuit-level implementations and resource estimates for several methods of block-encoding a dense $N \times N$ matrix of classical data to precision $\epsilon$.
We examine resource tradeoffs between the different approaches and explore implementations of two separate models of quantum random access memory (QRAM).
Our results go beyond simple query complexity and provide a clear picture into the resource costs when large amounts of classical data are assumed to be accessible to quantum algorithms.
arXiv Detail & Related papers (2022-06-07T18:00:01Z) - On the Optimal Memorization Power of ReLU Neural Networks [53.15475693468925]
We show that feedforward ReLU neural networks can memorize any $N$ points that satisfy a mild separability assumption.
We prove that having such a large bit complexity is both necessary and sufficient for memorization with a sub-linear number of parameters.
arXiv Detail & Related papers (2021-10-07T05:25:23Z) - Self-Attention Networks Can Process Bounded Hierarchical Languages [24.75432474021856]
We prove that self-attention networks can process $\mathsf{Dyck}_{k,D}$, the subset of $\mathsf{Dyck}_k$ with depth bounded by $D$.
Experiments show that self-attention networks trained on $\mathsf{Dyck}_{k,D}$ generalize to longer inputs with near-perfect accuracy.
arXiv Detail & Related papers (2021-05-24T06:42:58Z) - Recursive Top-Down Production for Sentence Generation with Latent Trees [77.56794870399288]
We model the production property of context-free grammars for natural and synthetic languages.
We present a dynamic programming algorithm that marginalises over latent binary tree structures with $N$ leaves.
We also present experimental results on German-English translation on the Multi30k dataset.
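A minimal sketch of an inside-style dynamic program of this kind, which sums in log space over all binary tree structures with $N$ leaves; the leaf and span scoring functions are placeholders rather than the paper's model.
```python
import numpy as np

def inside_log_partition(leaf_scores, span_score):
    """Inside-style DP summing (in log space) over all binary trees with N
    leaves. leaf_scores[i] scores leaf i; span_score(i, k, j) scores merging
    sub-spans [i, k) and [k, j). Both are placeholders for a real model."""
    N = len(leaf_scores)
    chart = np.full((N, N + 1), -np.inf)       # chart[i, j] covers span [i, j)
    for i in range(N):
        chart[i, i + 1] = leaf_scores[i]
    for width in range(2, N + 1):              # widen spans bottom-up
        for i in range(N - width + 1):
            j = i + width
            splits = [chart[i, k] + chart[k, j] + span_score(i, k, j)
                      for k in range(i + 1, j)]
            chart[i, j] = np.logaddexp.reduce(splits)
    return chart[0, N]                         # log-sum over all binary trees

# With zero scores, exp(result) counts binary trees: Catalan(N - 1).
print(np.exp(inside_log_partition(np.zeros(4), lambda i, k, j: 0.0)))  # 5.0
```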
arXiv Detail & Related papers (2020-10-09T17:47:16Z)