RNNs can generate bounded hierarchical languages with optimal memory
- URL: http://arxiv.org/abs/2010.07515v1
- Date: Thu, 15 Oct 2020 04:42:29 GMT
- Title: RNNs can generate bounded hierarchical languages with optimal memory
- Authors: John Hewitt, Michael Hahn, Surya Ganguli, Percy Liang, Christopher D.
Manning
- Abstract summary: We show that RNNs can efficiently generate bounded hierarchical languages that reflect the scaffolding of natural language syntax.
We introduce Dyck-($k$,$m$), the language of well-nested brackets (of $k$ types) and $m$-bounded nesting depth.
We prove that an RNN with $O(m \log k)$ hidden units suffices, an exponential reduction in memory, by an explicit construction.
- Score: 113.73133308478612
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recurrent neural networks empirically generate natural language with high
syntactic fidelity. However, their success is not well-understood
theoretically. We provide theoretical insight into this success, proving in a
finite-precision setting that RNNs can efficiently generate bounded
hierarchical languages that reflect the scaffolding of natural language syntax.
We introduce Dyck-($k$,$m$), the language of well-nested brackets (of $k$
types) and $m$-bounded nesting depth, reflecting the bounded memory needs and
long-distance dependencies of natural language syntax. The best known results
use $O(k^{\frac{m}{2}})$ memory (hidden units) to generate these languages. We
prove that an RNN with $O(m \log k)$ hidden units suffices, an exponential
reduction in memory, by an explicit construction. Finally, we show that no
algorithm, even with unbounded computation, can suffice with $o(m \log k)$
hidden units.
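To make the setup concrete, the following is a minimal Python sketch (not the paper's RNN construction) of Dyck-($k$,$m$) membership and of the memory intuition behind the $O(m \log k)$ bound: generating a prefix only requires remembering the current stack of open brackets, i.e. at most $m$ slots of $\lceil \log_2 k \rceil$ bits each. The (kind, type) encoding of brackets is an assumption made for this illustration.
```python
from math import ceil, log2

def is_dyck_km(s, k, m):
    """Check membership in Dyck-(k, m): well-nested brackets of k types
    with nesting depth at most m. Each symbol is a (kind, type) pair,
    e.g. ('(', 2) opens a bracket of type 2 -- an encoding chosen here."""
    stack = []
    for kind, typ in s:
        if kind == '(':
            stack.append(typ)
            if len(stack) > m:                 # depth bound exceeded
                return False
        else:
            if not stack or stack[-1] != typ:  # unbalanced or mismatched close
                return False
            stack.pop()
    return len(stack) == 0

def bits_needed(k, m):
    """State needed to continue generating: the current stack, i.e. at most
    m slots, each naming one of k bracket types."""
    return m * ceil(log2(k))

if __name__ == "__main__":
    word = [('(', 0), ('(', 1), (')', 1), (')', 0)]
    print(is_dyck_km(word, k=2, m=2))  # True
    print(is_dyck_km(word, k=2, m=1))  # False: depth 2 exceeds m = 1
    print(bits_needed(k=64, m=10))     # 60 bits, versus k^(m/2) = 64^5 units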
Related papers
- Training Neural Networks as Recognizers of Formal Languages [87.06906286950438]
Formal language theory pertains specifically to recognizers.
It is common to instead use proxy tasks that are similar in only an informal sense.
We correct this mismatch by training and evaluating neural networks directly as binary classifiers of strings.
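As an illustration of this recognizer framing, here is a minimal PyTorch sketch in which a network is trained directly as a binary classifier of strings; the toy task ($a^n b^n$ versus near-misses), the architecture, and the hyperparameters are placeholders chosen for this sketch, not the paper's setup.
```python
import torch
import torch.nn as nn

VOCAB = {"a": 0, "b": 1}

def make_example(n, positive):
    # Positive strings are a^n b^n; negatives have one extra closing symbol.
    s = "a" * n + "b" * (n if positive else n + 1)
    return s, float(positive)

class StringClassifier(nn.Module):
    """Embed characters, run an RNN, and read a single accept/reject logit
    from the final hidden state -- the network is used as a recognizer."""
    def __init__(self, hidden=32):
        super().__init__()
        self.emb = nn.Embedding(len(VOCAB), hidden)
        self.rnn = nn.RNN(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, ids):
        _, h = self.rnn(self.emb(ids))
        return self.out(h[-1]).squeeze(-1)

model = StringClassifier()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(200):
    s, y = make_example(n=1 + step % 8, positive=bool(step % 2))
    ids = torch.tensor([[VOCAB[c] for c in s]])
    loss = loss_fn(model(ids), torch.tensor([y]))
    opt.zero_grad()
    loss.backward()
    opt.step()
```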
arXiv Detail & Related papers (2024-11-11T16:33:25Z) - On the Computational Complexity and Formal Hierarchy of Second Order
Recurrent Neural Networks [59.85314067235965]
We extend the theoretical foundation for the $2^{nd}$-order recurrent network ($2^{nd}$ RNN).
We prove there exists a class of $2^{nd}$ RNNs that is Turing-complete with bounded time.
We also demonstrate that $2^{nd}$-order RNNs, without memory, outperform modern-day models such as vanilla RNNs and gated recurrent units in recognizing regular grammars.
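For context, the classic second-order recurrence underlying these models combines the hidden state and the (one-hot) input symbol multiplicatively through a third-order weight tensor, so each input symbol effectively selects its own transition matrix. A minimal NumPy sketch of one step (the sizes are illustrative, and this is not the paper's extended construction):
```python
import numpy as np

def second_order_rnn_step(W, b, h_prev, x_onehot):
    """One step of a classic second-order (multiplicative) RNN cell:
    h_t[k] = sigmoid( sum_{i,j} W[k, i, j] * h_{t-1}[i] * x_t[j] + b[k] )."""
    pre = np.einsum("kij,i,j->k", W, h_prev, x_onehot) + b
    return 1.0 / (1.0 + np.exp(-pre))

# Hypothetical sizes: 4 hidden units, alphabet of 3 symbols.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4, 3))
b = np.zeros(4)
h = np.zeros(4)
for symbol in [0, 2, 1]:
    h = second_order_rnn_step(W, b, h, np.eye(3)[symbol])
```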
arXiv Detail & Related papers (2023-09-26T06:06:47Z) - Densely Connected $G$-invariant Deep Neural Networks with Signed
Permutation Representations [6.200483285433661]
We introduce and investigate, for finite groups $G$, $G$-invariant deep neural network ($G$-DNN) architectures with ReLU activation.
The preactivations of the $G$-DNNs transform according to signed permutation representations (signed perm-reps) of $G$.
We show that there are far more admissible $G$-DNN architectures than those accessible with the "concatenated ReLU" activation function from the literature.
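For reference, a minimal sketch of the concatenated ReLU activation mentioned above, which applies ReLU to a preactivation and to its negation and concatenates the results; the relation to signed perm-reps is the paper's contribution and is not reproduced here.
```python
import numpy as np

def crelu(z):
    """Concatenated ReLU: apply ReLU to the preactivation and to its
    negation, then concatenate, doubling the layer width."""
    return np.concatenate([np.maximum(z, 0.0), np.maximum(-z, 0.0)], axis=-1)

print(crelu(np.array([1.5, -0.3, 0.0])))  # [1.5 0.  0.  0.  0.3 0. ]
```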
arXiv Detail & Related papers (2023-03-08T14:35:03Z) - Residual Learning of Neural Text Generation with $n$-gram Language Model [41.26228768053928]
We learn a neural LM that fits the residual between an $n$-gram LM and the real-data distribution.
Our approach consistently attains additional performance gains over popular standalone neural models.
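A minimal sketch of the residual combination at prediction time, assuming the neural model outputs a log-space correction that is added to the $n$-gram log-probabilities and renormalized; how the residual is trained follows the paper and is omitted here.
```python
import numpy as np

def combined_next_token_probs(ngram_log_probs, neural_residual_logits):
    """Add the neural correction to the fixed n-gram distribution in log
    space, then renormalize with a softmax."""
    logits = ngram_log_probs + neural_residual_logits
    logits -= logits.max()                 # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

# Hypothetical 4-word vocabulary; both inputs are placeholders.
ngram = np.log(np.array([0.5, 0.3, 0.15, 0.05]))
residual = np.array([-0.2, 0.4, 0.0, 0.1])   # illustrative learned correction
print(combined_next_token_probs(ngram, residual))
```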
arXiv Detail & Related papers (2022-10-26T02:42:53Z) - The Surprising Computational Power of Nondeterministic Stack RNNs [20.996069249108224]
Traditional recurrent neural networks (RNNs) have a fixed, finite number of memory cells.
In this paper, we show that nondeterminism and the neural controller interact to produce two more unexpected abilities.
First, the nondeterministic stack RNN can recognize not only CFLs, but also many non-context-free languages.
Second, it can recognize languages with much larger alphabet sizes than one might expect given the size of its stack alphabet.
arXiv Detail & Related papers (2022-10-04T03:18:19Z) - Quantum Resources Required to Block-Encode a Matrix of Classical Data [56.508135743727934]
We provide circuit-level implementations and resource estimates for several methods of block-encoding a dense $N \times N$ matrix of classical data to precision $\epsilon$.
We examine resource tradeoffs between the different approaches and explore implementations of two separate models of quantum random access memory (QRAM).
Our results go beyond simple query complexity and provide a clear picture into the resource costs when large amounts of classical data are assumed to be accessible to quantum algorithms.
arXiv Detail & Related papers (2022-06-07T18:00:01Z) - On the Optimal Memorization Power of ReLU Neural Networks [53.15475693468925]
We show that feedforward ReLU neural networks can memorize any $N$ points that satisfy a mild separability assumption.
We prove that having such a large bit complexity is both necessary and sufficient for memorization with a sub-linear number of parameters.
arXiv Detail & Related papers (2021-10-07T05:25:23Z) - Self-Attention Networks Can Process Bounded Hierarchical Languages [24.75432474021856]
We prove that self-attention networks can process $\mathsf{Dyck}_{k,D}$, the subset of $\mathsf{Dyck}_k$ with depth bounded by $D$.
Experiments show that self-attention networks trained on $\mathsf{Dyck}_{k,D}$ generalize to longer inputs with near-perfect accuracy.
arXiv Detail & Related papers (2021-05-24T06:42:58Z) - Recursive Top-Down Production for Sentence Generation with Latent Trees [77.56794870399288]
We model the production property of context-free grammars for natural and synthetic languages.
We present a dynamic programming algorithm that marginalises over latent binary tree structures with $N$ leaves.
We also present experimental results on German-English translation on the Multi30k dataset.
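A minimal sketch of an inside-style dynamic program of this kind, which sums in log space over all binary tree structures with $N$ leaves; the leaf and span scoring functions are placeholders rather than the paper's model.
```python
import numpy as np

def inside_log_partition(leaf_scores, span_score):
    """Inside-style DP summing (in log space) over all binary trees with N
    leaves. leaf_scores[i] scores leaf i; span_score(i, k, j) scores merging
    sub-spans [i, k) and [k, j). Both are placeholders for a real model."""
    N = len(leaf_scores)
    chart = np.full((N, N + 1), -np.inf)       # chart[i, j] covers span [i, j)
    for i in range(N):
        chart[i, i + 1] = leaf_scores[i]
    for width in range(2, N + 1):              # widen spans bottom-up
        for i in range(N - width + 1):
            j = i + width
            splits = [chart[i, k] + chart[k, j] + span_score(i, k, j)
                      for k in range(i + 1, j)]
            chart[i, j] = np.logaddexp.reduce(splits)
    return chart[0, N]                         # log-sum over all binary trees

# With zero scores, exp(result) counts binary trees: Catalan(N - 1).
print(np.exp(inside_log_partition(np.zeros(4), lambda i, k, j: 0.0)))  # 5.0
```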
arXiv Detail & Related papers (2020-10-09T17:47:16Z)