Related papers: On Efficiently Representing Regular Languages as RNNs

On Efficiently Representing Regular Languages as RNNs

URL: http://arxiv.org/abs/2402.15814v2
Date: Tue, 18 Jun 2024 15:14:18 GMT
Title: On Efficiently Representing Regular Languages as RNNs
Authors: Anej Svete, Robin Shing Moon Chan, Ryan Cotterell,
Abstract summary: We show that RNNs can efficiently represent bounded hierarchical structures that are prevalent in human language. This suggests that RNNs' success might be linked to their ability to model hierarchy. We show that RNNs can efficiently represent a larger class of LMs than previously claimed.
Score: 49.88310438099143
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent work by Hewitt et al. (2020) provides an interpretation of the empirical success of recurrent neural networks (RNNs) as language models (LMs). It shows that RNNs can efficiently represent bounded hierarchical structures that are prevalent in human language. This suggests that RNNs' success might be linked to their ability to model hierarchy. However, a closer inspection of Hewitt et al.'s (2020) construction shows that it is not inherently limited to hierarchical structures. This poses a natural question: What other classes of LMs can RNNs efficiently represent? To this end, we generalize Hewitt et al.'s (2020) construction and show that RNNs can efficiently represent a larger class of LMs than previously claimed -- specifically, those that can be represented by a pushdown automaton with a bounded stack and a specific stack update function. Altogether, the efficiency of representing this diverse class of LMs with RNN LMs suggests novel interpretations of their inductive bias.

Related papers

Training Neural Networks as Recognizers of Formal Languages [87.06906286950438]
We train and evaluate neural networks directly as binary classifiers of strings. We provide results on a variety of languages across the Chomsky hierarchy for three neural architectures. Our contributions will facilitate theoretically sound empirical testing of language recognition claims in future work.
arXiv Detail & Related papers (2024-11-11T16:33:25Z)
Learning Useful Representations of Recurrent Neural Network Weight Matrices [30.583752432727326]
Recurrent Neural Networks (RNNs) are general-purpose parallel-sequential computers. How to learn useful representations of RNN weights that facilitate RNN analysis as well as downstream tasks? We consider several mechanistic approaches for RNN weights and adapt the permutation equivariant Deep Weight Space layer for RNNs. Our two novel functionalist approaches extract information from RNN weights by 'interrogating' the RNN through probing inputs.
arXiv Detail & Related papers (2024-03-18T17:32:23Z)
Recurrent Neural Language Models as Probabilistic Finite-state Automata [66.23172872811594]
We study what classes of probability distributions RNN LMs can represent. We show that simple RNNs are equivalent to a subclass of probabilistic finite-state automata. These results present a first step towards characterizing the classes of distributions RNN LMs can represent.
arXiv Detail & Related papers (2023-10-08T13:36:05Z)
On the Computational Complexity and Formal Hierarchy of Second Order Recurrent Neural Networks [59.85314067235965]
We extend the theoretical foundation for the $2nd$-order recurrent network ($2nd$ RNN) We prove there exists a class of a $2nd$ RNN that is Turing-complete with bounded time. We also demonstrate that $2$nd order RNNs, without memory, outperform modern-day models such as vanilla RNNs and gated recurrent units in recognizing regular grammars.
arXiv Detail & Related papers (2023-09-26T06:06:47Z)
Advancing Regular Language Reasoning in Linear Recurrent Neural Networks [56.11830645258106]
We study whether linear recurrent neural networks (LRNNs) can learn the hidden rules in training sequences. We propose a new LRNN equipped with a block-diagonal and input-dependent transition matrix. Experiments suggest that the proposed model is the only LRNN capable of performing length extrapolation on regular language tasks.
arXiv Detail & Related papers (2023-09-14T03:36:01Z)
Graph Neural Networks are Inherently Good Generalizers: Insights by Bridging GNNs and MLPs [71.93227401463199]
This paper pinpoints the major source of GNNs' performance gain to their intrinsic capability, by introducing an intermediate model class dubbed as P(ropagational)MLP. We observe that PMLPs consistently perform on par with (or even exceed) their GNN counterparts, while being much more efficient in training.
arXiv Detail & Related papers (2022-12-18T08:17:32Z)
Learning Hierarchical Structures with Differentiable Nondeterministic Stacks [25.064819128982556]
We present a stack RNN model based on the recently proposed Nondeterministic Stack RNN (NS-RNN) We show that the NS-RNN achieves lower cross-entropy than all previous stack RNNs on five context-free language modeling tasks. We also propose a restricted version of the NS-RNN that makes it practical to use for language modeling on natural language.
arXiv Detail & Related papers (2021-09-05T03:25:23Z)
A Formal Hierarchy of RNN Architectures [88.38859874233944]
hierarchy is based on two formal properties: space, which measures the RNN's memory, and rational recurrence, defined as whether the recurrent update can be described by a weighted finite-state machine. We show how these models' expressive capacity is expanded by stacking multiple layers or composing them with different pooling functions. We hypothesize that the practical learnable capacity of unsaturated RNNs obeys a similar hierarchy.
arXiv Detail & Related papers (2020-04-18T00:57:54Z)

This list is automatically generated from the titles and abstracts of the papers in this site.