A Formal Hierarchy of RNN Architectures
- URL: http://arxiv.org/abs/2004.08500v4
- Date: Sat, 19 Sep 2020 23:03:45 GMT
- Title: A Formal Hierarchy of RNN Architectures
- Authors: William Merrill and Gail Weiss and Yoav Goldberg and Roy Schwartz and
Noah A. Smith and Eran Yahav
- Abstract summary: The hierarchy is based on two formal properties: space complexity, which measures the RNN's memory, and rational recurrence, defined as whether the recurrent update can be described by a weighted finite-state machine.
We show how these models' expressive capacity is expanded by stacking multiple layers or composing them with different pooling functions.
We hypothesize that the practical learnable capacity of unsaturated RNNs obeys a similar hierarchy.
- Score: 88.38859874233944
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We develop a formal hierarchy of the expressive capacity of RNN
architectures. The hierarchy is based on two formal properties: space
complexity, which measures the RNN's memory, and rational recurrence, defined
as whether the recurrent update can be described by a weighted finite-state
machine. We place several RNN variants within this hierarchy. For example, we
prove the LSTM is not rational, which formally separates it from the related
QRNN (Bradbury et al., 2016). We also show how these models' expressive
capacity is expanded by stacking multiple layers or composing them with
different pooling functions. Our results build on the theory of "saturated"
RNNs (Merrill, 2019). While formally extending these findings to unsaturated
RNNs is left to future work, we hypothesize that the practical learnable
capacity of unsaturated RNNs obeys a similar hierarchy. Experimental findings
from training unsaturated networks on formal languages support this conjecture.
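To make the rational-recurrence distinction concrete, below is a minimal NumPy sketch (my own illustration under standard cell definitions, not the authors' code; all weight names and shapes are hypothetical). In the QRNN-style cell the gates are functions of the input alone, so each memory coordinate updates affinely in the previous memory with input-determined coefficients, exactly the kind of update a weighted finite-state machine can express; the LSTM's gates also read the previous hidden state, which breaks that form and is what the paper's separation result formalizes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def qrnn_step(c_prev, x, Wf, Wz):
    """QRNN-style (fo-pooling) update: gates depend only on the input x."""
    f = sigmoid(Wf @ x)                 # forget gate, a function of x only
    z = np.tanh(Wz @ x)                 # candidate value, a function of x only
    return f * c_prev + (1.0 - f) * z   # affine in c_prev => rationally recurrent

def lstm_step(c_prev, h_prev, x, Wf, Wi, Wo, Wz):
    """LSTM update: gates read the previous hidden state as well as the input."""
    v = np.concatenate([x, h_prev])     # feedback through h_prev breaks the affine form
    f, i, o = sigmoid(Wf @ v), sigmoid(Wi @ v), sigmoid(Wo @ v)
    c = f * c_prev + i * np.tanh(Wz @ v)
    return c, o * np.tanh(c)

# Tiny usage example with random weights.
rng = np.random.default_rng(0)
d, dx = 3, 2
c, h, x = np.zeros(d), np.zeros(d), rng.normal(size=dx)
print(qrnn_step(c, x, rng.normal(size=(d, dx)), rng.normal(size=(d, dx))))
print(lstm_step(c, h, x, *[rng.normal(size=(d, dx + d)) for _ in range(4)]))
```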
Related papers
- Extensional Properties of Recurrent Neural Networks [49.30491917300904]
A property of a recurrent neural network (RNN) is called "extensional" if, loosely speaking, it is a property of the function computed by the RNN rather than a property of the RNN algorithm.
We prove a version of Rice's theorem for RNNs: any nontrivial extensional property of RNNs is undecidable.
arXiv Detail & Related papers (2024-10-30T06:29:02Z)
- A Tensor Decomposition Perspective on Second-order RNNs [5.922280687190788]
We study the model resulting from parameterizing 2RNNs using the CP decomposition, which we call CPRNN.
We analyze how rank and hidden size affect model capacity and show the relationships between RNNs, 2RNNs, MIRNNs, and CPRNNs based on these parameters.
We support these results empirically with experiments on the Penn Treebank dataset, which demonstrate that, with a fixed parameter budget, CPRNNs outperform RNNs, 2RNNs, and MIRNNs with the right choice of rank and hidden size.
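As a rough illustration of the parameterization being compared (a sketch under my own assumptions, not the paper's code; function names are hypothetical), the snippet below writes a second-order RNN update through a 3-way tensor and the equivalent CP-factorized, CPRNN-style update, whose parameter count is governed by the rank.

```python
import numpy as np

def second_order_step(h, x, T, b):
    # 2RNN-style update: h_new[i] = tanh( sum_{j,k} T[i,j,k] * h[j] * x[k] + b[i] )
    return np.tanh(np.einsum('ijk,j,k->i', T, h, x) + b)

def cprnn_step(h, x, A, B, C, b):
    # Same update with T in rank-R CP form T[i,j,k] = sum_r A[i,r] * B[j,r] * C[k,r]:
    # h_new = tanh( A @ ((B^T h) * (C^T x)) + b )
    return np.tanh(A @ ((B.T @ h) * (C.T @ x)) + b)

# Sanity check: the two updates agree when T is built from the CP factors.
rng = np.random.default_rng(0)
d_h, d_x, rank = 4, 3, 5
A, B = rng.normal(size=(d_h, rank)), rng.normal(size=(d_h, rank))
C, b = rng.normal(size=(d_x, rank)), rng.normal(size=d_h)
T = np.einsum('ir,jr,kr->ijk', A, B, C)
h, x = rng.normal(size=d_h), rng.normal(size=d_x)
assert np.allclose(second_order_step(h, x, T, b), cprnn_step(h, x, A, B, C, b))
```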
arXiv Detail & Related papers (2024-06-07T16:18:32Z)
- On Efficiently Representing Regular Languages as RNNs [49.88310438099143]
We show that RNNs can efficiently represent bounded hierarchical structures that are prevalent in human language.
This suggests that RNNs' success might be linked to their ability to model hierarchy.
We show that RNNs can efficiently represent a larger class of LMs than previously claimed.
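For orientation, here is a generic, textbook-style encoding of a regular language as a recurrence over a one-hot state vector (my own illustration, not the paper's construction; the example DFA is made up). The efficiency question studied in the paper is how compactly RNNs with nonlinear activations can realize such machines.

```python
import numpy as np

# DFA over {a, b} accepting strings with an even number of 'a's.
# States: 0 = even, 1 = odd; the state is carried as a one-hot vector h,
# and each symbol applies its transition matrix: h_t = M[x_t] @ h_{t-1}.
M = {
    'a': np.array([[0, 1], [1, 0]]),  # 'a' swaps even <-> odd
    'b': np.eye(2, dtype=int),        # 'b' leaves the state unchanged
}
accept = np.array([1, 0])             # accept iff we end in state 0 (even)

def accepts(s):
    h = np.array([1, 0])              # start in state 0
    for ch in s:
        h = M[ch] @ h
    return bool(accept @ h)

print(accepts('abba'), accepts('ab'))  # True False
```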
arXiv Detail & Related papers (2024-02-24T13:42:06Z)
- Recurrent Neural Language Models as Probabilistic Finite-state Automata [66.23172872811594]
We study what classes of probability distributions RNN LMs can represent.
We show that simple RNNs are equivalent to a subclass of probabilistic finite-state automata.
These results present a first step towards characterizing the classes of distributions RNN LMs can represent.
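The sketch below (my own, with made-up transition weights) shows the probabilistic finite-state automaton side of that correspondence: a string's probability is computed from an initial vector, one transition matrix per symbol, and a stop vector. The summarized claim is that simple RNN LMs correspond to a subclass of such machines.

```python
import numpy as np

# A two-state probabilistic FSA over {a, b}; per state, the outgoing transition
# probabilities plus the stop probability sum to 1.
alpha = np.array([1.0, 0.0])                  # initial distribution over states
T = {
    'a': np.array([[0.3, 0.4], [0.0, 0.5]]),  # T[ch][old, new] = P(emit ch, go old -> new)
    'b': np.array([[0.1, 0.0], [0.2, 0.1]]),
}
stop = np.array([0.2, 0.2])                   # P(stop | state)

def string_prob(s):
    v = alpha
    for ch in s:
        v = v @ T[ch]
    return float(v @ stop)

print(string_prob('ab'), string_prob('ba'))
```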
arXiv Detail & Related papers (2023-10-08T13:36:05Z)
- On the Computational Complexity and Formal Hierarchy of Second Order Recurrent Neural Networks [59.85314067235965]
We extend the theoretical foundation for the second-order recurrent network (2nd-order RNN).
We prove there exists a class of 2nd-order RNNs that is Turing-complete with bounded time.
We also demonstrate that 2nd-order RNNs, without memory, outperform modern-day models such as vanilla RNNs and gated recurrent units in recognizing regular grammars.
arXiv Detail & Related papers (2023-09-26T06:06:47Z)
- Learning Transductions and Alignments with RNN Seq2seq Models [0.8158530638728501]
We study the capabilities of recurrent neural network sequence-to-sequence (RNN seq2seq) models in learning four transduction tasks.
We find that RNN seq2seq models are only able to approximate a mapping that fits the training or in-distribution data, instead of learning the underlying functions.
arXiv Detail & Related papers (2023-03-13T04:15:33Z)
- Decomposing a Recurrent Neural Network into Modules for Enabling Reusability and Replacement [11.591247347259317]
We propose the first approach to decompose an RNN into modules.
We study different types of RNNs, i.e., Vanilla, LSTM, and GRU.
We show how such RNN modules can be reused and replaced in various scenarios.
arXiv Detail & Related papers (2022-12-09T03:29:38Z)
- Architecture Disentanglement for Deep Neural Networks [174.16176919145377]
We introduce neural architecture disentanglement (NAD) to explain the inner workings of deep neural networks (DNNs).
NAD learns to disentangle a pre-trained DNN into sub-architectures according to independent tasks, forming information flows that describe the inference processes.
Results show that misclassified images have a high probability of being assigned to task sub-architectures similar to the correct ones.
arXiv Detail & Related papers (2020-03-30T08:34:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.