On the Computational Complexity and Formal Hierarchy of Second Order
Recurrent Neural Networks
- URL: http://arxiv.org/abs/2309.14691v1
- Date: Tue, 26 Sep 2023 06:06:47 GMT
- Title: On the Computational Complexity and Formal Hierarchy of Second Order
Recurrent Neural Networks
- Authors: Ankur Mali and Alexander Ororbia and Daniel Kifer and Lee Giles
- Abstract summary: We extend the theoretical foundation for the $2^{nd}$-order recurrent network ($2^{nd}$ RNN).
We prove that there exists a class of $2^{nd}$-order RNNs that is Turing-complete with bounded time.
We also demonstrate that $2^{nd}$-order RNNs, without external memory, outperform modern-day models such as vanilla RNNs and gated recurrent units in recognizing regular grammars.
- Score: 59.85314067235965
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Artificial neural networks (ANNs) with recurrence and self-attention have
been shown to be Turing-complete (TC). However, existing work has shown that
these ANNs require multiple turns or unbounded computation time, even with
unbounded precision in weights, in order to recognize TC grammars. Moreover,
under constraints such as fixed- or bounded-precision neurons and bounded time,
ANNs without external memory have been shown to struggle to recognize even
context-free languages. In this work, we extend the theoretical foundation for
the $2^{nd}$-order recurrent network ($2^{nd}$ RNN) and prove that there exists
a class of $2^{nd}$-order RNNs that is Turing-complete with bounded time. This
model can directly encode a transition table into its recurrent weights,
enabling bounded-time computation, and is interpretable by design. We also
demonstrate that $2^{nd}$-order RNNs without external memory, under bounded
weight and time constraints, outperform modern-day models such as vanilla RNNs
and gated recurrent units in recognizing regular grammars. We provide an upper
bound and a stability analysis on the maximum number of neurons required by
$2^{nd}$-order RNNs to recognize any class of regular grammars. Extensive
experiments on the Tomita grammars support our findings, demonstrating the
importance of tensor connections in crafting computationally efficient RNNs.
Finally, we show that $2^{nd}$-order RNNs are also interpretable by extraction
and can extract state machines with higher success rates than first-order
RNNs. Our results extend the theoretical foundations of RNNs and offer
promising avenues for future explainable AI research.
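To make the transition-table claim concrete: in a second-order RNN the hidden update multiplies the previous (one-hot) state vector and the (one-hot) input symbol through a third-order weight tensor, so a DFA's transition table can be written directly into that tensor and each symbol is processed in a single recurrent step. The following minimal NumPy sketch illustrates this idea under simple assumptions (one-hot encodings, a steep sigmoid standing in for a hard threshold); it is not the paper's exact construction, and the toy DFA and the helper names `encode_dfa` and `step` are illustrative.

```python
import numpy as np

def encode_dfa(delta, n_states, n_symbols):
    """Encode a DFA transition table delta[state][symbol] -> state into the
    weight tensor of a second-order RNN.
    W[j, i, k] = 1 iff reading symbol k in state i moves the DFA to state j."""
    W = np.zeros((n_states, n_states, n_symbols))
    for i in range(n_states):
        for k in range(n_symbols):
            W[delta[i][k], i, k] = 1.0
    return W

def step(W, h, x):
    """One second-order update: h_t[j] = sigma(sum_{i,k} W[j,i,k] h[i] x[k]).
    With one-hot h and x, the steep sigmoid keeps h close to one-hot."""
    pre = np.einsum('jik,i,k->j', W, h, x)
    return 1.0 / (1.0 + np.exp(-20.0 * (pre - 0.5)))

# Toy DFA over {0, 1} tracking the parity of 1s (illustrative only).
delta = [[0, 1],   # from state 0: on '0' stay, on '1' go to state 1
         [1, 0]]   # from state 1: on '0' stay, on '1' go to state 0
W = encode_dfa(delta, n_states=2, n_symbols=2)

h = np.array([1.0, 0.0])            # start in state 0 (one-hot)
for sym in [1, 1, 0, 1]:            # input string "1101"
    x = np.eye(2)[sym]
    h = step(W, h, x)
print(np.argmax(h))                 # -> 1 (an odd number of 1s has been seen)
```

Because each input symbol triggers exactly one tensor contraction, the simulation runs in time linear in the input length, which is the sense in which the computation is bounded in time.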
Related papers
- Stuffed Mamba: State Collapse and State Capacity of RNN-Based Long-Context Modeling [69.36377985746878]
We study the cause of the inability to process long context for RNNs and suggest critical mitigations.
We first investigate *state collapse* (SC), a phenomenon that causes severe performance degradation on sequence lengths not encountered during training.
We train a series of Mamba-2 models on long documents to empirically estimate the recurrent state capacity in language modeling and passkey retrieval.
arXiv Detail & Related papers (2024-10-09T17:54:28Z) - Obtaining Optimal Spiking Neural Network in Sequence Learning via CRNN-SNN Conversion [12.893883491781697]
Spiking neural networks (SNNs) are a promising alternative to conventional artificial neural networks (ANNs).
We design two sub-pipelines to support the end-to-end conversion of different structures in neural networks.
We show the effectiveness of our method over short and long timescales compared with the state-of-the-art learning- and conversion-based methods.
arXiv Detail & Related papers (2024-08-18T08:23:51Z) - A Tensor Decomposition Perspective on Second-order RNNs [5.922280687190788]
We study the model resulting from parameterizing 2RNNs using the CP decomposition, which we call CPRNN.
We analyze how rank and hidden size affect model capacity and show the relationships between RNNs, 2RNNs, MIRNNs, and CPRNNs based on these parameters.
We support these results empirically with experiments on the Penn Treebank dataset which demonstrate that, with a fixed parameter budget, CPRNNs outperform RNNs, 2RNNs, and MIRNNs with the right choice of rank and hidden size.
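As a rough illustration of the CP parameterization, the NumPy sketch below replaces the full second-order weight tensor $W \in \mathbb{R}^{d_h \times d_h \times d_x}$ with CP factors $A$, $B$, $C$ of a chosen rank. This is a minimal sketch under my own assumptions, not the cited paper's exact CPRNN formulation (which may also include biases and first-order terms); all sizes and the name `cprnn_step` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d_h, d_x, rank = 16, 8, 4          # hidden size, input size, CP rank (illustrative)

# CP factors standing in for the full d_h x d_h x d_x second-order tensor:
# W[j, i, k] ~= sum_r A[j, r] * B[i, r] * C[k, r]
A = rng.normal(size=(d_h, rank))
B = rng.normal(size=(d_h, rank))
C = rng.normal(size=(d_x, rank))

def cprnn_step(h, x):
    """One step of a CP-parameterized second-order cell:
    pre[j] = sum_r A[j,r] * (B[:,r] @ h) * (C[:,r] @ x)."""
    z = (B.T @ h) * (C.T @ x)      # rank-sized multiplicative interaction
    return np.tanh(A @ z)

h = np.zeros(d_h)
for x in rng.normal(size=(5, d_x)):   # a short random input sequence
    h = cprnn_step(h, x)
print(h.shape)                        # (16,)
```

With a sufficiently large rank the factored form can represent the full second-order tensor, while small ranks interpolate toward cheaper multiplicative architectures, which is the trade-off the rank/hidden-size analysis in the summary refers to.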
arXiv Detail & Related papers (2024-06-07T16:18:32Z) - Learning Useful Representations of Recurrent Neural Network Weight Matrices [30.583752432727326]
Recurrent Neural Networks (RNNs) are general-purpose parallel-sequential computers.
How can we learn useful representations of RNN weights that facilitate both RNN analysis and downstream tasks?
We consider several mechanistic approaches for RNN weights and adapt the permutation equivariant Deep Weight Space layer for RNNs.
Our two novel functionalist approaches extract information from RNN weights by 'interrogating' the RNN through probing inputs.
arXiv Detail & Related papers (2024-03-18T17:32:23Z) - Recurrent Neural Language Models as Probabilistic Finite-state Automata [66.23172872811594]
We study what classes of probability distributions RNN LMs can represent.
We show that simple RNNs are equivalent to a subclass of probabilistic finite-state automata.
These results present a first step towards characterizing the classes of distributions RNN LMs can represent.
arXiv Detail & Related papers (2023-10-08T13:36:05Z) - Advancing Regular Language Reasoning in Linear Recurrent Neural Networks [56.11830645258106]
We study whether linear recurrent neural networks (LRNNs) can learn the hidden rules in training sequences.
We propose a new LRNN equipped with a block-diagonal and input-dependent transition matrix.
Experiments suggest that the proposed model is the only LRNN capable of performing length extrapolation on regular language tasks.
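For intuition about what a block-diagonal, input-dependent transition matrix means, here is a generic linear-recurrence sketch. It is a minimal illustration under a simple assumed parameterization, not the specific architecture proposed in the cited paper; all sizes and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d_x, n_blocks, blk = 8, 4, 2              # illustrative sizes
d_h = n_blocks * blk                      # hidden size = number of blocks * block size

# Parameters mapping the current input to one small transition block each
W_blocks = rng.normal(size=(n_blocks, blk * blk, d_x)) * 0.1
W_in = rng.normal(size=(d_h, d_x)) * 0.1

def lrnn_step(h, x):
    """Linear recurrence h_t = A(x_t) @ h_{t-1} + W_in @ x_t, where A(x_t) is
    block-diagonal and every block is a function of the current input."""
    h_new = np.empty(d_h)
    for b in range(n_blocks):
        A_b = (W_blocks[b] @ x).reshape(blk, blk)     # input-dependent block
        h_new[b*blk:(b+1)*blk] = A_b @ h[b*blk:(b+1)*blk]
    return h_new + W_in @ x

h = np.zeros(d_h)
for x in rng.normal(size=(6, d_x)):
    h = lrnn_step(h, x)
print(h.shape)      # (8,)
```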
arXiv Detail & Related papers (2023-09-14T03:36:01Z) - Learning Transductions and Alignments with RNN Seq2seq Models [0.8158530638728501]
We study the capabilities of Recurrent-Neural-Network sequence to sequence (RNN seq2seq) models in learning four transduction tasks.
We find that RNN seq2seq models are only able to approximate a mapping that fits the training or in-distribution data, instead of learning the underlying functions.
arXiv Detail & Related papers (2023-03-13T04:15:33Z) - Learning Hierarchical Structures with Differentiable Nondeterministic
Stacks [25.064819128982556]
We present a stack RNN model based on the recently proposed Nondeterministic Stack RNN (NS-RNN).
We show that the NS-RNN achieves lower cross-entropy than all previous stack RNNs on five context-free language modeling tasks.
We also propose a restricted version of the NS-RNN that makes it practical to use for language modeling on natural language.
arXiv Detail & Related papers (2021-09-05T03:25:23Z) - A Formal Hierarchy of RNN Architectures [88.38859874233944]
The hierarchy is based on two formal properties: space, which measures the RNN's memory, and rational recurrence, defined as whether the recurrent update can be described by a weighted finite-state machine.
We show how these models' expressive capacity is expanded by stacking multiple layers or composing them with different pooling functions.
We hypothesize that the practical learnable capacity of unsaturated RNNs obeys a similar hierarchy.
arXiv Detail & Related papers (2020-04-18T00:57:54Z) - Recognizing Long Grammatical Sequences Using Recurrent Networks
Augmented With An External Differentiable Stack [73.48927855855219]
Recurrent neural networks (RNNs) are a widely used deep architecture for sequence modeling, generation, and prediction.
RNNs generalize poorly over very long sequences, which limits their applicability to many important temporal processing and time series forecasting problems.
One way to address these shortcomings is to couple an RNN with an external, differentiable memory structure, such as a stack.
In this paper, we improve the memory-augmented RNN with important architectural and state updating mechanisms.
arXiv Detail & Related papers (2020-04-04T14:19:15Z)