Advancing Regular Language Reasoning in Linear Recurrent Neural Networks
- URL: http://arxiv.org/abs/2309.07412v2
- Date: Tue, 9 Apr 2024 05:12:30 GMT
- Title: Advancing Regular Language Reasoning in Linear Recurrent Neural Networks
- Authors: Ting-Han Fan, Ta-Chung Chi, Alexander I. Rudnicky,
- Abstract summary: We study whether linear recurrent neural networks (LRNNs) can learn the hidden rules in training sequences.
We propose a new LRNN equipped with a block-diagonal and input-dependent transition matrix.
Experiments suggest that the proposed model is the only LRNN capable of performing length extrapolation on regular language tasks.
- Score: 56.11830645258106
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent studies, linear recurrent neural networks (LRNNs) have achieved Transformer-level performance in natural language and long-range modeling, while offering rapid parallel training and constant inference cost. With the resurgence of interest in LRNNs, we study whether they can learn the hidden rules in training sequences, such as the grammatical structures of regular language. We theoretically analyze some existing LRNNs and discover their limitations in modeling regular language. Motivated by this analysis, we propose a new LRNN equipped with a block-diagonal and input-dependent transition matrix. Experiments suggest that the proposed model is the only LRNN capable of performing length extrapolation on regular language tasks such as Sum, Even Pair, and Modular Arithmetic. The code is released at \url{https://github.com/tinghanf/RegluarLRNN}.
Related papers
- Unlocking State-Tracking in Linear RNNs Through Negative Eigenvalues [65.41946981594567]
Linear Recurrent Neural Networks (LRNNs) have emerged as efficient alternatives to Transformers in large language modeling.
LRNNs struggle to perform state-tracking which may impair performance in tasks such as code evaluation or tracking a chess game.
Our work enhances the expressivity of modern LRNNs, broadening their applicability without changing the cost of training or inference.
arXiv Detail & Related papers (2024-11-19T14:35:38Z) - Training Neural Networks as Recognizers of Formal Languages [87.06906286950438]
Formal language theory pertains specifically to recognizers.
It is common to instead use proxy tasks that are similar in only an informal sense.
We correct this mismatch by training and evaluating neural networks directly as binary classifiers of strings.
arXiv Detail & Related papers (2024-11-11T16:33:25Z) - What Languages are Easy to Language-Model? A Perspective from Learning Probabilistic Regular Languages [78.1866280652834]
Large language models (LM) are distributions over strings.
We investigate the learnability of regular LMs (RLMs) by RNN and Transformer LMs.
We find that the complexity of the RLM rank is strong and significant predictors of learnability for both RNNs and Transformers.
arXiv Detail & Related papers (2024-06-06T17:34:24Z) - Lower Bounds on the Expressivity of Recurrent Neural Language Models [47.537525313391164]
Investigation into the representational capacity of neural LMs has predominantly focused on their ability to emphrecognize formal languages.
We take a fresh look at the representational capacity of RNN LMs by connecting them to emphprobabilistic FSAs and demonstrate that RNN LMs with linearly bounded precision can express arbitrary regular LMs.
arXiv Detail & Related papers (2024-05-29T16:02:09Z) - Language Modeling Using Tensor Trains [11.19279979601076]
We propose a novel tensor network language model based on the simplest tensor network (i.e., tensor trains), called Tensor Train Language Model' (TTLM)
TTLM represents sentences in an exponential space constructed by the tensor product of words, but computing the probabilities of sentences in a low-dimensional fashion.
arXiv Detail & Related papers (2024-05-07T18:09:47Z) - In-Context Language Learning: Architectures and Algorithms [73.93205821154605]
We study ICL through the lens of a new family of model problems we term in context language learning (ICLL)
We evaluate a diverse set of neural sequence models on regular ICLL tasks.
arXiv Detail & Related papers (2024-01-23T18:59:21Z) - On the Computational Complexity and Formal Hierarchy of Second Order
Recurrent Neural Networks [59.85314067235965]
We extend the theoretical foundation for the $2nd$-order recurrent network ($2nd$ RNN)
We prove there exists a class of a $2nd$ RNN that is Turing-complete with bounded time.
We also demonstrate that $2$nd order RNNs, without memory, outperform modern-day models such as vanilla RNNs and gated recurrent units in recognizing regular grammars.
arXiv Detail & Related papers (2023-09-26T06:06:47Z) - Learning Hierarchical Structures with Differentiable Nondeterministic
Stacks [25.064819128982556]
We present a stack RNN model based on the recently proposed Nondeterministic Stack RNN (NS-RNN)
We show that the NS-RNN achieves lower cross-entropy than all previous stack RNNs on five context-free language modeling tasks.
We also propose a restricted version of the NS-RNN that makes it practical to use for language modeling on natural language.
arXiv Detail & Related papers (2021-09-05T03:25:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.