Lower Bounds on the Expressivity of Recurrent Neural Language Models
- URL: http://arxiv.org/abs/2405.19222v2
- Date: Tue, 18 Jun 2024 15:42:32 GMT
- Title: Lower Bounds on the Expressivity of Recurrent Neural Language Models
- Authors: Anej Svete, Franz Nowak, Anisha Mohamed Sahabdeen, Ryan Cotterell
- Abstract summary: Investigation into the representational capacity of neural LMs has predominantly focused on their ability to \emph{recognize} formal languages.
We take a fresh look at the representational capacity of RNN LMs by connecting them to \emph{probabilistic} FSAs and demonstrate that RNN LMs with linearly bounded precision can express arbitrary regular LMs.
- Score: 47.537525313391164
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recent successes and spread of large neural language models (LMs) call for a thorough understanding of their computational ability. Describing their computational abilities through LMs' \emph{representational capacity} is a lively area of research. However, investigation into the representational capacity of neural LMs has predominantly focused on their ability to \emph{recognize} formal languages. For example, recurrent neural networks (RNNs) with Heaviside activations are tightly linked to regular languages, i.e., languages defined by finite-state automata (FSAs). Such results, however, fall short of describing the capabilities of RNN \emph{language models} (LMs), which are definitionally \emph{distributions} over strings. We take a fresh look at the representational capacity of RNN LMs by connecting them to \emph{probabilistic} FSAs and demonstrate that RNN LMs with linearly bounded precision can express arbitrary regular LMs.
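To make the abstract's central claim concrete, here is a minimal sketch (not the paper's construction) that realizes a small, hypothetical two-state probabilistic FSA over the alphabet {a, b} as an RNN-style LM: the hidden state is a one-hot encoding of the automaton's current state, and a linear readout of that state reproduces the PFSA's next-symbol distribution, so both models assign every string the same probability. All names and numbers below are illustrative assumptions.

```python
import numpy as np

# A hypothetical 2-state probabilistic FSA over the alphabet {a, b, EOS}.
# EMIT[q] is the next-symbol distribution in state q (rows sum to 1);
# NEXT[q][symbol] is the (deterministic) successor state.
EMIT = np.array([
    [0.5, 0.3, 0.2],   # state 0: P(a), P(b), P(EOS)
    [0.1, 0.6, 0.3],   # state 1
])
NEXT = {0: {"a": 1, "b": 0}, 1: {"a": 1, "b": 0}}
SYMS = ["a", "b", "EOS"]

def pfsa_prob(string):
    """Probability of `string` (without EOS) under the PFSA."""
    q, p = 0, 1.0
    for s in string:
        p *= EMIT[q, SYMS.index(s)]
        q = NEXT[q][s]
    return p * EMIT[q, SYMS.index("EOS")]

def rnn_prob(string):
    """The same distribution realized by an RNN-style LM: the hidden state h
    is a one-hot encoding of the FSA state, and the linear readout E @ h
    recovers the next-symbol distribution."""
    E = EMIT.T                    # output matrix: (|alphabet|, |states|)
    h = np.array([1.0, 0.0])      # one-hot hidden state, starting in state 0
    p = 1.0
    for s in string:
        y = E @ h                 # next-symbol distribution in the current state
        p *= y[SYMS.index(s)]
        # transition: new one-hot hidden state for the successor FSA state
        h = np.eye(2)[NEXT[int(np.argmax(h))][s]]
    return p * (E @ h)[SYMS.index("EOS")]

for w in ["", "a", "ab", "abba"]:
    assert np.isclose(pfsa_prob(w), rnn_prob(w))
```

The paper itself studies Heaviside-activated RNN LMs with a softmax output and linearly bounded precision; the one-hot encoding above only illustrates why a finite state set fits naturally into a fixed-size hidden vector.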
Related papers
- Training Neural Networks as Recognizers of Formal Languages [87.06906286950438]
Formal language theory pertains specifically to recognizers.
It is common to instead use proxy tasks that are similar in only an informal sense.
We correct this mismatch by training and evaluating neural networks directly as binary classifiers of strings.
arXiv Detail & Related papers (2024-11-11T16:33:25Z)
- What Languages are Easy to Language-Model? A Perspective from Learning Probabilistic Regular Languages [78.1866280652834]
Large language models (LMs) are distributions over strings.
We investigate the learnability of regular LMs (RLMs) by RNN and Transformer LMs.
We find that the rank of the RLM is a strong and significant predictor of learnability for both RNNs and Transformers.
arXiv Detail & Related papers (2024-06-06T17:34:24Z)
- Language Modeling Using Tensor Trains [11.19279979601076]
We propose a novel tensor network language model based on the simplest tensor network (i.e., tensor trains), called the Tensor Train Language Model (TTLM).
TTLM represents sentences in an exponential space constructed by the tensor product of words, but computes the probabilities of sentences in a low-dimensional fashion.
arXiv Detail & Related papers (2024-05-07T18:09:47Z)
- On the Representational Capacity of Recurrent Neural Language Models [56.19166912044362]
We show that a rationally weighted RLM with unbounded computation time can simulate any deterministic probabilistic Turing machine (PTM) with rationally weighted transitions.
We also provide a lower bound by showing that under the restriction to real-time computation, such models can simulate deterministic real-time rational PTMs.
arXiv Detail & Related papers (2023-10-19T17:39:47Z)
- Recurrent Neural Language Models as Probabilistic Finite-state Automata [66.23172872811594]
We study what classes of probability distributions RNN LMs can represent.
We show that simple RNNs are equivalent to a subclass of probabilistic finite-state automata.
These results present a first step towards characterizing the classes of distributions RNN LMs can represent.
arXiv Detail & Related papers (2023-10-08T13:36:05Z)
- Advancing Regular Language Reasoning in Linear Recurrent Neural Networks [56.11830645258106]
We study whether linear recurrent neural networks (LRNNs) can learn the hidden rules in training sequences.
We propose a new LRNN equipped with a block-diagonal and input-dependent transition matrix.
Experiments suggest that the proposed model is the only LRNN capable of performing length extrapolation on regular language tasks.
arXiv Detail & Related papers (2023-09-14T03:36:01Z)
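The last entry describes an architectural change that a short sketch can make concrete: a linear recurrence h_t = A(x_t) h_{t-1} + B x_t whose transition matrix A(x_t) is block-diagonal with input-dependent blocks. The sizes and randomly initialized parameters below are assumptions for illustration, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

D_IN, N_BLOCKS, BLOCK = 8, 4, 3          # assumed sizes, not taken from the paper
D_HID = N_BLOCKS * BLOCK

# Maps from the input x_t to the entries of each diagonal block of A(x_t),
# plus an ordinary input projection B (all randomly initialized for this sketch).
W_blocks = rng.normal(0.0, 0.1, (N_BLOCKS, BLOCK, BLOCK, D_IN))
B = rng.normal(0.0, 0.1, (D_HID, D_IN))

def step(h, x):
    """One linear-recurrent step h_t = A(x_t) h_{t-1} + B x_t, where A(x_t)
    is block-diagonal and each block's entries depend on the current input."""
    h_new = np.empty(D_HID)
    for b in range(N_BLOCKS):
        A_b = W_blocks[b] @ x                    # (BLOCK, BLOCK) block for this input
        sl = slice(b * BLOCK, (b + 1) * BLOCK)
        h_new[sl] = A_b @ h[sl]                  # each block acts only on its own slice of h
    return h_new + B @ x

# Run the recurrence over a short random input sequence.
h = np.zeros(D_HID)
for x in rng.normal(size=(5, D_IN)):
    h = step(h, x)
print(h.shape)  # (12,)
```

Restricting A(x_t) to diagonal blocks keeps the per-step cost at O(num_blocks * block_size^2) instead of O(hidden_size^2), which is the usual motivation for such factorized transition matrices.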