Recurrent Neural Language Models as Probabilistic Finite-state Automata
- URL: http://arxiv.org/abs/2310.05161v4
- Date: Tue, 19 Dec 2023 10:13:33 GMT
- Title: Recurrent Neural Language Models as Probabilistic Finite-state Automata
- Authors: Anej Svete, Ryan Cotterell
- Abstract summary: We study what classes of probability distributions RNN LMs can represent.
We show that simple RNNs are equivalent to a subclass of probabilistic finite-state automata.
These results present a first step towards characterizing the classes of distributions RNN LMs can represent.
- Score: 66.23172872811594
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Studying language models (LMs) in terms of well-understood formalisms allows
us to precisely characterize their abilities and limitations. Previous work has
investigated the representational capacity of recurrent neural network (RNN)
LMs in terms of their capacity to recognize unweighted formal languages.
However, LMs do not describe unweighted formal languages -- rather, they define
\emph{probability distributions} over strings. In this work, we study what
classes of such probability distributions RNN LMs can represent, which allows
us to make more direct statements about their capabilities. We show that simple
RNNs are equivalent to a subclass of probabilistic finite-state automata, and
can thus model a strict subset of probability distributions expressible by
finite-state models. Furthermore, we study the space complexity of representing
finite-state LMs with RNNs. We show that, to represent an arbitrary
deterministic finite-state LM with $N$ states over an alphabet $\Sigma$, an
RNN requires $\Omega\left(N |\Sigma|\right)$ neurons. These results present a
first step towards characterizing the classes of distributions RNN LMs can
represent and thus help us understand their capabilities and limitations.
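To make the objects in the abstract concrete, the sketch below is a minimal, illustrative deterministic probabilistic finite-state LM over an alphabet $\Sigma = \{a, b\}$ with $N = 2$ states (the states, weights, and end-of-string convention are assumptions for illustration, not a construction from the paper). Each state distributes probability mass over the next symbol and an end-of-string event, and the probability of a string is the product of the weights along its unique path; the paper's lower bound states that exactly representing such a deterministic finite-state LM with a simple RNN requires $\Omega\left(N |\Sigma|\right)$ neurons.
```python
# Minimal sketch (illustrative, not from the paper): a deterministic
# probabilistic finite-state LM with N = 2 states over Sigma = {a, b}.
# Each state assigns probability mass to every next symbol and to a
# special end-of-string (EOS) event, so the automaton defines a
# probability distribution over finite strings.

EOS = "<eos>"

# transitions[state][symbol] = (weight, next_state); EOS has no successor.
# The outgoing weights of each state sum to 1, and EOS has positive
# probability in every state, so the distribution over strings is normalized.
transitions = {
    0: {"a": (0.5, 1), "b": (0.3, 0), EOS: (0.2, None)},
    1: {"a": (0.1, 1), "b": (0.6, 0), EOS: (0.3, None)},
}

def string_probability(string: str, start_state: int = 0) -> float:
    """Probability assigned to `string`: the product of the weights along
    its unique deterministic path, times the final state's EOS weight."""
    state, prob = start_state, 1.0
    for symbol in string:
        weight, next_state = transitions[state].get(symbol, (0.0, None))
        if next_state is None:
            return 0.0  # no outgoing transition labeled with this symbol
        prob *= weight
        state = next_state
    return prob * transitions[state][EOS][0]

if __name__ == "__main__":
    print(string_probability("ab"))  # 0.5 * 0.6 * 0.2 = 0.06
    print(string_probability(""))    # 0.2 (stop immediately at state 0)
```
Determinism means each (state, symbol) pair has at most one successor, which is what makes the accepting path, and hence the string's probability, unique.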
Related papers
- On the Representational Capacity of Neural Language Models with Chain-of-Thought Reasoning [87.73401758641089]
Chain-of-thought (CoT) reasoning has improved the performance of modern language models (LMs).
We show that LMs can represent the same family of distributions over strings as probabilistic Turing machines.
arXiv Detail & Related papers (2024-06-20T10:59:02Z)
- What Languages are Easy to Language-Model? A Perspective from Learning Probabilistic Regular Languages [78.1866280652834]
Large language models (LMs) are distributions over strings.
We investigate the learnability of regular LMs (RLMs) by RNN and Transformer LMs.
We find that the rank of the RLM is a strong and significant predictor of learnability for both RNNs and Transformers.
arXiv Detail & Related papers (2024-06-06T17:34:24Z)
- Lower Bounds on the Expressivity of Recurrent Neural Language Models [47.537525313391164]
Investigation into the representational capacity of neural LMs has predominantly focused on their ability to \emph{recognize} formal languages.
We take a fresh look at the representational capacity of RNN LMs by connecting them to \emph{probabilistic} FSAs and demonstrate that RNN LMs with linearly bounded precision can express arbitrary regular LMs.
arXiv Detail & Related papers (2024-05-29T16:02:09Z)
- On Efficiently Representing Regular Languages as RNNs [49.88310438099143]
Prior work has shown that RNNs can efficiently represent bounded hierarchical structures that are prevalent in human language.
This suggests that RNNs' success might be linked to their ability to model hierarchy.
We show that RNNs can efficiently represent a larger class of LMs than previously claimed.
arXiv Detail & Related papers (2024-02-24T13:42:06Z)
- On the Representational Capacity of Recurrent Neural Language Models [56.19166912044362]
We show that a rationally weighted RLM with unbounded computation time can simulate any deterministic probabilistic Turing machine (PTM) with rationally weighted transitions.
We also provide a lower bound by showing that under the restriction to real-time computation, such models can simulate deterministic real-time rational PTMs.
arXiv Detail & Related papers (2023-10-19T17:39:47Z)
- Learning Hierarchical Structures with Differentiable Nondeterministic Stacks [25.064819128982556]
We present a stack RNN model based on the recently proposed Nondeterministic Stack RNN (NS-RNN).
We show that the NS-RNN achieves lower cross-entropy than all previous stack RNNs on five context-free language modeling tasks.
We also propose a restricted version of the NS-RNN that makes it practical to use for language modeling on natural language.
arXiv Detail & Related papers (2021-09-05T03:25:23Z)
- Distance and Equivalence between Finite State Machines and Recurrent Neural Networks: Computational results [0.348097307252416]
We show some results related to the problem of extracting Finite State Machine based models from trained RNN Language models.
Our reduction technique from 3-SAT makes this latter fact easily generalizable to other RNN architectures.
arXiv Detail & Related papers (2020-04-01T14:48:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.