Related papers: Simulating Weighted Automata over Sequences and Trees with Transformers

URL: http://arxiv.org/abs/2403.09728v1
Date: Tue, 12 Mar 2024 21:54:34 GMT
Title: Simulating Weighted Automata over Sequences and Trees with Transformers
Authors: Michael Rizvi, Maude Lizaire, Clara Lacroce, Guillaume Rabusseau
Abstract summary: We show that transformers can simulate weighted finite automata (WFAs), a class of models which subsumes DFAs, as well as weighted tree automata (WTA). We prove these claims formally and provide upper bounds on the sizes of the transformer models needed as a function of the number of states of the target automata.
Score: 5.078561931628571
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Transformers are ubiquitous models in the natural language processing (NLP) community and have shown impressive empirical successes in the past few years. However, little is understood about how they reason and the limits of their computational capabilities. These models do not process data sequentially, and yet outperform sequential neural models such as RNNs. Recent work has shown that these models can compactly simulate the sequential reasoning abilities of deterministic finite automata (DFAs). This leads to the following question: can transformers simulate the reasoning of more complex finite state machines? In this work, we show that transformers can simulate weighted finite automata (WFAs), a class of models which subsumes DFAs, as well as weighted tree automata (WTA), a generalization of weighted automata to tree structured inputs. We prove these claims formally and provide upper bounds on the sizes of the transformer models needed as a function of the number of states of the target automata. Empirically, we perform synthetic experiments showing that transformers are able to learn these compact solutions via standard gradient-based training.

Related papers

Transformers in Pseudo-Random Number Generation: A Dual Perspective on Theory and Practice [1.8725832935669624]
Pseudo-random number generators (PRNGs) are highly nonlinear processes, and they are key blocks in the optimization of large language models. We show that it is reasonable to generate high-quality pseudo-random numbers based on transformers.
arXiv Detail & Related papers (2025-08-02T01:31:53Z)
Constant Bit-size Transformers Are Turing Complete [8.38684825915246]
We prove that any Turing machine running on inputs of arbitrary length can be simulated by a constant bit-size transformer. Our approach relies on simulating Post machines, a Turing-complete computational model.
arXiv Detail & Related papers (2025-05-22T02:45:38Z)
Transformers Simulate MLE for Sequence Generation in Bayesian Networks [18.869174453242383]
We investigate the theoretical capabilities of transformers to autoregressively generate sequences in Bayesian networks based on in-context maximum likelihood estimation (MLE). We demonstrate that there exists a simple transformer model that can estimate the conditional probabilities of the Bayesian network according to the context. We further demonstrate in extensive experiments that such a transformer does not only exist in theory, but can also be effectively obtained through training.
arXiv Detail & Related papers (2025-01-05T13:56:51Z)
Algorithmic Capabilities of Random Transformers [49.73113518329544]
We investigate what functions can be learned by randomly initialized transformers in which only the embedding layers are optimized. We find that these random transformers can perform a wide range of meaningful algorithmic tasks. Our results indicate that some algorithmic capabilities are present in transformers even before these models are trained.
arXiv Detail & Related papers (2024-10-06T06:04:23Z)
On the Representational Capacity of Neural Language Models with Chain-of-Thought Reasoning [87.73401758641089]
Chain-of-thought (CoT) reasoning has improved the performance of modern language models (LMs). We show that LMs can represent the same family of distributions over strings as probabilistic Turing machines.
arXiv Detail & Related papers (2024-06-20T10:59:02Z)
Boolformer: Symbolic Regression of Logic Functions with Transformers [26.946376237404994]
We introduce Boolformer, the first Transformer architecture trained to perform end-to-end symbolic regression of Boolean functions. We show that it can predict compact formulas for complex functions which were not seen during training, when provided a clean truth table. We evaluate the Boolformer on a broad set of real-world binary classification datasets, demonstrating its potential as an interpretable alternative to classic machine learning methods.
arXiv Detail & Related papers (2023-09-21T16:11:38Z)
How Powerful are Decoder-Only Transformer Neural Models? [0.0]
This is the first work to address the Turing completeness of the underlying technology employed in GPT-x. We show that the sparsity/compressibility of the word embedding is an important consideration for Turing completeness to hold.
arXiv Detail & Related papers (2023-05-26T15:35:43Z)
Characterizing Intrinsic Compositionality in Transformers with Tree Projections [72.45375959893218]
Neural models like transformers can route information arbitrarily between different parts of their input. We show that transformers for three different tasks become more tree-like over the course of training. These trees are predictive of model behavior, with more tree-like models generalizing better on tests of compositional generalization.
arXiv Detail & Related papers (2022-11-02T17:10:07Z)
Transformers Learn Shortcuts to Automata [52.015990420075944]
We find that a low-depth Transformer can represent the computations of any finite-state automaton. We show that a Transformer with $O(\log T)$ layers can exactly replicate the computation of an automaton on an input sequence of length $T$. We further investigate the brittleness of these solutions and propose potential mitigations.
arXiv Detail & Related papers (2022-10-19T17:45:48Z)
The Parallelism Tradeoff: Limitations of Log-Precision Transformers [29.716269397142973]
We prove that transformers whose arithmetic precision is logarithmic in the number of input tokens can be simulated by constant-depth logspace-uniform threshold circuits. This provides insight on the power of transformers using known results in complexity theory.
arXiv Detail & Related papers (2022-07-02T03:49:34Z)
Automatic Rule Induction for Efficient Semi-Supervised Learning [56.91428251227253]
Semi-supervised learning has shown promise in allowing NLP models to generalize from small amounts of labeled data. Pretrained transformer models act as black-box correlation engines that are difficult to explain and sometimes behave unreliably. We propose tackling both of these challenges via Automatic Rule Induction (ARI), a simple and general-purpose framework.
arXiv Detail & Related papers (2022-05-18T16:50:20Z)
Addressing Some Limitations of Transformers with Feedback Memory [51.94640029417114]
Transformers have been successfully applied to sequential, auto-regressive tasks despite being feedforward networks. We propose the Feedback Transformer architecture that exposes all previous representations to all future representations. We demonstrate on a variety of benchmarks in language modeling, machine translation, and reinforcement learning that the increased representation capacity can create small, shallow models with much stronger performance than comparable Transformers.
arXiv Detail & Related papers (2020-02-21T16:37:57Z)

This list is automatically generated from the titles and abstracts of the papers on this site.