Context-Free Recognition with Transformers
- URL: http://arxiv.org/abs/2601.01754v1
- Date: Mon, 05 Jan 2026 03:14:23 GMT
- Title: Context-Free Recognition with Transformers
- Authors: Selim Jerad, Anej Svete, Sophie Hao, Ryan Cotterell, William Merrill
- Abstract summary: We show that looped transformers with $\mathcal{O}(\log n)$ looping layers and $\mathcal{O}(n^6)$ padding tokens can recognize all CFLs. We empirically validate our results and show that looping helps on a language that provably requires logarithmic depth.
- Score: 57.46376097734401
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformers excel on tasks that process well-formed inputs according to some grammar, such as natural language and code. However, it remains unclear how they can process grammatical syntax. In fact, under standard complexity conjectures, standard transformers cannot recognize context-free languages (CFLs), a canonical formalism for describing syntax, or even regular languages, a subclass of CFLs (Merrill et al., 2022). Merrill & Sabharwal (2024) show that $\mathcal{O}(\log n)$ looping layers (w.r.t. input length $n$) allow transformers to recognize regular languages, but the question of context-free recognition remained open. In this work, we show that looped transformers with $\mathcal{O}(\log n)$ looping layers and $\mathcal{O}(n^6)$ padding tokens can recognize all CFLs. However, training and inference with $\mathcal{O}(n^6)$ padding tokens is potentially impractical. Fortunately, we show that, for natural subclasses such as unambiguous CFLs, the recognition problem on transformers becomes more tractable, requiring $\mathcal{O}(n^3)$ padding. We empirically validate our results and show that looping helps on a language that provably requires logarithmic depth. Overall, our results shed light on the intricacy of CFL recognition by transformers: While general recognition may require an intractable amount of padding, natural constraints such as unambiguity yield efficient recognition algorithms.
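For context, the classical CYK dynamic program recognizes any CFL (with the grammar in Chomsky normal form) in $\mathcal{O}(n^3)$ sequential time by filling a table indexed by spans of the input. The sketch below is illustrative only and is not taken from the paper; the toy grammar and function names are assumptions made for the example.

```python
# Minimal CYK recognizer for a grammar in Chomsky normal form (CNF).
# Illustrative only -- the toy grammar below is an assumption, not from the paper.
from itertools import product

# CNF rules: A -> B C stored as binary[(B, C)] -> {A}; A -> 'a' stored as unary['a'] -> {A}.
binary = {("NP", "VP"): {"S"}, ("Det", "N"): {"NP"}, ("V", "NP"): {"VP"}}
unary = {"the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "chased": {"V"}}
start_symbol = "S"

def cyk_recognize(tokens):
    n = len(tokens)
    if n == 0:
        return False
    # table[i][j] = set of nonterminals deriving tokens[i..j] (inclusive)
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, tok in enumerate(tokens):
        table[i][i] = set(unary.get(tok, set()))
    for span in range(2, n + 1):               # span length
        for i in range(n - span + 1):          # span start
            j = i + span - 1                   # span end (inclusive)
            for k in range(i, j):              # split point
                for b, c in product(table[i][k], table[k + 1][j]):
                    table[i][j] |= binary.get((b, c), set())
    return start_symbol in table[0][n - 1]

print(cyk_recognize("the dog chased the cat".split()))  # True
print(cyk_recognize("dog the chased".split()))          # False
```

The three nested loops (span length, start position, split point) are where the cubic cost comes from; the abstract's $\mathcal{O}(n^3)$ padding bound for unambiguous CFLs has the same flavor, whereas the paper's general-case construction needs $\mathcal{O}(n^6)$ padding.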
Related papers
- Exact Expressive Power of Transformers with Padding [33.4994300801846]
We show that padded transformers with $O(\log^d n)$ looping on inputs of length $n$ recognize exactly the class $\mathsf{FO}$-uniform $\mathsf{TC}^d$ of moderately parallelizable problems. Our results motivate further exploration of padding and looping as parallelizable alternatives to chain of thought for test-time compute.
arXiv Detail & Related papers (2025-05-25T02:52:15Z) - Training Neural Networks as Recognizers of Formal Languages [87.06906286950438]
We train and evaluate neural networks directly as binary classifiers of strings. We provide results on a variety of languages across the Chomsky hierarchy for three neural architectures. Our contributions will facilitate theoretically sound empirical testing of language recognition claims in future work.
arXiv Detail & Related papers (2024-11-11T16:33:25Z) - Can Transformers Learn $n$-gram Language Models? [77.35809823602307]
We study transformers' ability to learn random $n$-gram LMs of two kinds.
We find that classic estimation techniques for $n$-gram LMs such as add-$\lambda$ smoothing outperform transformers.
arXiv Detail & Related papers (2024-10-03T21:21:02Z) - Counting Like Transformers: Compiling Temporal Counting Logic Into Softmax Transformers [8.908747084128397]
We introduce the temporal counting logic $\textsf{K}_\text{t}[\#]$ alongside the RASP variant $\textsf{C-RASP}$. We show they are equivalent to each other, and that together they are the best-known lower bound on the formal expressivity of future-masked soft attention transformers with unbounded input size.
arXiv Detail & Related papers (2024-04-05T20:36:30Z) - Chain of Thought Empowers Transformers to Solve Inherently Serial Problems [57.58801785642868]
Chain of thought (CoT) is a highly effective method to improve the accuracy of large language models (LLMs) on arithmetics and symbolic reasoning tasks.
This work provides a theoretical understanding of the power of CoT for decoder-only transformers through the lens of expressiveness.
arXiv Detail & Related papers (2024-02-20T10:11:03Z) - Masked Hard-Attention Transformers Recognize Exactly the Star-Free Languages [7.938342455750221]
We study exact characterizations of transformers with hard attention and attention masking.
With strict masking (each position cannot attend to itself) and without position embeddings, these transformers are expressively equivalent to linear temporal logic.
arXiv Detail & Related papers (2023-10-21T03:26:39Z) - Logical Languages Accepted by Transformer Encoders with Hard Attention [45.14832807541816]
We focus on two self-attention mechanisms: (1) UHAT (Unique Hard Attention Transformers) and (2) AHAT (Average Hard Attention Transformers).
UHAT encoders are known to recognize only languages inside the circuit complexity class $\mathsf{AC}^0$, i.e., those accepted by a family of poly-sized, depth-bounded circuits with unbounded fan-in.
AHAT encoders can recognize languages outside $\mathsf{AC}^0$, but their expressive power still lies within the bigger circuit complexity class $\mathsf{TC}^0$, i.e., $\mathsf{AC}^0$ circuits extended with majority gates.
arXiv Detail & Related papers (2023-10-05T18:13:40Z) - Physics of Language Models: Part 1, Learning Hierarchical Language Structures [51.68385617116854]
Transformer-based language models are effective but complex, and understanding their inner workings and reasoning mechanisms is a significant challenge. We introduce a family of synthetic CFGs that produce hierarchical rules, capable of generating lengthy sentences. We demonstrate that generative models like GPT can accurately learn and reason over CFG-defined hierarchies and generate sentences based on them.
arXiv Detail & Related papers (2023-05-23T04:28:16Z) - Transformers Learn Shortcuts to Automata [52.015990420075944]
We find that a low-depth Transformer can represent the computations of any finite-state automaton.
We show that a Transformer with $O(\log T)$ layers can exactly replicate the computation of an automaton on an input sequence of length $T$ (a prefix-composition sketch of this logarithmic-depth idea appears after this list).
We further investigate the brittleness of these solutions and propose potential mitigations.
arXiv Detail & Related papers (2022-10-19T17:45:48Z)
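To give a concrete feel for the logarithmic-depth shortcut in the last entry, the sketch below simulates a deterministic finite automaton by prefix-composing per-symbol transition functions with a doubling scheme: $\lceil \log_2 T \rceil$ rounds suffice because function composition is associative. This is an illustrative sketch, not the construction from any of the papers above; the parity automaton and helper names are assumptions made for the example.

```python
# Illustrative sketch: running a DFA over a string in O(log T) rounds
# by prefix-composing transition functions (function composition is associative).
# The DFA below (parity of the number of 1s) is a toy example.
import math

STATES = [0, 1]                      # parity automaton: state = (#1s seen so far) mod 2

def step(state, symbol):             # per-symbol transition function
    return state ^ (1 if symbol == "1" else 0)

def run_dfa_log_depth(string, start_state=0):
    T = len(string)
    if T == 0:
        return start_state
    # Represent each position's transition as a table: state -> next state.
    funcs = [{q: step(q, c) for q in STATES} for c in string]
    # Prefix doubling: after round r, funcs[i] maps a state across the block of
    # positions ending at i and reaching back up to 2^(r+1) symbols (or to position 0).
    rounds = math.ceil(math.log2(T))
    for r in range(rounds):
        offset = 1 << r
        new_funcs = []
        for i in range(T):
            if i >= offset:
                # Compose: first apply the block ending at i - offset, then the block ending at i.
                new_funcs.append({q: funcs[i][funcs[i - offset][q]] for q in STATES})
            else:
                new_funcs.append(funcs[i])   # already covers the full prefix 0..i
        funcs = new_funcs
    # funcs[T-1] now maps the start state across the whole string.
    return funcs[T - 1][start_state]

assert run_dfa_log_depth("10110") == 1   # three 1s -> odd parity
assert run_dfa_log_depth("1001") == 0    # two 1s -> even parity
print("ok")
```

Each round only composes lookup tables that are already available, so all positions can be updated in parallel; this is the same associativity trick that lets a logarithmic number of layers cover an input of length $T$.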