Logical Languages Accepted by Transformer Encoders with Hard Attention
- URL: http://arxiv.org/abs/2310.03817v1
- Date: Thu, 5 Oct 2023 18:13:40 GMT
- Title: Logical Languages Accepted by Transformer Encoders with Hard Attention
- Authors: Pablo Barcelo, Alexander Kozachinskiy, Anthony Widjaja Lin, Vladimir
Podolskii
- Abstract summary: We focus on two self-attention mechanisms: (1) UHAT (Unique Hard Attention Transformers) and (2) AHAT (Average Hard Attention Transformers).
UHAT encoders are known to recognize only languages inside the circuit complexity class ${\sf AC}^0$, i.e., those accepted by a family of poly-sized and depth-bounded circuits with unbounded fan-ins.
AHAT encoders can recognize languages outside ${\sf AC}^0$, but their expressive power still lies within the bigger circuit complexity class ${\sf TC}^0$, i.e., ${\sf AC}^0$-circuits extended by majority gates.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We contribute to the study of formal languages that can be recognized by
transformer encoders. We focus on two self-attention mechanisms: (1) UHAT
(Unique Hard Attention Transformers) and (2) AHAT (Average Hard Attention
Transformers). UHAT encoders are known to recognize only languages inside the
circuit complexity class ${\sf AC}^0$, i.e., accepted by a family of poly-sized
and depth-bounded boolean circuits with unbounded fan-ins. On the other hand,
AHAT encoders can recognize languages outside ${\sf AC}^0$, but their
expressive power still lies within the bigger circuit complexity class ${\sf
TC}^0$, i.e., ${\sf AC}^0$-circuits extended by majority gates. We first show a
negative result that there is an ${\sf AC}^0$-language that cannot be
recognized by a UHAT encoder. On the positive side, we show that UHAT encoders
can recognize a rich fragment of ${\sf AC}^0$-languages, namely, all languages
definable in first-order logic with arbitrary unary numerical predicates. This
logic includes, for example, all regular languages from ${\sf AC}^0$. We then
show that AHAT encoders can recognize all languages of our logic even when we
enrich it with counting terms. We apply these results to derive new results on
the expressive power of UHAT and AHAT up to permutation of letters (a.k.a.
Parikh images).
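The difference between the two attention mechanisms studied in the abstract can be illustrated with a minimal sketch (not the paper's construction): UHAT attends to a single maximal-score position, while AHAT averages the values at all positions tied for the maximum. Function names and the leftmost tie-breaking convention are illustrative assumptions.

```python
def unique_hard_attention(scores, values):
    # UHAT: attend to exactly one position -- the leftmost maximum of the scores.
    best = scores.index(max(scores))
    return values[best]

def average_hard_attention(scores, values):
    # AHAT: average the values at every position attaining the maximum score.
    m = max(scores)
    tied = [v for s, v in zip(scores, values) if s == m]
    return sum(tied) / len(tied)

scores = [0.2, 0.9, 0.9, 0.1]
values = [1.0, 2.0, 4.0, 0.0]
print(unique_hard_attention(scores, values))   # 2.0 (value at leftmost max)
print(average_hard_attention(scores, values))  # 3.0 (mean over both maxima)
```

Averaging over ties is what lets AHAT aggregate information from many positions at once (e.g., count), which is the intuition behind its extra expressive power over UHAT.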
Related papers
- Context-Free Recognition with Transformers [57.46376097734401]
We show that looped transformers with $\mathcal{O}(\log n)$ looping layers and $\mathcal{O}(n^6)$ padding tokens can recognize all CFLs. We empirically validate our results and show that looping helps on a language that provably requires logarithmic depth.
arXiv Detail & Related papers (2026-01-05T03:14:23Z) - Softmax Transformers are Turing-Complete [4.231989115090749]
We prove that length-generalizable softmax CoT transformers are Turing-complete. We show this is not Turing-complete for arbitrary languages. We empirically validate our theory by training transformers for languages requiring complex arithmetic reasoning.
arXiv Detail & Related papers (2025-11-25T08:08:39Z) - Can Transformers Reason Logically? A Study in SAT Solving [17.15701291424892]
We study the logical reasoning capabilities of LLMs in the context of the Boolean satisfiability (SAT) problem.
First, we construct a decoder-only Transformer that can solve SAT using backtracking and deduction via Chain-of-Thought (CoT)
We prove its correctness by showing trace equivalence to the well-known DPLL SAT-solving algorithm.
arXiv Detail & Related papers (2024-10-09T21:01:52Z) - Extracting Finite State Machines from Transformers [0.3069335774032178]
We investigate the trainability of transformers trained on regular languages from a mechanistic interpretability perspective.
We empirically find tighter lower bounds on the trainability of transformers when a finite number of symbols determines the state.
Our mechanistic insight allows us to characterise the regular languages a one-layer transformer can learn with good length generalisation.
arXiv Detail & Related papers (2024-10-08T13:43:50Z) - Can Transformers Learn $n$-gram Language Models? [77.35809823602307]
We study transformers' ability to learn random $n$-gram LMs of two kinds.
We find that classic estimation techniques for $n$-gram LMs such as add-$\lambda$ smoothing outperform transformers.
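The add-λ (Lidstone) smoothing baseline mentioned in this summary is a one-liner over bigram counts; the sketch below is a generic illustration of the technique, not the paper's experimental setup (function name and λ value are assumptions).

```python
from collections import Counter

def add_lambda_bigram(corpus, lam=0.5):
    """Return an add-lambda (Lidstone) smoothed bigram probability function
    estimated from a token sequence."""
    vocab = sorted(set(corpus))
    bigrams = Counter(zip(corpus, corpus[1:]))
    context = Counter(corpus[:-1])  # how often each token occurs as a bigram context
    V = len(vocab)

    def prob(prev, curr):
        # Inflating every count by lam gives unseen pairs non-zero probability
        # while each context's distribution still sums to 1.
        return (bigrams[(prev, curr)] + lam) / (context[prev] + lam * V)

    return prob

prob = add_lambda_bigram(list("ababb"), lam=0.5)
print(prob("a", "b"))  # (2 + 0.5) / (2 + 0.5 * 2)
```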
arXiv Detail & Related papers (2024-10-03T21:21:02Z) - CRUXEval-X: A Benchmark for Multilingual Code Reasoning, Understanding and Execution [50.7413285637879]
The CRUXEVAL-X code reasoning benchmark contains 19 programming languages.
It comprises at least 600 subjects for each language, along with 19K content-consistent tests in total.
Even a model trained solely on Python can achieve at most 34.4% Pass@1 in other languages.
arXiv Detail & Related papers (2024-08-23T11:43:00Z) - Chain of Code: Reasoning with a Language Model-Augmented Code Emulator [115.16975276693267]
We propose Chain of Code, a simple yet surprisingly effective extension that improves LM code-driven reasoning.
The key idea is to encourage LMs to format semantic sub-tasks in a program as flexible pseudocode that the interpreter can explicitly catch.
arXiv Detail & Related papers (2023-12-07T17:51:43Z) - Guess & Sketch: Language Model Guided Transpilation [59.02147255276078]
Learned transpilation offers an alternative to manual re-writing and engineering efforts.
Probabilistic neural language models (LMs) produce plausible outputs for every input, but do so at the cost of guaranteed correctness.
Guess & Sketch extracts alignment and confidence information from features of the LM then passes it to a symbolic solver to resolve semantic equivalence.
arXiv Detail & Related papers (2023-09-25T15:42:18Z) - Verified Reversible Programming for Verified Lossless Compression [11.020543186794459]
Lossless compression implementations typically contain two programs, an encoder and a decoder, which are required to be inverse to one another.
We observe that a significant class of compression methods, based on asymmetric numeral systems (ANS), have shared structure between the encoder and decoder.
We have implemented a small reversible language, embedded in Agda, which we call 'Flipper'
arXiv Detail & Related papers (2022-11-02T16:39:41Z) - Formal Language Recognition by Hard Attention Transformers: Perspectives
from Circuit Complexity [1.0159205678719043]
We show that UHAT and GUHAT Transformers, viewed as string acceptors, can only recognize formal languages in the complexity class AC$^0$.
In contrast, the non-AC$^0$ languages MAJORITY and DYCK-1 are recognizable by AHAT networks, implying that AHAT can recognize languages that UHAT and GUHAT cannot.
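For concreteness, the two separating languages named here have simple membership tests; the sketch below uses the strict-majority convention for MAJORITY, which is one common choice and an assumption on my part.

```python
def majority(w: str) -> bool:
    # MAJORITY over {0, 1}: accept iff strictly more 1s than 0s.
    return w.count("1") > w.count("0")

def dyck1(w: str) -> bool:
    # DYCK-1: balanced strings over a single pair of brackets.
    depth = 0
    for c in w:
        depth += 1 if c == "(" else -1
        if depth < 0:  # a closing bracket with no matching opener
            return False
    return depth == 0

print(majority("1101"), majority("1100"))  # True False
print(dyck1("(()())"), dyck1("())("))      # True False
```

Both languages require (approximate) counting over the whole input, which is exactly what average hard attention can do and unique hard attention cannot.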
arXiv Detail & Related papers (2022-04-13T19:25:42Z) - Safety Synthesis Sans Specification [5.874782446136914]
We argue that this is a natural question in many situations in hardware and software verification.
We devise a learning algorithm for this problem and show that its time and query complexity are bounded in terms of the rank of the target language, its incompatibility measure, and the maximal length of a given counterexample.
arXiv Detail & Related papers (2020-11-15T21:13:17Z) - RNNs can generate bounded hierarchical languages with optimal memory [113.73133308478612]
We show that RNNs can efficiently generate bounded hierarchical languages that reflect the scaffolding of natural language syntax.
We introduce Dyck-($k$,$m$), the language of well-nested brackets (of $k$ types) and $m$-bounded nesting depth.
We prove that an RNN with $O(m \log k)$ hidden units suffices, an exponential reduction in memory, by an explicit construction.
arXiv Detail & Related papers (2020-10-15T04:42:29Z)
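The Dyck-($k$,$m$) language defined in the last summary has a direct membership test; the sketch below uses an illustrative `(kind, type)` pair encoding for brackets, which is my own convenience choice, not taken from the paper.

```python
def dyck_k_m(word, k: int, m: int) -> bool:
    """Membership test for Dyck-(k, m): well-nested brackets of k types with
    nesting depth at most m. Brackets are encoded as (kind, type) pairs,
    e.g. ("(", 0) opens and (")", 0) closes bracket type 0."""
    stack = []
    for kind, typ in word:
        if not 0 <= typ < k:
            return False          # unknown bracket type
        if kind == "(":
            stack.append(typ)
            if len(stack) > m:
                return False      # nesting depth bound exceeded
        elif not stack or stack.pop() != typ:
            return False          # mismatched or unopened closing bracket
    return not stack              # everything opened was closed

w = [("(", 0), ("(", 1), (")", 1), (")", 0)]
print(dyck_k_m(w, k=2, m=2))  # True
print(dyck_k_m(w, k=2, m=1))  # False: depth 2 exceeds bound m=1
```

The stack here never holds more than $m$ entries, each naming one of $k$ types, which mirrors the $O(m \log k)$ memory bound: $m$ slots of $\log k$ bits each.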
This list is automatically generated from the titles and abstracts of the papers in this site.