Average-Hard Attention Transformers are Constant-Depth Uniform Threshold
Circuits
- URL: http://arxiv.org/abs/2308.03212v2
- Date: Mon, 21 Aug 2023 18:54:56 GMT
- Title: Average-Hard Attention Transformers are Constant-Depth Uniform Threshold
Circuits
- Authors: Lena Strobl
- Abstract summary: We show that average-hard attention transformers recognize languages that fall within the class TC0.
Our paper shows that the first result can be extended to yield uniform circuits as well.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformers have emerged as a widely used neural network model for various
natural language processing tasks. Previous research explored their
relationship with constant-depth threshold circuits, making two assumptions:
average-hard attention and logarithmic precision for internal computations
relative to input length. Merrill et al. (2022) prove that average-hard
attention transformers recognize languages that fall within the complexity
class TC0, denoting the set of languages that can be recognized by
constant-depth polynomial-size threshold circuits. Likewise, Merrill and
Sabharwal (2023) show that log-precision transformers recognize languages
within the class of uniform TC0. This shows that both transformer models can be
simulated by constant-depth threshold circuits, with the latter being more
robust due to generating a uniform circuit family. Our paper shows that the
first result can be extended to yield uniform circuits as well.
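Average-hard attention (also called saturated attention) is the first of the two assumptions above: instead of a softmax, each attention head places equal weight on every key position that attains the maximal attention score and zero weight elsewhere. The following NumPy sketch illustrates that definition only; it is not the paper's circuit construction, and the function names and toy inputs are ours.

import numpy as np

def softmax_attention(scores, values):
    # Standard soft attention: weights proportional to exp(score).
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ values

def average_hard_attention(scores, values):
    # Average-hard (saturated) attention: uniform weight over all
    # positions that attain the maximum score, zero weight elsewhere.
    ties = scores == scores.max()
    weights = ties / ties.sum()
    return weights @ values

scores = np.array([2.0, 5.0, 5.0, 1.0])        # one query's scores over 4 key positions
values = np.eye(4)                             # identity values expose the attention weights
print(average_hard_attention(scores, values))  # [0.  0.5 0.5 0. ]
print(softmax_attention(scores, values))       # smooth weights on every position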
Related papers
- Transpiling quantum circuits by a transformers-based algorithm [0.5249805590164902]
We develop a transformer model capable of transpiling quantum circuits from the QASM standard to other gate sets native to a specific target quantum hardware. The feasibility of translation for up to five qubits is demonstrated, with 99.98% or more of the target circuits transpiled correctly.
arXiv Detail & Related papers (2025-12-10T17:13:34Z) - Extracting Finite State Machines from Transformers [0.3069335774032178]
We investigate the trainability of transformers trained on regular languages from a mechanistic interpretability perspective.
We empirically find tighter lower bounds on the trainability of transformers, when a finite number of symbols determine the state.
Our mechanistic insight allows us to characterise the regular languages a one-layer transformer can learn with good length generalisation.
arXiv Detail & Related papers (2024-10-08T13:43:50Z) - Transformers are Efficient Compilers, Provably [11.459397066286822]
Transformer-based large language models (LLMs) have demonstrated surprisingly robust performance across a wide range of language-related tasks.
In this paper, we take the first steps towards a formal investigation of using transformers as compilers from an expressive power perspective.
We introduce a representative programming language, Mini-Husky, which encapsulates key features of modern C-like languages.
arXiv Detail & Related papers (2024-10-07T20:31:13Z) - Algorithmic Capabilities of Random Transformers [49.73113518329544]
We investigate what functions can be learned by randomly initialized transformers in which only the embedding layers are optimized.
We find that these random transformers can perform a wide range of meaningful algorithmic tasks.
Our results indicate that some algorithmic capabilities are present in transformers even before these models are trained.
arXiv Detail & Related papers (2024-10-06T06:04:23Z) - On the Constant Depth Implementation of Pauli Exponentials [49.48516314472825]
We decompose $Z^{\otimes n}$ exponentials of arbitrary length into circuits of constant depth using $\mathcal{O}(n)$ ancillae and two-body XX and ZZ interactions. We prove the correctness of our approach after introducing novel rewrite rules for circuits that benefit from qubit recycling.
arXiv Detail & Related papers (2024-08-15T17:09:08Z) - Circuit Transformer: End-to-end Circuit Design by Predicting the Next Gate [20.8279111910994]
Language, a prominent human ability to express through sequential symbols, has been computationally mastered by recent advances in large language models (LLMs).
LLMs have shown unprecedented capabilities in understanding and reasoning.
Can circuits also be mastered by a sufficiently large "circuit model", which can conquer electronic design tasks simply by predicting the next logic gate?
arXiv Detail & Related papers (2024-03-14T03:24:14Z) - Transformers, parallel computation, and logarithmic depth [33.659870765923884]
We show that a constant number of self-attention layers can efficiently simulate, and be simulated by, a constant number of communication rounds of Massively Parallel Computation.
arXiv Detail & Related papers (2024-02-14T15:54:55Z) - Investigating Recurrent Transformers with Dynamic Halt [64.862738244735]
We study the inductive biases of two major approaches to augmenting Transformers with a recurrent mechanism.
We propose and investigate novel ways to extend and combine the methods.
arXiv Detail & Related papers (2024-02-01T19:47:31Z) - Ring Attention with Blockwise Transformers for Near-Infinite Context [88.61687950039662]
We present a novel approach, Ring Attention with Blockwise Transformers (Ring Attention), which leverages blockwise computation of self-attention and feedforward to distribute long sequences across multiple devices.
Our approach enables training and inference of sequences that are up to device count times longer than those achievable by prior memory-efficient Transformers.
arXiv Detail & Related papers (2023-10-03T08:44:50Z) - Transformers as Statisticians: Provable In-Context Learning with
In-Context Algorithm Selection [88.23337313766353]
This work first provides a comprehensive statistical theory for transformers to perform ICL.
We show that transformers can implement a broad class of standard machine learning algorithms in context.
A single transformer can adaptively select different base ICL algorithms.
arXiv Detail & Related papers (2023-06-07T17:59:31Z) - The Parallelism Tradeoff: Limitations of Log-Precision Transformers [29.716269397142973]
We prove that transformers whose arithmetic precision is logarithmic in the number of input tokens can be simulated by constant-depth logspace-uniform threshold circuits.
This provides insight on the power of transformers using known results in complexity theory.
arXiv Detail & Related papers (2022-07-02T03:49:34Z) - Error Correction Code Transformer [92.10654749898927]
We propose to extend the Transformer architecture, for the first time, to the soft decoding of linear codes at arbitrary block lengths.
We encode each channel's output dimension to a high dimension for a better representation of the bit information to be processed separately.
The proposed approach demonstrates the extreme power and flexibility of Transformers and outperforms existing state-of-the-art neural decoders by large margins at a fraction of their time complexity.
arXiv Detail & Related papers (2022-03-27T15:25:58Z) - Iterative Decoding for Compositional Generalization in Transformers [5.269770493488338]
In sequence-to-sequence learning, transformers are often unable to predict correct outputs for even marginally longer examples.
This paper introduces iterative decoding, an alternative to seq2seq learning.
We show that transformers trained via iterative decoding outperform their seq2seq counterparts on the PCFG dataset.
arXiv Detail & Related papers (2021-10-08T14:52:25Z) - On the Power of Saturated Transformers: A View from Circuit Complexity [87.20342701232869]
We show that saturated transformers transcend the limitations of hard-attention transformers.
The jump from hard to saturated attention can be understood as increasing the transformer's effective circuit depth by a factor of $O(\log n)$ (a minimal threshold-gate sketch follows this list).
arXiv Detail & Related papers (2021-06-30T17:09:47Z)
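As a reminder of what the TC0 gates referenced in the main abstract (and in "The Parallelism Tradeoff" and "On the Power of Saturated Transformers" above) compute, here is a minimal Python sketch of a threshold gate and the MAJORITY gate built from it. It is illustrative only and is not the uniform circuit family constructed in the paper.

def threshold_gate(bits, weights, theta):
    # Output 1 iff the weighted sum of the input bits reaches the threshold theta.
    return int(sum(w * b for w, b in zip(weights, bits)) >= theta)

def majority(bits):
    # MAJORITY: 1 iff strictly more than half of the inputs are 1,
    # i.e. a threshold gate with unit weights and theta = floor(n/2) + 1.
    n = len(bits)
    return threshold_gate(bits, [1] * n, n // 2 + 1)

print(majority([1, 0, 1, 1]))  # 1 (three of the four inputs are set)
print(majority([1, 0, 0, 0]))  # 0

A TC0 circuit is a constant-depth, polynomial-size circuit wired from such gates; the main result above says that the languages recognized by average-hard attention transformers can be decided by a uniform family of these circuits.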