On Limitations of the Transformer Architecture
- URL: http://arxiv.org/abs/2402.08164v2
- Date: Mon, 26 Feb 2024 22:12:37 GMT
- Title: On Limitations of the Transformer Architecture
- Authors: Binghui Peng, Srini Narayanan, Christos Papadimitriou
- Abstract summary: We show that the Transformer layer is incapable of composing functions if the domains of the functions are large enough.
We also point out that several mathematical tasks that are at the core of the so-called compositional tasks thought to be hard for LLMs are unlikely to be solvable by Transformers.
- Score: 15.329285967441372
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: What are the root causes of hallucinations in large language models (LLMs)?
We use Communication Complexity to prove that the Transformer layer is
incapable of composing functions (e.g., identify a grandparent of a person in a
genealogy) if the domains of the functions are large enough; we show through
examples that this inability is already empirically present when the domains
are quite small. We also point out that several mathematical tasks that are at
the core of the so-called compositional tasks thought to be hard for LLMs are
unlikely to be solvable by Transformers, for large enough instances and
assuming that certain well-accepted conjectures in the field of Computational
Complexity are true.
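To make the composition claim concrete: a two-hop query such as "who is X's grandparent?" is just the composition parent-of applied twice. The sketch below is a minimal, hypothetical illustration (the names and toy data are mine, not from the paper) of the task itself, which is trivial sequentially but, per the result above, out of reach for a single Transformer layer once the domain of people is large enough.

```python
# Toy version of the two-hop composition task from the abstract.
# All names and data here are hypothetical illustrations.
from typing import Optional

# One-hop relation: parent_of[x] is the parent of x.
parent_of = {
    "Ada": "Beth",
    "Beth": "Cora",
    "Cora": "Dina",
}

def grandparent_of(person: str) -> Optional[str]:
    """Two-hop query: compose parent_of with itself."""
    parent = parent_of.get(person)
    return parent_of.get(parent) if parent is not None else None

print(grandparent_of("Ada"))   # -> Cora (Ada -> Beth -> Cora)
print(grandparent_of("Cora"))  # -> None (Dina's parent is unknown)
```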
Related papers
- Ask, and it shall be given: Turing completeness of prompting [47.08833920586575]
Large language models (LLMs) have revolutionized machine learning and initiated the so-called LLM prompting paradigm.
Here, we present what is, to the best of our knowledge, the first theoretical study of the LLM prompting paradigm.
We show that prompting is Turing-complete: there exists a finite-size Transformer such that for any computable function, there exists a corresponding prompt following which the Transformer computes the function.
arXiv Detail & Related papers (2024-11-04T11:26:38Z)
- When Can Transformers Count to n? [48.32323039293186]
We show that if the dimension of the transformer state is linear in the context length, this counting task can be solved.
We provide theoretical arguments for why it is likely impossible for a size-limited transformer to implement this task.
Our results demonstrate the importance of understanding how transformers can solve simple tasks (a toy version of the task is sketched below).
arXiv Detail & Related papers (2024-07-21T13:31:02Z)
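As a point of reference for the entry above, here is a minimal sketch of a counting task of the kind the title suggests, namely counting occurrences of a query token in the context (this reading of the title is an assumption; the paper's exact benchmark may differ):

```python
# Hypothetical counting probe: given a context and a query token,
# return how many times the query token appears in the context.
# Trivial as a program; the question studied is whether a transformer
# with limited state size can implement it for long contexts.
from collections import Counter

def count_token(context: list[str], query: str) -> int:
    return Counter(context)[query]

print(count_token(["a", "b", "a", "c", "a"], "a"))  # -> 3
```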
- Faith and Fate: Limits of Transformers on Compositionality [109.79516190693415]
We investigate the limits of transformer large language models across three representative compositional tasks.
These tasks require breaking problems down into sub-steps and synthesizing these steps into a precise answer.
Our empirical findings suggest that transformer LLMs solve compositional tasks by reducing multi-step compositional reasoning into linearized subgraph matching.
arXiv Detail & Related papers (2023-05-29T23:24:14Z)
- Towards Revealing the Mystery behind Chain of Thought: A Theoretical Perspective [39.47116013338394]
Chain-of-Thought prompting (CoT) can dramatically improve the performance of Large Language Models (LLMs).
We show that CoT can handle a general class of decision-making problems known as Dynamic Programming (a toy example is sketched below).
arXiv Detail & Related papers (2023-05-24T17:59:21Z)
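To illustrate the dynamic-programming claim in the entry above, here is a toy example (edit distance; the choice of problem is mine, not necessarily one from the paper). Each DP cell depends only on already-computed cells, so a chain of thought can emit the table one step at a time:

```python
# Hypothetical illustration of a dynamic-programming problem of the kind
# CoT is argued to handle: each cell of the DP table is one "reasoning
# step" that depends only on previously computed cells.

def edit_distance(a: str, b: str) -> int:
    m, n = len(a), len(b)
    # dp[i][j] = edit distance between a[:i] and b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i  # delete all of a[:i]
    for j in range(n + 1):
        dp[0][j] = j  # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # substitution/match
    return dp[m][n]

print(edit_distance("kitten", "sitting"))  # -> 3
```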
- How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding [56.222097640468306]
We provide a mechanistic understanding of how transformers learn "semantic structure".
We show, through a combination of mathematical analysis and experiments on Wikipedia data, that the embedding layer and the self-attention layer encode the topical structure.
arXiv Detail & Related papers (2023-03-07T21:42:17Z)
- Statistically Meaningful Approximation: a Case Study on Approximating Turing Machines with Transformers [50.85524803885483]
This work proposes a formal definition of statistically meaningful (SM) approximation which requires the approximating network to exhibit good statistical learnability.
We study SM approximation for two function classes: circuits and Turing machines.
arXiv Detail & Related papers (2021-07-28T04:28:55Z)
- Quantifying and Improving Transferability in Domain Generalization [53.16289325326505]
Out-of-distribution generalization is one of the key challenges when transferring a model from the lab to the real world.
We formally define a notion of transferability that can be quantified and computed in domain generalization.
We propose a new algorithm for learning transferable features and test it over various benchmark datasets.
arXiv Detail & Related papers (2021-06-07T14:04:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.