On Limitations of the Transformer Architecture
- URL: http://arxiv.org/abs/2402.08164v2
- Date: Mon, 26 Feb 2024 22:12:37 GMT
- Title: On Limitations of the Transformer Architecture
- Authors: Binghui Peng, Srini Narayanan, Christos Papadimitriou
- Abstract summary: We show that the Transformer layer is incapable of composing functions if the domains of the functions are large enough.
We also point out that several mathematical tasks that are at the core of the so-called compositional tasks thought to be hard for LLMs are unlikely to be solvable by Transformers.
- Score: 15.329285967441372
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: What are the root causes of hallucinations in large language models (LLMs)?
We use Communication Complexity to prove that the Transformer layer is
incapable of composing functions (e.g., identify a grandparent of a person in a
genealogy) if the domains of the functions are large enough; we show through
examples that this inability is already empirically present when the domains
are quite small. We also point out that several mathematical tasks that are at
the core of the so-called compositional tasks thought to be hard for LLMs are
unlikely to be solvable by Transformers, for large enough instances and
assuming that certain well-accepted conjectures in the field of Computational
Complexity are true.
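To make the composition claim concrete: a two-hop query such as "who is X's grandparent?" is just the composition parent-of applied twice. The sketch below is a minimal, hypothetical illustration (the names and toy data are mine, not from the paper) of the task itself, which is trivial sequentially but, per the result above, out of reach for a single Transformer layer once the domain of people is large enough.

```python
# Toy version of the two-hop composition task from the abstract.
# All names and data here are hypothetical illustrations.
from typing import Optional

# One-hop relation: parent_of[x] is the parent of x.
parent_of = {
    "Ada": "Beth",
    "Beth": "Cora",
    "Cora": "Dina",
}

def grandparent_of(person: str) -> Optional[str]:
    """Two-hop query: compose parent_of with itself."""
    parent = parent_of.get(person)
    return parent_of.get(parent) if parent is not None else None

print(grandparent_of("Ada"))   # -> Cora (Ada -> Beth -> Cora)
print(grandparent_of("Cora"))  # -> None (Dina's parent is unknown)
```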
Related papers
- Ask, and it shall be given: Turing completeness of prompting [47.08833920586575]
Large language models (LLMs) have revolutionized machine learning and initiated the so-called LLM prompting paradigm.
Here, we present what is, to the best of our knowledge, the first theoretical study of the LLM prompting paradigm.
We show that prompting is Turing-complete: there exists a finite-size Transformer such that for any computable function, there exists a corresponding prompt following which the Transformer computes the function.
arXiv Detail & Related papers (2024-11-04T11:26:38Z)
- When Can Transformers Count to n? [48.32323039293186]
We show that if the dimension of the transformer state is linear in the context length, this counting task can be solved.
We provide theoretical arguments for why it is likely impossible for a size-limited transformer to implement this task.
Our results demonstrate the importance of understanding how transformers can solve simple tasks (a toy version of the task is sketched below).
arXiv Detail & Related papers (2024-07-21T13:31:02Z)
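As a point of reference for the entry above, here is a minimal sketch of a counting task of the kind the title suggests, namely counting occurrences of a query token in the context (this reading of the title is an assumption; the paper's exact benchmark may differ):

```python
# Hypothetical counting probe: given a context and a query token,
# return how many times the query token appears in the context.
# Trivial as a program; the question studied is whether a transformer
# with limited state size can implement it for long contexts.
from collections import Counter

def count_token(context: list[str], query: str) -> int:
    return Counter(context)[query]

print(count_token(["a", "b", "a", "c", "a"], "a"))  # -> 3
```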
- Faith and Fate: Limits of Transformers on Compositionality [109.79516190693415]
We investigate the limits of transformer large language models across three representative compositional tasks.
These tasks require breaking problems down into sub-steps and synthesizing these steps into a precise answer.
Our empirical findings suggest that transformer LLMs solve compositional tasks by reducing multi-step compositional reasoning into linearized subgraph matching.
arXiv Detail & Related papers (2023-05-29T23:24:14Z)
- Towards Revealing the Mystery behind Chain of Thought: A Theoretical Perspective [39.47116013338394]
Chain-of-Thought prompting (CoT) can dramatically improve the performance of Large Language Models (LLMs).
We show that CoT can handle a general class of decision-making problems known as Dynamic Programming (a toy example is sketched below).
arXiv Detail & Related papers (2023-05-24T17:59:21Z)
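To illustrate the dynamic-programming claim in the entry above, here is a toy example (edit distance; the choice of problem is mine, not necessarily one from the paper). Each DP cell depends only on already-computed cells, so a chain of thought can emit the table one step at a time:

```python
# Hypothetical illustration of a dynamic-programming problem of the kind
# CoT is argued to handle: each cell of the DP table is one "reasoning
# step" that depends only on previously computed cells.

def edit_distance(a: str, b: str) -> int:
    m, n = len(a), len(b)
    # dp[i][j] = edit distance between a[:i] and b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i  # delete all of a[:i]
    for j in range(n + 1):
        dp[0][j] = j  # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # substitution/match
    return dp[m][n]

print(edit_distance("kitten", "sitting"))  # -> 3
```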
- How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding [56.222097640468306]
We provide a mechanistic understanding of how transformers learn "semantic structure".
We show, through a combination of mathematical analysis and experiments on Wikipedia data, that the embedding layer and the self-attention layer encode the topical structure.
arXiv Detail & Related papers (2023-03-07T21:42:17Z)
- Statistically Meaningful Approximation: a Case Study on Approximating Turing Machines with Transformers [50.85524803885483]
This work proposes a formal definition of statistically meaningful (SM) approximation which requires the approximating network to exhibit good statistical learnability.
We study SM approximation for two function classes: circuits and Turing machines.
arXiv Detail & Related papers (2021-07-28T04:28:55Z)
- Quantifying and Improving Transferability in Domain Generalization [53.16289325326505]
Out-of-distribution generalization is one of the key challenges when transferring a model from the lab to the real world.
We formally define a notion of transferability that can be quantified and computed in domain generalization.
We propose a new algorithm for learning transferable features and test it over various benchmark datasets.
arXiv Detail & Related papers (2021-06-07T14:04:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.