Barriers to Discrete Reasoning with Transformers: A Survey Across Depth, Exactness, and Bandwidth
- URL: http://arxiv.org/abs/2602.11175v1
- Date: Mon, 19 Jan 2026 21:53:11 GMT
- Title: Barriers to Discrete Reasoning with Transformers: A Survey Across Depth, Exactness, and Bandwidth
- Authors: Michelle Yuan, Weiyi Sun, Amir H. Rezaeian, Jyotika Singh, Sandip Ghoshal, Yao-Ting Wang, Miguel Ballesteros, Yassine Benajiba,
- Abstract summary: Transformers have become the foundational architecture for a broad spectrum of sequence modeling applications.<n>However, their theoretical limitations in discrete reasoning tasks, such as arithmetic, logical inference, and algorithmic composition, remain a critical open problem.
- Score: 8.621689945201346
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformers have become the foundational architecture for a broad spectrum of sequence modeling applications, underpinning state-of-the-art systems in natural language processing, vision, and beyond. However, their theoretical limitations in discrete reasoning tasks, such as arithmetic, logical inference, and algorithmic composition, remain a critical open problem. In this survey, we synthesize recent studies from three theoretical perspectives: circuit complexity, approximation theory, and communication complexity, to clarify the structural and computational barriers that transformers face when performing symbolic computations. By connecting these established theoretical frameworks, we provide an accessible and unified account of why current transformer architectures struggle to implement exact discrete algorithms, even as they excel at pattern matching and interpolation. We review key definitions, seminal results, and illustrative examples, highlighting challenges such as depth constraints, difficulty approximating discontinuities, and bottlenecks in inter-token communication. Finally, we discuss implications for model design and suggest promising directions for overcoming these foundational limitations.
Related papers
- Fourier Neural Operators Explained: A Practical Perspective [75.12291469255794]
The Fourier Neural Operator (FNO) has become the most influential and widely adopted due to its elegant spectral formulation.<n>This guide aims to establish a clear and reliable framework for applying FNOs effectively across diverse scientific and engineering fields.
arXiv Detail & Related papers (2025-12-01T08:56:21Z) - Transformers Provably Learn Chain-of-Thought Reasoning with Length Generalization [53.89723291716722]
A crucial question about AI reasoning is whether models can extrapolate learned reasoning patterns to solve harder tasks with longer chain-of-thought (CoT)<n>We mathematically prove how the algebraic structure of state-tracking problems governs the degree of extrapolation of the learned CoT.<n>We provide the first optimization guarantee that constant-depth transformers provably learn $mathsfNC1$-complete problems with CoT.
arXiv Detail & Related papers (2025-11-10T18:40:24Z) - A Mathematical Explanation of Transformers for Large Language Models and GPTs [6.245431127481903]
We propose a novel continuous framework that interprets the Transformer as a discretization of a structured integro-differential equation.<n>Within this formulation, the self-attention mechanism emerges naturally as a non-local integral operator.<n>Our approach extends beyond previous theoretical analyses by embedding the entire Transformer operation in continuous domains.
arXiv Detail & Related papers (2025-10-05T01:16:08Z) - Unlocking Symbol-Level Precoding Efficiency Through Tensor Equivariant Neural Network [84.22115118596741]
We propose an end-to-end deep learning (DL) framework with low inference complexity for symbol-level precoding.<n>We show that the proposed framework captures substantial performance gains of optimal SLP, while achieving an approximately 80-times speedup over conventional methods.
arXiv Detail & Related papers (2025-10-02T15:15:50Z) - On the Existence of Universal Simulators of Attention [17.01811978811789]
We present solutions to identically replicate attention outputs and the underlying elementary matrix and activation operations via RASP.<n>Our proofs, for the first time, show the existence of an algorithmically achievable data-agnostic solution, previously known to be approximated only by learning.
arXiv Detail & Related papers (2025-06-23T15:15:25Z) - Propositional Logic for Probing Generalization in Neural Networks [3.6037930269014633]
We investigate the generalization behavior of three key neural architectures (Transformers, Graph Convolution Networks and LSTMs) in a controlled task rooted in propositional logic.<n>We find thatTransformers fail to apply negation compositionally, unless structural biases are introduced.<n>Our findings highlight persistent limitations in the ability of standard architectures to learn systematic representations of logical operators.
arXiv Detail & Related papers (2025-06-10T16:46:05Z) - Complexity Control Facilitates Reasoning-Based Compositional Generalization in Transformers [10.206921909332006]
This study investigates the internal mechanisms underlying Transformers' behavior in compositional tasks.<n>We find that complexity control strategies influence whether the model learns primitive-level rules that generalize out-of-distribution (reasoning-based solutions) or relies solely on memorized mappings (memory-based solutions)
arXiv Detail & Related papers (2025-01-15T02:54:52Z) - The Computational Complexity of Circuit Discovery for Inner Interpretability [0.30723404270319693]
We study circuit discovery with classical and parameterized computational complexity theory.<n>Our findings reveal a challenging complexity landscape.<n>This framework allows us to understand the scope and limits of interpretability queries.
arXiv Detail & Related papers (2024-10-10T15:22:48Z) - Hierarchical Invariance for Robust and Interpretable Vision Tasks at Larger Scales [54.78115855552886]
We show how to construct over-complete invariants with a Convolutional Neural Networks (CNN)-like hierarchical architecture.
With the over-completeness, discriminative features w.r.t. the task can be adaptively formed in a Neural Architecture Search (NAS)-like manner.
For robust and interpretable vision tasks at larger scales, hierarchical invariant representation can be considered as an effective alternative to traditional CNN and invariants.
arXiv Detail & Related papers (2024-02-23T16:50:07Z) - Faith and Fate: Limits of Transformers on Compositionality [109.79516190693415]
We investigate the limits of transformer large language models across three representative compositional tasks.
These tasks require breaking problems down into sub-steps and synthesizing these steps into a precise answer.
Our empirical findings suggest that transformer LLMs solve compositional tasks by reducing multi-step compositional reasoning into linearized subgraph matching.
arXiv Detail & Related papers (2023-05-29T23:24:14Z) - Query Structure Modeling for Inductive Logical Reasoning Over Knowledge
Graphs [67.043747188954]
We propose a structure-modeled textual encoding framework for inductive logical reasoning over KGs.
It encodes linearized query structures and entities using pre-trained language models to find answers.
We conduct experiments on two inductive logical reasoning datasets and three transductive datasets.
arXiv Detail & Related papers (2023-05-23T01:25:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.