Intrinsic Stability Limits of Autoregressive Reasoning: Structural Consequences for Long-Horizon Execution
- URL: http://arxiv.org/abs/2602.06413v1
- Date: Fri, 06 Feb 2026 06:11:06 GMT
- Title: Intrinsic Stability Limits of Autoregressive Reasoning: Structural Consequences for Long-Horizon Execution
- Authors: Hsien-Jyh Liao
- Abstract summary: Large language models (LLMs) demonstrate remarkable reasoning capabilities, yet their performance often deteriorates sharply in long-horizon tasks. We propose that the fundamental constraint on long-horizon reasoning arises from process-level instability in autoregressive generation. Our findings suggest new limitations on maintaining long-term coherence under purely autoregressive architectures.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) demonstrate remarkable reasoning capabilities, yet their performance often deteriorates sharply in long-horizon tasks, exhibiting systematic breakdown beyond certain scales. Conventional explanations primarily attribute this phenomenon to task complexity, such as combinatorial search explosion or long-term credit assignment challenges. In this work, we argue that these explanations are incomplete: even in linear, unbranched tasks without semantic ambiguity, autoregressive execution is subject to an intrinsic stability limit. We propose that the fundamental constraint on long-horizon reasoning arises from process-level instability in autoregressive generation rather than solely from search or task complexity, reframing long-horizon reasoning as a problem of structural governance. We derive Theorem A, showing that decision advantage in single-path autoregressive reasoning decays exponentially with execution length, imposing a fundamental bound on maintainable reasoning chains. This result implies a structural consequence: stable long-horizon reasoning requires discrete segmentation, naturally inducing graph-like execution structures such as directed acyclic graphs (DAGs). Empirical studies in both synthetic environments and real TextWorld tasks reveal observable performance cliffs consistent with theoretical predictions. Our findings provide a dynamical perspective on long-horizon reasoning failure and suggest new limitations on maintaining long-term coherence under purely autoregressive architectures. Furthermore, we highlight that short-horizon evaluation protocols may obscure structural instability, indicating a potential shift from scaling toward structured governance in future reasoning systems.
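Theorem A is stated for the paper's formal setting, but the exponential-decay mechanism is easy to illustrate numerically. Below is a minimal sketch under a simplifying assumption of our own (an i.i.d. per-step error rate eps, not the paper's exact process model): an unsegmented chain of length n stays on-path with probability (1 - eps)^n, while segmenting execution into verified blocks with bounded retries keeps the success probability from collapsing. All parameter values are illustrative.

```python
import math

eps = 0.02    # assumed i.i.d. per-step error rate (illustrative only)
m, r = 25, 3  # segment length and retries per segment (assumed)

for n in (10, 50, 100, 200, 500):
    p_single = (1 - eps) ** n                      # unsegmented chain: exponential decay
    p_block = 1 - (1 - (1 - eps) ** m) ** (1 + r)  # one verified block survives, given retries
    p_segmented = p_block ** math.ceil(n / m)      # every block must survive in turn
    print(f"n={n:4d}  single-path={p_single:.5f}  segmented={p_segmented:.5f}")
```

With these numbers the single-path success probability falls below 10^-4 by n = 500 while the segmented variant stays near 0.6, the cliff-versus-plateau contrast the paper's empirical sections describe.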
Related papers
- On Multi-Step Theorem Prediction via Non-Parametric Structural Priors [50.16583672681106]
In this work, we explore training-free theorem prediction through the lens of in-context learning (ICL). We propose Theorem Precedence Graphs, which encode temporal dependencies from historical solution traces as directed graphs and impose explicit topological constraints that effectively prune the search space during inference. Experiments on the FormalGeo7k benchmark show that our method achieves 89.29% accuracy, substantially outperforming ICL baselines and matching state-of-the-art supervised models.
arXiv Detail & Related papers (2026-03-05T06:08:50Z)
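The abstract above does not spell out the graph construction, so the following is a hypothetical sketch of the precedence-graph idea: edges are mined from historical traces, and a candidate theorem is pruned whenever a previously observed prerequisite has not yet been applied. Theorem names and the pruning rule are our assumptions.

```python
from collections import defaultdict

# Hypothetical historical solution traces (theorem names are invented).
traces = [
    ["midpoint", "parallel", "similar_triangles"],
    ["midpoint", "similar_triangles"],
    ["parallel", "angle_equal", "similar_triangles"],
]

# Record u as a prerequisite of v whenever u was applied before v in a trace.
prereqs = defaultdict(set)
for trace in traces:
    for i, later in enumerate(trace):
        prereqs[later].update(trace[:i])

def admissible(candidate, applied):
    """Prune candidates whose historically observed prerequisites are unmet."""
    return prereqs[candidate] <= set(applied)

print(admissible("similar_triangles", ["midpoint"]))                             # False
print(admissible("similar_triangles", ["midpoint", "parallel", "angle_equal"]))  # True
```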
- Operationalizing Longitudinal Causal Discovery Under Real-World Workflow Constraints [2.593291716183273]
Causal discovery has achieved substantial theoretical progress, yet its deployment in longitudinal systems remains limited. We describe a workflow-induced constraint class for longitudinal causal discovery that restricts the admissible directed acyclic graph space. We show that explicitly encoding workflow-consistent partial orders reduces structural ambiguity.
arXiv Detail & Related papers (2026-02-27T08:40:17Z)
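As a minimal sketch of how a workflow partial order shrinks the admissible DAG space, suppose the constraint is a measurement order over four variables (names hypothetical): forbidding edges that point from later tiers to earlier ones halves the candidate edge set before search even begins.

```python
from itertools import permutations

# Hypothetical longitudinal variables with workflow tiers (measurement order).
tier = {"baseline_age": 0, "treatment": 1, "biomarker": 2, "outcome": 3}

candidates = list(permutations(tier, 2))  # every possible directed edge
admissible = [(u, v) for u, v in candidates if tier[u] < tier[v]]
print(f"{len(candidates)} candidate edges -> {len(admissible)} workflow-consistent")
```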
- GHS-TDA: A Synergistic Reasoning Framework Integrating Global Hypothesis Space with Topological Data Analysis [27.271992201673083]
Chain-of-Thought (CoT) prompting has been shown to significantly improve the reasoning accuracy of large language models (LLMs). Existing CoT methods, however, suffer from two fundamental limitations.
arXiv Detail & Related papers (2026-02-10T14:00:30Z)
- Structured Reasoning for Large Language Models [59.215789462977206]
We propose Structured Reasoning (SCR), a framework that decouples reasoning trajectories into explicit, evaluable, and trainable components. SCR substantially improves reasoning efficiency and self-verification. Compared with existing reasoning paradigms, it reduces output token length by up to 50%.
arXiv Detail & Related papers (2026-01-12T04:04:01Z)
- Constraint Breeds Generalization: Temporal Dynamics as an Inductive Bias [1.219017431258669]
We show that constraints shape dynamics to function not as limitations, but as a temporal inductive bias that breeds generalization. We further show that robust AI development requires not only scaling and removing limitations, but also computationally mastering the temporal characteristics that naturally promote generalization.
arXiv Detail & Related papers (2025-12-30T00:34:24Z)
- Understanding Chain-of-Thought in Large Language Models via Topological Data Analysis [28.69471462319666]
This work is the first to analyze and evaluate the quality of the reasoning chain from a structural perspective. We map reasoning steps into semantic space, extract topological features, and analyze structural changes. Our results show that the topological structural complexity of reasoning chains correlates positively with accuracy.
arXiv Detail & Related papers (2025-12-22T08:28:08Z)
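A crude proxy for the pipeline sketched above (not the paper's exact construction): embed each reasoning step as a vector, then count connected components of the eps-neighborhood graph as eps grows, a 0-dimensional persistence summary. The embeddings below are random stand-ins for real step embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)
steps = rng.normal(size=(8, 4))  # 8 reasoning steps, toy 4-d embeddings
dists = np.linalg.norm(steps[:, None] - steps[None, :], axis=-1)

def n_components(eps):
    """Connected components of the graph joining steps closer than eps."""
    parent = list(range(len(steps)))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for i in range(len(steps)):
        for j in range(i + 1, len(steps)):
            if dists[i, j] < eps:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(steps))})

for eps in (0.5, 1.5, 2.5, 3.5):
    print(f"eps={eps}: {n_components(eps)} components")
```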
- NeSTR: A Neuro-Symbolic Abductive Framework for Temporal Reasoning in Large Language Models [12.935644609836507]
Neuro-Symbolic Temporal Reasoning (NeSTR) is a novel framework that integrates structured symbolic representations with hybrid reflective reasoning. NeSTR preserves explicit temporal relations through symbolic encoding, enforces logical consistency via verification, and corrects flawed inferences using abductive reflection.
arXiv Detail & Related papers (2025-12-08T06:58:23Z)
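One way to picture the consistency-verification step, under our own simplification rather than NeSTR's actual encoding: collect "before" relations as a dependency graph and treat any cycle as a logical inconsistency for abductive reflection to repair.

```python
from graphlib import TopologicalSorter, CycleError

# Predecessor sets encode "X happens after its predecessors" (toy events).
after = {
    "meeting": {"breakfast"},  # breakfast before meeting
    "lunch": {"meeting"},      # meeting before lunch
    "breakfast": {"lunch"},    # flawed inference: lunch before breakfast -> cycle
}

try:
    print("consistent order:", list(TopologicalSorter(after).static_order()))
except CycleError as err:
    print("inconsistent temporal relations, trigger reflection:", err.args[1])
```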
- A Self-explainable Model of Long Time Series by Extracting Informative Structured Causal Patterns [22.54910673667678]
We propose EXCAP, a unified framework for interpretable time-series modeling. We show that EXCAP provides smooth and stable explanations over time and is robust to perturbations in causal masks. These results show that EXCAP offers a principled and scalable approach to interpretable modeling of long time series, with relevance to high-stakes domains such as healthcare and finance.
arXiv Detail & Related papers (2025-12-01T08:33:33Z)
- Transformers Provably Learn Chain-of-Thought Reasoning with Length Generalization [53.89723291716722]
A crucial question about AI reasoning is whether models can extrapolate learned reasoning patterns to solve harder tasks with longer chain-of-thought (CoT). We mathematically prove how the algebraic structure of state-tracking problems governs the degree of extrapolation of the learned CoT. We provide the first optimization guarantee that constant-depth transformers provably learn $\mathsf{NC}^1$-complete problems with CoT.
arXiv Detail & Related papers (2025-11-10T18:40:24Z)
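The canonical $\mathsf{NC}^1$-complete state-tracking task is the word problem over the symmetric group S_5 (Barrington's theorem). CoT-style execution solves it by carrying one running state across the input, as in this self-contained sketch; the particular word below is arbitrary.

```python
def compose(p, q):
    """Apply permutation p, then q (tuples mapping index -> image)."""
    return tuple(q[p[i]] for i in range(len(p)))

# A word in S_5: a 5-cycle and two transpositions, repeated ten times.
word = [(1, 2, 3, 4, 0), (1, 0, 2, 3, 4), (0, 1, 3, 2, 4)] * 10

state = (0, 1, 2, 3, 4)  # identity permutation: the running CoT "state"
for g in word:           # one chain-of-thought step per input symbol
    state = compose(state, g)
print("final state:", state, "| is identity?", state == (0, 1, 2, 3, 4))
```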
- Provable Benefit of Curriculum in Transformer Tree-Reasoning Post-Training [76.12556589212666]
We show that curriculum post-training avoids the exponential complexity bottleneck. Under outcome-only reward signals, reinforcement learning fine-tuning achieves high accuracy with polynomial sample complexity. We establish guarantees for test-time scaling, where curriculum-aware querying reduces both reward oracle calls and sampling cost from exponential to polynomial order.
arXiv Detail & Related papers (2025-11-10T18:29:54Z)
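A back-of-the-envelope illustration of the exponential-to-tractable gap, with branching factor and depth chosen arbitrarily and no claim to match the paper's analysis: learning depth-d tree reasoning from scratch faces on the order of b^d outcome queries, while a curriculum that extends the depth-(d-1) policy pays roughly constant cost per stage.

```python
# Assumed branching factor b and target depth D (illustrative only).
b, D = 4, 10
from_scratch = b ** D                   # blind search over full depth-D trees
curriculum = sum(b for _ in range(D))   # ~b cheap extensions per stage
print(f"from scratch ~ {from_scratch:,} queries vs curriculum ~ {curriculum}")
```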
- Explainable Chain-of-Thought Reasoning: An Empirical Analysis on State-Aware Reasoning Dynamics [69.00587226225232]
We introduce a state-aware transition framework that abstracts CoT trajectories into structured latent dynamics. To characterize the global structure of reasoning, we model their progression as a Markov chain. This abstraction supports a range of analyses, including semantic role identification, temporal pattern visualization, and consistency evaluation.
arXiv Detail & Related papers (2025-08-29T18:53:31Z)
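Concretely, the Markov-chain abstraction amounts to estimating a transition matrix over latent step roles. A minimal sketch with hypothetical role labels and hand-made trajectories:

```python
import numpy as np

states = ["plan", "derive", "verify"]          # hypothetical latent roles
idx = {s: i for i, s in enumerate(states)}
trajectories = [["plan", "derive", "derive", "verify"],
                ["plan", "derive", "verify", "verify"]]

counts = np.zeros((len(states), len(states)))
for traj in trajectories:
    for a, b in zip(traj, traj[1:]):           # count observed transitions
        counts[idx[a], idx[b]] += 1
P = counts / counts.sum(axis=1, keepdims=True) # row-normalized transition matrix
print(np.round(P, 2))
```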
- A Survey on Latent Reasoning [100.54120559169735]
Large Language Models (LLMs) have demonstrated impressive reasoning capabilities. CoT reasoning that verbalizes intermediate steps limits the model's expressive bandwidth. Latent reasoning tackles this bottleneck by performing multi-step inference entirely in the model's continuous hidden state.
arXiv Detail & Related papers (2025-07-08T17:29:07Z)
- From Chaos to Order: The Atomic Reasoner Framework for Fine-grained Reasoning in Large Language Models [46.02816479205161]
We present Atomic Reasoner (AR), a cognitive inference strategy that enables fine-grained reasoning. AR decomposes the reasoning process into atomic cognitive units, employing a cognitive routing mechanism. Results show AR's superior reasoning capabilities without the computational burden of exhaustive solution searches.
arXiv Detail & Related papers (2025-03-20T08:34:53Z)
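The abstract leaves the routing mechanism unspecified, so the following is a purely hypothetical sketch: atomic units as small handlers plus a keyword router. Unit names and the dispatch rule are our inventions, not AR's design.

```python
def decompose_unit(step):  # hypothetical atomic cognitive unit
    return f"[decompose] split '{step}' into sub-goals"

def compute_unit(step):    # hypothetical atomic cognitive unit
    return f"[compute] evaluate '{step}'"

ROUTES = {"split": decompose_unit, "solve": compute_unit}

def route(step):
    """Toy cognitive router: dispatch a step to the first matching unit."""
    for keyword, unit in ROUTES.items():
        if keyword in step:
            return unit(step)
    return compute_unit(step)  # default unit

print(route("split the proof into lemmas"))
print(route("solve the base case"))
```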
- Supporting Optimal Phase Space Reconstructions Using Neural Network Architecture for Time Series Modeling [68.8204255655161]
We propose an artificial neural network with a mechanism to implicitly learn phase-space properties of time series.
Our approach is as competitive as, or better than, most state-of-the-art strategies.
arXiv Detail & Related papers (2020-06-19T21:04:47Z)
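The phase-space properties being learned implicitly are those classically recovered by delay-coordinate (Takens) embedding, which is simple to state directly; the dimension m and lag tau below are arbitrary illustrative choices, not values from the paper.

```python
import numpy as np

def delay_embed(x, m=3, tau=10):
    """Stack m delayed copies of a scalar series into points in R^m."""
    n = len(x) - (m - 1) * tau
    return np.stack([x[i * tau : i * tau + n] for i in range(m)], axis=1)

t = np.linspace(0, 8 * np.pi, 400)
x = np.sin(t)            # scalar observable of a periodic system
points = delay_embed(x)  # reconstructed trajectory in R^3
print(points.shape)      # (380, 3)
```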