Circular Reasoning: Understanding Self-Reinforcing Loops in Large Reasoning Models
- URL: http://arxiv.org/abs/2601.05693v1
- Date: Fri, 09 Jan 2026 10:23:55 GMT
- Title: Circular Reasoning: Understanding Self-Reinforcing Loops in Large Reasoning Models
- Authors: Zenghao Duan, Liang Pang, Zihao Wei, Wenbin Duan, Yuxin Tian, Shicheng Xu, Jingcheng Deng, Zhiyi Yin, Xueqi Cheng
- Abstract summary: Circular Reasoning is a self-reinforcing trap where generated content acts as a logical premise for its own recurrence. Mechanistically, we characterize circular reasoning as a state collapse exhibiting distinct boundaries. We reveal that reasoning impasses trigger the loop onset, which subsequently persists as an inescapable cycle driven by a self-reinforcing V-shaped attention mechanism.
- Score: 66.11277323593475
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite the success of test-time scaling, Large Reasoning Models (LRMs) frequently encounter repetitive loops that lead to computational waste and inference failure. In this paper, we identify a distinct failure mode termed Circular Reasoning. Unlike traditional model degeneration, this phenomenon manifests as a self-reinforcing trap where generated content acts as a logical premise for its own recurrence, compelling the reiteration of preceding text. To systematically analyze this phenomenon, we introduce LoopBench, a dataset designed to capture two distinct loop typologies: numerical loops and statement loops. Mechanistically, we characterize circular reasoning as a state collapse exhibiting distinct boundaries, where semantic repetition precedes textual repetition. We reveal that reasoning impasses trigger the loop onset, which subsequently persists as an inescapable cycle driven by a self-reinforcing V-shaped attention mechanism. Guided by these findings, we employ the Cumulative Sum (CUSUM) algorithm to capture these precursors for early loop prediction. Experiments across diverse LRMs validate its accuracy and elucidate the stability of long-chain reasoning.
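The detection step described in the abstract lends itself to a compact illustration. The sketch below is an illustrative approximation, not the authors' released implementation: it monitors a hypothetical per-step semantic-repetition score over reasoning-step embeddings (motivated by the finding that semantic repetition precedes textual repetition) and flags loop onset with a standard one-sided CUSUM statistic. The `repetition_score` helper, the window size, and the target and threshold values are all assumptions.

```python
# Minimal sketch of early loop prediction via CUSUM, assuming a hypothetical
# per-step repetition signal over reasoning-step embeddings. The helper names,
# window size, and thresholds are illustrative, not the paper's implementation.
import numpy as np

def repetition_score(embeddings: np.ndarray, window: int = 16) -> np.ndarray:
    """For each step, the max cosine similarity to the preceding `window`
    steps; high values suggest semantic repetition (the loop precursor)."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    scores = np.zeros(len(normed))
    for t in range(1, len(normed)):
        scores[t] = float(np.max(normed[max(0, t - window):t] @ normed[t]))
    return scores

def cusum_alarm(signal: np.ndarray, target: float = 0.5,
                slack: float = 0.05, threshold: float = 1.0) -> int:
    """One-sided CUSUM: accumulate deviations above target + slack and raise
    an alarm (return the step index) once the sum crosses the threshold."""
    s = 0.0
    for t, x in enumerate(signal):
        s = max(0.0, s + (x - target - slack))
        if s > threshold:
            return t
    return -1  # no loop detected

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fresh = rng.normal(size=(40, 64))                    # 40 diverse steps
    cycle = np.tile(rng.normal(size=(4, 64)), (10, 1))   # then a 4-step cycle
    trace = np.vstack([fresh, cycle + 0.01 * rng.normal(size=(40, 64))])
    print("loop alarm at step", cusum_alarm(repetition_score(trace)))
```

On this toy trace, which collapses into a four-step cycle after 40 diverse steps, the alarm fires a few steps after onset; in practice the target and threshold would be calibrated on non-looping traces.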
Related papers
- Why Self-Training Helps and Hurts: Denoising vs. Signal Forgetting [6.369253528507392]
Iterative self-training repeatedly refits a model on pseudo-labels generated by its own predictions. We derive deterministic-equivalent recursions for the prediction risk and effective noise across iterations.
arXiv Detail & Related papers (2026-02-15T07:28:12Z) - Is my model "mind blurting"? Interpreting the dynamics of reasoning tokens with Recurrence Quantification Analysis (RQA) [1.593065406609169]
We propose Recurrence Quantification Analysis (RQA) as a non-textual alternative for analysing a model's reasoning chains at test time. RQA not only captures signals not reflected by response length, but also substantially improves prediction of task complexity by 8% (a minimal sketch of the RQA measures appears after this list).
arXiv Detail & Related papers (2026-02-05T23:48:23Z) - APR: Penalizing Structural Redundancy in Large Reasoning Models via Anchor-based Process Rewards [61.52322047892064]
Test-Time Scaling (TTS) has significantly enhanced the capabilities of Large Reasoning Models (LRMs). We observe that LRMs frequently conduct repetitive self-verification without revision, even after obtaining the final answer during the reasoning process. We propose Anchor-based Process Reward (APR), a structure-aware reward shaping method that localizes the reasoning anchor and penalizes exclusively the post-anchor AST.
arXiv Detail & Related papers (2026-01-31T14:53:20Z) - ReGuLaR: Variational Latent Reasoning Guided by Rendered Chain-of-Thought [49.203970812338916]
Explicit reasoning chains introduce substantial computational redundancy. Recent latent reasoning methods attempt to mitigate this by compressing reasoning processes into latent space. We propose Rendered CoT-Guided variational Latent Reasoning (ReGuLaR).
arXiv Detail & Related papers (2026-01-30T17:08:06Z) - Analyzing Reasoning Consistency in Large Multimodal Models under Cross-Modal Conflicts [74.47786985522762]
We identify a critical failure mode termed textual inertia, where models tend to blindly adhere to erroneous text while neglecting conflicting visual evidence. We propose the LogicGraph Perturbation Protocol, which structurally injects perturbations into the reasoning chains of diverse LMMs. Results reveal that models successfully self-correct in less than 10% of cases and predominantly succumb to blind textual error propagation.
arXiv Detail & Related papers (2026-01-07T16:39:34Z) - Rationale-Grounded In-Context Learning for Time Series Reasoning with Multimodal Large Language Models [39.75347938309383]
We propose rationale-grounded in-context learning for time series reasoning, where rationales serve as guiding reasoning units rather than post-hoc explanations. We conduct extensive experiments to demonstrate the effectiveness and efficiency of our proposed RationaleTS on time series reasoning tasks across three domains.
arXiv Detail & Related papers (2026-01-06T12:27:04Z) - Lost at the Beginning of Reasoning [85.17612793300238]
We show that the first reasoning step exerts a disproportionately large influence on the final prediction. We propose an efficient sampling strategy that leverages a reward model to identify and retain high-quality first reasoning steps.
arXiv Detail & Related papers (2025-06-27T09:53:57Z) - Interpreting the Repeated Token Phenomenon in Large Language Models [31.1226642501095]
Large Language Models (LLMs) often fail to accurately repeat a single word when prompted to, and instead output unrelated text. We aim to explain the causes of this phenomenon and link it to the concept of "attention sinks". Our investigation identifies the neural circuit responsible for attention sinks and shows how long repetitions disrupt this circuit.
arXiv Detail & Related papers (2025-03-11T21:40:58Z) - Gumbel Counterfactual Generation From Language Models [64.55296662926919]
We show that counterfactual reasoning is conceptually distinct from interventions. We propose a framework for generating true string counterfactuals. We show that the approach produces meaningful counterfactuals while also demonstrating that commonly used intervention techniques have considerable undesired side effects.
arXiv Detail & Related papers (2024-11-11T17:57:30Z) - Causal Discovery in Semi-Stationary Time Series [32.424281626708336]
We propose a constraint-based, non-parametric algorithm for discovering causal relations in observational time series.
We show that this algorithm is sound in identifying causal relations on discrete time series.
arXiv Detail & Related papers (2024-07-10T00:55:38Z) - Root Cause Identification for Collective Anomalies in Time Series given an Acyclic Summary Causal Graph with Loops [1.8416014644193066]
The paper first shows how the problem of root cause identification can be divided into many independent subproblems.
Under this setting, some root causes can be found directly from the graph and from the time of appearance of anomalies.
The rest of the root causes can be found by comparing direct effects in the normal and in the anomalous regime.
arXiv Detail & Related papers (2023-03-07T16:47:35Z) - DynGFN: Towards Bayesian Inference of Gene Regulatory Networks with GFlowNets [81.75973217676986]
Gene regulatory networks (GRN) describe interactions between genes and their products that control gene expression and cellular function.
Existing methods either focus on challenge (1), identifying cyclic structure from dynamics, or on challenge (2), learning complex Bayesian posteriors over DAGs, but not both.
In this paper we leverage the fact that it is possible to estimate the "velocity" of gene expression with RNA velocity techniques to develop an approach that addresses both challenges.
arXiv Detail & Related papers (2023-02-08T16:36:40Z) - Sequential Learning of the Topological Ordering for the Linear Non-Gaussian Acyclic Model with Parametric Noise [6.866717993664787]
We develop a novel sequential approach to estimate the causal ordering of a DAG.
We provide extensive numerical evidence to demonstrate that our procedure is scalable to cases with possibly thousands of nodes.
arXiv Detail & Related papers (2022-02-03T18:15:48Z) - Consistency of a Recurrent Language Model With Respect to Incomplete Decoding [67.54760086239514]
We study the issue of receiving infinite-length sequences from a recurrent language model.
We propose two remedies which address inconsistency: consistent variants of top-k and nucleus sampling, and a self-terminating recurrent language model.
arXiv Detail & Related papers (2020-02-06T19:56:15Z)
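For the "mind blurting" entry above, a minimal sketch of what an RQA-style analysis of reasoning tokens computes may help: a binary recurrence matrix built from pairwise embedding similarities, plus two standard RQA measures, recurrence rate and determinism. The 0.9 similarity threshold and the minimum diagonal-line length of 2 are assumptions for illustration; the paper's exact features may differ.

```python
# Minimal RQA sketch over reasoning-token embeddings. The threshold and
# min_len values are illustrative assumptions, not the paper's settings.
import numpy as np

def recurrence_matrix(embeddings: np.ndarray, threshold: float = 0.9) -> np.ndarray:
    """Binary matrix R[i, j] = 1 iff steps i and j are semantically similar."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return (normed @ normed.T >= threshold).astype(int)

def recurrence_rate(R: np.ndarray) -> float:
    """Fraction of recurrent points, excluding the trivial main diagonal."""
    n = len(R)
    return (R.sum() - np.trace(R)) / (n * n - n)

def determinism(R: np.ndarray, min_len: int = 2) -> float:
    """Fraction of off-diagonal recurrent points lying on diagonal lines of
    length >= min_len, i.e. repeated subsequences rather than lone tokens."""
    n = len(R)
    total = R.sum() - np.trace(R)
    on_lines = 0
    for k in range(1, n):  # superdiagonals; R is symmetric
        run = 0
        for v in list(np.diagonal(R, offset=k)) + [0]:
            if v:
                run += 1
            else:
                if run >= min_len:
                    on_lines += run
                run = 0
    return 2 * on_lines / total if total else 0.0  # x2 covers subdiagonals

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    loopy = np.tile(rng.normal(size=(5, 32)), (8, 1))  # trace cycling every 5 steps
    R = recurrence_matrix(loopy)
    print(f"RR={recurrence_rate(R):.2f}  DET={determinism(R):.2f}")  # DET near 1.0
```

High determinism with only moderate recurrence rate indicates repeated subsequences, the kind of cyclic structure that circular reasoning produces.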