Diagnosing Pathological Chain-of-Thought in Reasoning Models
- URL: http://arxiv.org/abs/2602.13904v1
- Date: Sat, 14 Feb 2026 21:53:47 GMT
- Title: Diagnosing Pathological Chain-of-Thought in Reasoning Models
- Authors: Manqing Liu, David Williams-King, Ida Caspary, Linh Le, Hannes Whittingham, Puria Radmard, Cameron Tice, Edward James Young
- Abstract summary: Chain-of-thought (CoT) reasoning is fundamental to modern LLM architectures. We identify three distinct pathologies: post-hoc rationalization, encoded reasoning, and internalized reasoning. Our work provides a practical toolkit for assessing CoT pathologies, with direct implications for training-time monitoring.
- Score: 2.8521161475937675
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Chain-of-thought (CoT) reasoning is fundamental to modern LLM architectures and represents a critical intervention point for AI safety. However, CoT reasoning may exhibit failure modes, which we term pathologies, that prevent it from being useful for monitoring. Prior work has identified three distinct pathologies: post-hoc rationalization, where models generate plausible explanations backwards from predetermined answers; encoded reasoning, where intermediate steps conceal information within seemingly interpretable text; and internalized reasoning, where models replace explicit reasoning with meaningless filler tokens while computing internally. To better understand and discriminate between these pathologies, we create a set of concrete metrics that are simple to implement, computationally inexpensive, and task-agnostic. To validate our approach, we develop model organisms deliberately trained to exhibit specific CoT pathologies. Our work provides a practical toolkit for assessing CoT pathologies, with direct implications for training-time monitoring.
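The abstract states the desiderata for its metrics (simple to implement, computationally inexpensive, task-agnostic) without spelling out the metrics themselves. As a reading aid only, here is a minimal sketch of what one such probe for the post-hoc rationalization pathology could look like: truncate the chain of thought at increasing fractions and check whether the final answer is already fixed. This is not the paper's actual metric, and the `generate` helper is a hypothetical stand-in for a call to the model under test.

```python
# Hedged sketch of ONE plausible task-agnostic probe for post-hoc
# rationalization; not the paper's metric. If the final answer never
# changes no matter how much of the CoT is deleted, the CoT is likely
# rationalized after the fact rather than load-bearing.

def generate(prompt: str) -> str:
    """Hypothetical placeholder for a call to the model under test."""
    raise NotImplementedError

def truncation_sensitivity(question: str, cot: str, answer: str,
                           fractions=(0.0, 0.25, 0.5, 0.75)) -> float:
    """Fraction of CoT truncations under which the final answer flips.

    Near 0.0 suggests the trace is not load-bearing (consistent with
    post-hoc rationalization); higher values suggest the answer
    genuinely depends on the reasoning trace.
    """
    steps = [s for s in cot.split("\n") if s.strip()]
    flips = 0
    for frac in fractions:
        kept = steps[: int(len(steps) * frac)]  # keep a prefix of the CoT
        prompt = (f"{question}\n" + "\n".join(kept) +
                  "\nTherefore, the final answer is:")
        if generate(prompt).strip() != answer.strip():
            flips += 1
    return flips / len(fractions)
```

A probe of this shape needs only black-box generation calls and no task-specific labels, which is one plausible reading of "simple, computationally inexpensive, and task-agnostic"; the paper's own metrics may differ.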
Related papers
- How Well Do Multimodal Models Reason on ECG Signals? [36.281141199783825]
We introduce a reproducible framework for evaluating reasoning on ECG signals. We employ an agentic framework that generates code to empirically verify the temporal structures described in the reasoning trace. This dual-verification method enables the scalable assessment of "true" reasoning capabilities.
arXiv Detail & Related papers (2026-02-27T21:04:12Z)
- PathReasoner-R1: Instilling Structured Reasoning into Pathology Vision-Language Model via Knowledge-Guided Policy Optimization [6.821738567680833]
We construct PathReasoner, the first large-scale dataset of whole-slide image (WSI) reasoning. PathReasoner-R1 synergizes supervised fine-tuning with reasoning-oriented reinforcement learning to instill structured chain-of-thought capabilities. Experiments demonstrate that PathReasoner-R1 achieves state-of-the-art performance on both PathReasoner and public benchmarks across various image scales.
arXiv Detail & Related papers (2026-01-29T12:21:16Z)
- Schoenfeld's Anatomy of Mathematical Reasoning by Language Models [56.656180566692946]
We adopt Schoenfeld's Episode Theory as an inductive, intermediate-scale lens and introduce ThinkARM (Anatomy of Reasoning in Models). ThinkARM explicitly abstracts reasoning traces into functional reasoning steps such as Analysis, Explore, Implement, and Verify. We show that episode-level representations make reasoning steps explicit, enabling systematic analysis of how reasoning is structured, stabilized, and altered in modern language models.
arXiv Detail & Related papers (2025-12-23T02:44:25Z)
- Adaptive Diagnostic Reasoning Framework for Pathology with Multimodal Large Language Models [34.28963665009494]
We present RECAP-PATH, an interpretable framework that establishes a self-learning paradigm. It shifts off-the-shelf multimodal large language models from passive pattern recognition to evidence-linked diagnostic reasoning. This self-learning approach requires only small labeled sets and no white-box access or weight updates to generate cancer diagnoses.
arXiv Detail & Related papers (2025-11-15T03:06:59Z)
- CTRLS: Chain-of-Thought Reasoning via Latent State-Transition [57.51370433303236]
Chain-of-thought (CoT) reasoning enables large language models to break down complex problems into interpretable intermediate steps. We introduce CTRLS, a framework that formulates CoT reasoning as a Markov decision process (MDP) with latent state transitions (a toy sketch of this framing appears after this list). We show improvements in reasoning accuracy, diversity, and exploration efficiency across benchmark reasoning tasks.
arXiv Detail & Related papers (2025-07-10T21:32:18Z)
- Reasoning Model is Stubborn: Diagnosing Instruction Overriding in Reasoning Models [27.437685534830457]
Large language models frequently exhibit a problematic reliance on familiar reasoning patterns. Despite explicit instructions from users, these models often override clearly stated conditions and default to habitual reasoning trajectories. This behavior presents significant challenges, particularly in domains such as mathematics and logic puzzles.
arXiv Detail & Related papers (2025-05-22T19:00:01Z)
- Information Science Principles of Machine Learning: A Causal Chain Meta-Framework Based on Formalized Information Mapping [7.299890614172539]
This study addresses key challenges in machine learning, namely the absence of a unified formal theoretical framework and the lack of foundational theories for model interpretability and ethical safety. We first construct a formal information model, explicitly defining the ontological states and carrier mappings of typical machine learning stages. By introducing learnable and processable predicates, as well as learning and processing functions, we analyze the causal chain logic and constraint laws governing machine learning processes.
arXiv Detail & Related papers (2025-05-19T14:39:41Z)
- The CoT Encyclopedia: Analyzing, Predicting, and Controlling how a Reasoning Model will Think [81.38614558541772]
We introduce the CoT Encyclopedia, a framework for analyzing and steering model reasoning. Our method automatically extracts diverse reasoning criteria from model-generated CoTs. We show that this framework produces more interpretable and comprehensive analyses than existing methods.
arXiv Detail & Related papers (2025-05-15T11:31:02Z)
- Neural Causal Models for Counterfactual Identification and Estimation [62.30444687707919]
We study the evaluation of counterfactual statements through neural models.
First, we show that neural causal models (NCMs) are expressive enough to evaluate such counterfactual statements.
Second, we develop an algorithm for simultaneously identifying and estimating counterfactual distributions.
arXiv Detail & Related papers (2022-09-30T18:29:09Z)
- Logical Satisfiability of Counterfactuals for Faithful Explanations in NLI [60.142926537264714]
We introduce the methodology of Faithfulness-through-Counterfactuals.
It generates a counterfactual hypothesis based on the logical predicates expressed in the explanation.
It then evaluates whether the model's prediction on the counterfactual is consistent with that expressed logic.
arXiv Detail & Related papers (2022-05-25T03:40:59Z)
- Structural Causal Models Are (Solvable by) Credal Networks [70.45873402967297]
Causal inferences can be obtained by standard algorithms for the updating of credal nets.
This contribution should be regarded as a systematic approach to represent structural causal models by credal networks.
Experiments show that approximate algorithms for credal networks can immediately be used to do causal inference in real-size problems.
arXiv Detail & Related papers (2020-08-02T11:19:36Z)
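To make the MDP framing in the CTRLS entry above concrete, the toy sketch below rolls out a chain of thought as a policy acting over latent reasoning states. The state names, actions, and transition probabilities here are invented for illustration and are not taken from the paper; CTRLS learns such quantities rather than hand-coding them.

```python
# Toy sketch of CoT-as-MDP with latent state transitions, as a reading
# aid for the CTRLS entry above. All states, actions, and probabilities
# below are made up for illustration.
import random

STATES = ["analyze", "explore", "verify", "conclude"]
ACTIONS = ["decompose", "try_case", "check_step", "answer"]

# P(next_state | state, action): a hand-made stochastic transition table.
TRANSITIONS = {
    ("analyze", "decompose"): {"explore": 0.8, "analyze": 0.2},
    ("explore", "try_case"): {"verify": 0.6, "explore": 0.4},
    ("verify", "check_step"): {"conclude": 0.7, "explore": 0.3},
    ("conclude", "answer"): {"conclude": 1.0},
}

def policy(state: str) -> str:
    """A trivial deterministic policy; a learned model would score actions."""
    return dict(zip(STATES, ACTIONS))[state]

def rollout(max_steps: int = 10, seed: int = 0) -> list[str]:
    """Sample one reasoning trajectory through the latent-state MDP."""
    rng = random.Random(seed)
    state, trace = "analyze", []
    for _ in range(max_steps):
        action = policy(state)
        trace.append(f"{state} -> {action}")
        if state == "conclude":
            break
        dist = TRANSITIONS[(state, action)]
        state = rng.choices(list(dist), weights=list(dist.values()))[0]
    return trace

print("\n".join(rollout()))
```

Running this prints one sampled trajectory (e.g. `analyze -> decompose`, then `explore -> try_case`, and so on until `conclude -> answer`). In the paper's setting the transition model and policy would be parameterized and trained, which is what would enable the reported gains in reasoning accuracy, diversity, and exploration efficiency.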