Related papers: Reasoning Theater: Disentangling Model Beliefs from Chain-of-Thought

URL: http://arxiv.org/abs/2603.05488v1
Date: Thu, 05 Mar 2026 18:55:16 GMT
Title: Reasoning Theater: Disentangling Model Beliefs from Chain-of-Thought
Authors: Siddharth Boppana, Annabel Ma, Max Loeffler, Raphael Sarfati, Eric Bigelow, Atticus Geiger, Owen Lewis, Jack Merullo,
Abstract summary: We provide evidence of performative chain-of-thought (CoT) in reasoning models. We compare activation probing, early forced answering, and a CoT monitor across two large models. We contrast this with genuine reasoning in difficult multihop GPQA-Diamond questions.
Score: 11.955186033088351
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We provide evidence of performative chain-of-thought (CoT) in reasoning models, where a model becomes strongly confident in its final answer but continues generating tokens without revealing its internal belief. Our analysis compares activation probing, early forced answering, and a CoT monitor across two large models (DeepSeek-R1 671B & GPT-OSS 120B) and finds task-difficulty-specific differences: the model's final answer is decodable from activations far earlier in CoT than a monitor is able to say, especially for easy recall-based MMLU questions. We contrast this with genuine reasoning in difficult multihop GPQA-Diamond questions. Despite this, inflection points (e.g., backtracking, 'aha' moments) occur almost exclusively in responses where probes show large belief shifts, suggesting these behaviors track genuine uncertainty rather than learned "reasoning theater." Finally, probe-guided early exit reduces tokens by up to 80% on MMLU and 30% on GPQA-Diamond with similar accuracy, positioning attention probing as an efficient tool for detecting performative reasoning and enabling adaptive computation.
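
The probe-guided early exit mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exit rule, so everything here is an assumption: the function names, the per-step probability format, and the `threshold` and `patience` parameters are hypothetical, and the probe outputs are taken as already computed.

```python
from typing import Sequence, Tuple


def probe_guided_exit(
    probe_probs: Sequence[Sequence[float]],
    threshold: float = 0.9,
    patience: int = 2,
) -> Tuple[int, int]:
    """Pick the reasoning step at which to stop generating.

    probe_probs[t][k] is a hypothetical probe's probability that answer
    option k is the final answer, read after reasoning step t.
    Exit at the first step where the same option has been the top answer
    with probability >= threshold for `patience` consecutive steps.
    If that never happens, fall back to the last step.
    """
    if not probe_probs:
        raise ValueError("probe_probs must contain at least one step")

    streak = 0
    prev_top = None
    for step, probs in enumerate(probe_probs):
        top = max(range(len(probs)), key=probs.__getitem__)
        if probs[top] >= threshold:
            # Extend the streak only if the confident answer is unchanged.
            streak = streak + 1 if top == prev_top else 1
            prev_top = top
            if streak >= patience:
                return step, top
        else:
            streak = 0
            prev_top = None

    last = probe_probs[-1]
    return len(probe_probs) - 1, max(range(len(last)), key=last.__getitem__)


def fraction_saved(exit_step: int, total_steps: int) -> float:
    """Fraction of reasoning steps skipped by exiting after exit_step."""
    return 1.0 - (exit_step + 1) / total_steps


# Illustrative, made-up probe readings for a four-option question.
trace = [
    [0.30, 0.30, 0.20, 0.20],
    [0.05, 0.92, 0.02, 0.01],
    [0.02, 0.95, 0.02, 0.01],
    [0.01, 0.97, 0.01, 0.01],
    [0.01, 0.98, 0.005, 0.005],
]
exit_step, answer = probe_guided_exit(trace)
print(exit_step, answer, fraction_saved(exit_step, len(trace)))
```

Requiring agreement over several consecutive steps is one way to avoid exiting on a single noisy probe reading; the paper's actual criterion may differ.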

Related papers

Decoding Answers Before Chain-of-Thought: Evidence from Pre-CoT Probes and Activation Steering [5.427346259545067]
Chain-of-thought (CoT) has become central to scaling reasoning capabilities in large language models. We show that instruction-tuned models often determine their answer before generating CoT.
arXiv Detail & Related papers (2026-03-02T04:33:55Z)
Balancing Faithfulness and Performance in Reasoning via Multi-Listener Soft Execution [79.98699884805636]
Reasoning Execution by Multiple Listeners (REMUL) is a multi-party reinforcement learning approach. REMUL builds on the hypothesis that reasoning traces which other parties can follow will be more faithful. Speakers are rewarded for producing reasoning that is clear to listeners.
arXiv Detail & Related papers (2026-02-18T02:55:55Z)
Probing the Trajectories of Reasoning Traces in Large Language Models [4.599673637363014]
We propose a protocol to probe the trajectories of reasoning traces in large language models. We find that accuracy and decision commitment consistently increase as the percentage of provided reasoning tokens grows. We show that trajectory probing provides diagnostics for efficient and safer deployment of reasoning models.
arXiv Detail & Related papers (2026-01-30T16:45:16Z)
Thinking Traps in Long Chain-of-Thought: A Measurable Study and Trap-Aware Adaptive Restart [27.904791075662896]
We introduce TAAR (Trap-Aware Adaptive Restart), a test-time control framework that trains a diagnostic policy to predict two signals from partial trajectories. At inference time, TAAR truncates the trajectory before the predicted trap segment and adaptively restarts decoding. Experiments show that TAAR improves reasoning performance without fine-tuning base model parameters.
arXiv Detail & Related papers (2026-01-17T07:26:02Z)
One Token Embedding Is Enough to Deadlock Your Large Reasoning Model [91.48868589442837]
We present the Deadlock Attack, a resource exhaustion method that hijacks an LRM's generative control flow. Our method achieves a 100% attack success rate across four advanced LRMs.
arXiv Detail & Related papers (2025-10-12T07:42:57Z)
Answer-Consistent Chain-of-thought Reinforcement Learning For Multi-modal Large Language Models [33.398631680508814]
We propose Answer-Consistent Reinforcement Learning that modifies the GRPO algorithm with an auxiliary consistency check. We design a consistency-verification reward that grants a high reward only if both the original and the post-shuffle answers agree and are correct. We evaluate ACRE on challenging Video Reasoning benchmarks and multimodal math reasoning benchmarks, achieving an average 2.2% and 1.5% improvement.
arXiv Detail & Related papers (2025-10-11T08:32:52Z)
From Harm to Help: Turning Reasoning In-Context Demos into Assets for Reasoning LMs [58.02809208460186]
We revisit this paradox using high-quality traces from DeepSeek-R1 as demonstrations. We find that adding more exemplars consistently degrades accuracy, even when demonstrations are optimal. We introduce Insight-to-solve (I2S), a sequential test-time procedure that turns demonstrations into explicit, reusable insights.
arXiv Detail & Related papers (2025-09-27T08:59:31Z)
VeriThinker: Learning to Verify Makes Reasoning Model Efficient [52.74493506816969]
Large Reasoning Models excel at complex tasks using Chain-of-Thought (CoT) reasoning. Their tendency to overthink leads to unnecessarily lengthy reasoning chains. We introduce VeriThinker, a novel approach for CoT compression.
arXiv Detail & Related papers (2025-05-23T14:17:56Z)
Beyond the Last Answer: Your Reasoning Trace Uncovers More than You Think [51.0691253204425]
We analyze intermediate reasoning steps, termed subthoughts, to answer two questions: Does the final answer reliably represent the model's optimal conclusion? Our approach involves segmenting a reasoning trace into sequential subthoughts based on linguistic cues. We find that aggregating these answers by selecting the most frequent one (the mode) often yields significantly higher accuracy compared to relying solely on the answer derived from the original complete trace.
arXiv Detail & Related papers (2025-04-29T12:39:07Z)
Dynamic Early Exit in Reasoning Models [21.30793518631921]
Overthinking in long chain-of-thought (CoT) generation not only slows problem solving but also risks accuracy loss. We propose a simple yet effective method that allows LLMs to self-truncate CoT sequences by early exit during generation. Our method requires no additional training and can be seamlessly integrated into existing o1-like reasoning LLMs.
arXiv Detail & Related papers (2025-04-22T13:36:53Z)
CoT-Valve: Length-Compressible Chain-of-Thought Tuning [50.196317781229496]
We introduce a new tuning and inference strategy named CoT-Valve, designed to allow models to generate reasoning chains of varying lengths. We show that CoT-Valve successfully enables controllability and compressibility of the chain and shows better performance than the prompt-based control.
arXiv Detail & Related papers (2025-02-13T18:52:36Z)
Preemptive Answer "Attacks" on Chain-of-Thought Reasoning [7.233752893356647]
Large language models (LLMs) showcase impressive reasoning capabilities when coupled with Chain-of-Thought prompting. In this paper, we introduce a novel scenario termed preemptive answers, where the LLM obtains an answer before engaging in reasoning. Experiments reveal that preemptive answers significantly impair the model's reasoning capability across various CoT methods and a broad spectrum of datasets.
arXiv Detail & Related papers (2024-05-31T15:15:04Z)
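
The mode aggregation described in the "Beyond the Last Answer" entry above can be sketched as follows. The summary does not say how an answer is extracted at each subthought, so this sketch assumes the per-subthought answers are already available as a list; the function name and the handling of missing answers are hypothetical.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def aggregate_by_mode(
    subthought_answers: Sequence[Optional[Hashable]],
) -> Optional[Hashable]:
    """Return the most frequent answer across subthoughts.

    Entries equal to None (no answer could be extracted at that
    subthought) are ignored. Ties go to the answer that appeared first.
    Returns None if no subthought produced an answer.
    """
    counts = Counter(a for a in subthought_answers if a is not None)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Made-up answers read off after each subthought of one trace;
# the last entry stands for the answer from the complete trace.
answers = ["B", "C", "C", None, "A", "C", "B"]
print(aggregate_by_mode(answers), "vs. final", answers[-1])
```

In this made-up trace the mode and the final answer disagree, which is the situation the entry says can favor the aggregated answer.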