First Try Matters: Revisiting the Role of Reflection in Reasoning Models
- URL: http://arxiv.org/abs/2510.08308v1
- Date: Thu, 09 Oct 2025 14:57:10 GMT
- Title: First Try Matters: Revisiting the Role of Reflection in Reasoning Models
- Authors: Liwei Kang, Yue Deng, Yao Xiao, Zhanfeng Mo, Wee Sun Lee, Lidong Bing
- Abstract summary: We focus on reflective behaviours where the model has already produced an answer but continues reflecting before finalizing its output. Our analysis reveals that reflections are predominantly confirmatory and rarely alter the model's initial answer. We propose a question-aware early-stopping method that enhances inference-time token efficiency by stopping the reasoning process once a few plausible candidate answers are generated.
- Score: 66.39546876232512
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models have recently demonstrated significant gains in reasoning ability, often attributed to their capacity to generate longer chains of thought and engage in reflective reasoning. However, the contribution of reflections to performance improvement remains unclear. In this paper, we systematically analyze the rollouts of eight reasoning models on five mathematical datasets. We focus on reflective behaviours where the model has already produced an answer but continues reflecting before finalizing its output. Our analysis reveals that reflections are predominantly confirmatory and rarely alter the model's initial answer, a pattern consistent across models and datasets. To understand the role of reflections in training, we construct supervised fine-tuning (SFT) datasets with varying amounts of reflection steps. We observe that training models on rollouts with more reflection steps primarily enhances first-answer correctness rather than the ability to correct initially wrong answers through reflections. This motivates us to propose a question-aware early-stopping method that enhances inference-time token efficiency by stopping the reasoning process once a few plausible candidate answers are generated, thereby reducing unnecessary reflection steps. Building on this, we further propose to dynamically truncate the reflections after a candidate answer has appeared during generation, which reduces reasoning tokens by 24.5% across five mathematical datasets, at the cost of a 2.9% drop in accuracy.
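The truncation idea described above can be sketched in code. The following is a minimal, hypothetical illustration, not the paper's implementation: it assumes candidate answers appear in `\boxed{...}` form, scans the partially generated text after each decoded chunk, and stops generation once the same candidate answer has recurred `k` times. The `stream` argument, the regex, and the threshold `k` are all illustrative assumptions.

```python
import re
from typing import Iterable, Optional

# Illustrative pattern for candidate answers; real reasoning traces may
# use other answer markers.
ANSWER_RE = re.compile(r"\\boxed\{([^}]*)\}")

def early_stop_generate(stream: Iterable[str], k: int = 2) -> tuple[str, Optional[str]]:
    """Consume decoded text chunks; stop once one candidate answer recurs k times.

    Returns the (possibly truncated) text and the selected answer.
    """
    text = ""
    for chunk in stream:
        text += chunk
        # Re-scan the accumulated text for candidate answers so far.
        counts: dict[str, int] = {}
        for ans in ANSWER_RE.findall(text):
            counts[ans] = counts.get(ans, 0) + 1
            if counts[ans] >= k:
                # Same candidate seen k times: truncate remaining reflections.
                return text, ans
    # Stream ended without a repeated candidate; fall back to the last answer.
    answers = ANSWER_RE.findall(text)
    return text, answers[-1] if answers else None
```

In a real decoding loop, `stream` would yield incrementally detokenized output, so stopping early directly saves the tokens that confirmatory reflections would otherwise consume.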
Related papers
- Recursive Think-Answer Process for LLMs and VLMs [54.52289112197118]
We propose an efficient Recursive Think-Answer Process (R-TAP). R-TAP enables models to engage in iterative reasoning cycles and generate more accurate answers. We show that R-TAP-enhanced models consistently outperform conventional single-pass methods.
arXiv Detail & Related papers (2026-03-02T17:20:10Z)
- Learning from Trials and Errors: Reflective Test-Time Planning for Embodied LLMs [63.88783817420284]
Embodied robots cannot reflect on what went wrong or why, turning deployment into a sequence of independent trials. We introduce Reflective Test-Time Planning, which integrates two modes of reflection: reflection-in-action and reflection-on-action. We also include retrospective reflection, allowing the agent to re-evaluate earlier decisions and perform model updates with hindsight.
arXiv Detail & Related papers (2026-02-24T18:55:18Z)
- Teaching Large Reasoning Models Effective Reflection [62.73646680747003]
Large Reasoning Models (LRMs) have recently shown impressive performance on complex reasoning tasks. However, not all reflections are beneficial; many are superficial, offering little to no improvement over the original answer. We first propose Self-Critique Fine-Tuning (SCFT), a training framework that enhances the model's reflective reasoning ability using only self-generated critiques.
arXiv Detail & Related papers (2026-01-19T04:51:53Z)
- ReflCtrl: Controlling LLM Reflection via Representation Engineering [6.828302913581854]
We study self-reflection through the lens of representation engineering. We propose a stepwise steering method that can control reflection frequency. In experiments, we can save up to 33.6 percent of reasoning tokens while preserving performance.
arXiv Detail & Related papers (2025-12-16T00:38:34Z)
- Efficient Reasoning Through Suppression of Self-Affirmation Reflections in Large Reasoning Models [29.615519143908998]
Self-affirmation reflections are redundant reflective steps that affirm prior content and often occur after already-correct reasoning steps. We show that suppressing self-affirmation reflections reduces output length without degrading accuracy across multiple models. We also improve a current training-based method by explicitly suppressing such reflections.
arXiv Detail & Related papers (2025-06-14T05:30:09Z)
- From Emergence to Control: Probing and Modulating Self-Reflection in Language Models [23.176641726866105]
Self-reflection is a powerful behavior enabled by reinforcement learning with verifiable rewards. We show that self-reflection is not exclusive to fine-tuned models.
arXiv Detail & Related papers (2025-06-13T20:40:13Z)
- Revisiting Overthinking in Long Chain-of-Thought from the Perspective of Self-Doubt [74.35891434097053]
Reasoning Large Language Models (RLLMs) have demonstrated impressive performance on complex tasks. They often exhibit overthinking: performing unnecessary reasoning steps even after arriving at the correct answer. We present a quantitative analysis of overthinking from the perspective of self-doubt. We introduce a simple and effective prompting method to reduce the model's over-reliance on input questions.
arXiv Detail & Related papers (2025-05-29T14:30:02Z)
- ReflectEvo: Improving Meta Introspection of Small LLMs by Learning Self-Reflection [60.75785864719726]
We present a novel pipeline, ReflectEvo, to demonstrate that small language models (SLMs) can enhance meta introspection through reflection learning. We construct ReflectEvo-460k, a large-scale, comprehensive, self-generated reflection dataset with broadened instructions and diverse multi-domain tasks.
arXiv Detail & Related papers (2025-05-22T10:03:05Z)
- Instruct-of-Reflection: Enhancing Large Language Models Iterative Reflection Capabilities via Dynamic-Meta Instruction [11.838351314880736]
Instruct-of-Reflection (IoRT) is a novel and general reflection framework that leverages dynamic-meta instruction to enhance the iterative reflection capability of Large Language Models (LLMs). Our experiments demonstrate that IoRT achieves an average improvement of 10.1% over established baselines in mathematical and commonsense reasoning tasks.
arXiv Detail & Related papers (2025-03-02T14:02:03Z)
- Two Heads Are Better Than One: Dual-Model Verbal Reflection at Inference-Time [22.397964526411812]
We propose a Dual-model Reflective Scoring (DARS) framework featuring a dedicated Critic model trained for effective reflection. DARS achieves strong performance and consistently outperforms existing ASAS baselines across all evaluation metrics.
arXiv Detail & Related papers (2025-02-26T15:41:41Z)
- Reverse Thinking Makes LLMs Stronger Reasoners [90.42357659849215]
RevThink is a framework composed of data augmentation and learning objectives. Experiments across 12 datasets show an average 13.53% improvement over the student model's zero-shot performance. RevThink also exhibits strong generalization to out-of-distribution held-out datasets.
arXiv Detail & Related papers (2024-11-29T17:27:05Z)
- Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision [120.40788744292739]
We propose a two-player paradigm that separates the roles of reasoning and critique models.
We first propose AutoMathCritique, an automated and scalable framework for collecting critique data.
We demonstrate that the critique models consistently improve the actor's performance on difficult queries at test-time.
arXiv Detail & Related papers (2024-11-25T17:11:54Z)
- Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models [36.119299938503936]
Large vision-language models (LVLMs) have shown promising performance on a variety of vision-language tasks.
They remain susceptible to hallucinations, generating outputs misaligned with visual content or instructions.
We propose reflective instruction tuning, which integrates rationale learning into visual instruction tuning.
arXiv Detail & Related papers (2024-07-16T06:32:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.