Efficient Reasoning Through Suppression of Self-Affirmation Reflections in Large Reasoning Models
- URL: http://arxiv.org/abs/2506.12353v1
- Date: Sat, 14 Jun 2025 05:30:09 GMT
- Title: Efficient Reasoning Through Suppression of Self-Affirmation Reflections in Large Reasoning Models
- Authors: Kaiyuan Liu, Chen Shen, Zhanwei Zhang, Junjie Liu, Xiaosong Yuan, Jieping Ye
- Abstract summary: Self-affirmation reflections are redundant reflective steps that affirm prior content and often occur after reasoning steps that are already correct. We show that suppressing self-affirmation reflections reduces output length without degrading accuracy across multiple models. We also improve a current train-based method by explicitly suppressing such reflections.
- Score: 29.615519143908998
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While recent advances in large reasoning models have demonstrated remarkable performance, efficient reasoning remains critical due to the rapid growth of output length. Existing optimization approaches highlight a tendency toward "overthinking", yet lack fine-grained analysis. In this work, we focus on Self-Affirmation Reflections: redundant reflective steps that affirm prior content and often occur after reasoning steps that are already correct. Observations of both original and optimized reasoning models reveal pervasive self-affirmation reflections. Notably, these reflections sometimes lead to longer outputs in optimized models than in their original counterparts. Through detailed analysis, we uncover an intriguing pattern: compared to other reflections, the leading words (i.e., the first word of sentences) in self-affirmation reflections exhibit a distinct probability bias. Motivated by this insight, we locate self-affirmation reflections and conduct a train-free experiment demonstrating that suppressing self-affirmation reflections reduces output length without degrading accuracy across multiple models (R1-Distill models, QwQ-32B, and Qwen3-32B). Furthermore, we also improve a current train-based method by explicitly suppressing such reflections. In our experiments, we achieve length compression of 18.7% in train-free settings and 50.2% in train-based settings for R1-Distill-Qwen-1.5B. Moreover, our improvements are simple yet practical and can be directly applied to existing inference frameworks, such as vLLM. We believe that our findings will provide the community with insights for achieving more precise length compression and step-level efficient reasoning.
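The abstract does not spell out the implementation, but the train-free idea, biasing decoding away from reflection-leading words at sentence boundaries, can be sketched with a standard HuggingFace LogitsProcessor. Everything specific below (the candidate word list, the penalty value, the boundary tokens, and the choice of R1-Distill-Qwen-1.5B as the demo model) is an illustrative assumption, not the authors' actual lexicon or their vLLM integration.

```python
# Minimal sketch (not the authors' code): down-weight candidate
# "self-affirmation" leading words whenever the previous token ends a
# sentence, so reflective restarts such as "Yes, that is correct..."
# become less likely during decoding.
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          LogitsProcessor, LogitsProcessorList)

class LeadingWordSuppressor(LogitsProcessor):
    """Penalizes a set of leading-word token ids right after a sentence
    boundary; word list and penalty are illustrative assumptions."""

    def __init__(self, tokenizer, leading_words, penalty=5.0):
        self.penalty = penalty
        # Token ids that can end a sentence (illustrative choice).
        self.boundary_ids = {
            tid for s in [".", "!", "?", "\n\n"]
            for tid in tokenizer.encode(s, add_special_tokens=False)
        }
        # First token id of each candidate leading word (with leading space).
        self.leading_ids = {
            tokenizer.encode(" " + w, add_special_tokens=False)[0]
            for w in leading_words
        }

    def __call__(self, input_ids, scores):
        for i in range(input_ids.shape[0]):
            if int(input_ids[i, -1]) in self.boundary_ids:
                scores[i, list(self.leading_ids)] -= self.penalty
        return scores

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

processors = LogitsProcessorList([
    # Hypothetical word list; the paper derives its lexicon from the
    # leading-word probability-bias analysis.
    LeadingWordSuppressor(tokenizer, ["Yes", "Indeed", "Wait", "So"], penalty=5.0)
])
inputs = tokenizer("Solve: 3x + 5 = 20. Think step by step.", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=512, logits_processor=processors)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

The same idea could in principle be wired into a serving stack such as vLLM, with the candidate word list derived from the leading-word probability-bias analysis the paper describes.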
Related papers
- Lost at the Beginning of Reasoning [82.18834329384514]
We show that the first reasoning step exerts a disproportionately large influence on the final prediction. We propose an efficient sampling strategy that leverages a reward model to identify and retain high-quality first reasoning steps. We introduce a new benchmark specifically constructed with deliberately flawed first reasoning steps to systematically evaluate model self-correction capabilities.
arXiv Detail & Related papers (2025-06-27T09:53:57Z)
- From Emergence to Control: Probing and Modulating Self-Reflection in Language Models [23.176641726866105]
Self-reflection is a powerful behavior enabled by reinforcement learning with verifiable rewards. We show that self-reflection is not exclusive to fine-tuned models.
arXiv Detail & Related papers (2025-06-13T20:40:13Z)
- Does Thinking More always Help? Understanding Test-Time Scaling in Reasoning Models [103.03315678501546]
Extending thinking traces using prompts like "Wait" or "Let me rethink" can improve performance. This raises a natural question: does thinking more at test-time truly lead to better reasoning? We show a consistent pattern of initial performance improvements from additional thinking, followed by a decline due to "overthinking".
arXiv Detail & Related papers (2025-06-04T17:55:09Z)
- Don't Think Longer, Think Wisely: Optimizing Thinking Dynamics for Large Reasoning Models [68.96619605651155]
Large reasoning models (LRMs) may drastically increase output length due to overthinking. We propose a dynamic optimization framework that segments model-generated reasoning paths into distinct thinking patterns. Our method achieves up to a 12% accuracy improvement and reduces token usage from approximately 5,000 to 3,000 tokens.
arXiv Detail & Related papers (2025-05-27T20:59:29Z)
- ReflectEvo: Improving Meta Introspection of Small LLMs by Learning Self-Reflection [60.75785864719726]
We present a novel pipeline, ReflectEvo, to demonstrate that small language models (SLMs) can enhance meta-introspection through reflection learning. We construct ReflectEvo-460k, a large-scale, comprehensive, self-generated reflection dataset with broadened instructions and diverse multi-domain tasks.
arXiv Detail & Related papers (2025-05-22T10:03:05Z)
- SEAL: Steerable Reasoning Calibration of Large Language Models for Free [58.190800043449336]
Large Language Models (LLMs) have demonstrated compelling capabilities for complex reasoning tasks via the extended chain-of-thought (CoT) reasoning mechanism. Recent studies reveal substantial redundancy in CoT reasoning traces, which negatively impacts model performance. We introduce SEAL, a training-free approach that seamlessly calibrates the CoT process, improving accuracy while demonstrating significant efficiency gains.
arXiv Detail & Related papers (2025-04-07T02:42:07Z)
- Retro-Search: Exploring Untaken Paths for Deeper and Efficient Reasoning [84.2749507577386]
We introduce Retro-Search, an MCTS-inspired search algorithm for distilling higher-quality reasoning paths from large models. Retro-Search retrospectively revises reasoning paths to discover better, yet shorter, traces, which can lead to student models with enhanced reasoning capabilities. Our approach enables two use cases: self-improvement, where models are fine-tuned on their own Retro-Search-ed traces, and weak-to-strong improvement.
arXiv Detail & Related papers (2025-04-06T06:23:27Z)
- Two Heads Are Better Than One: Dual-Model Verbal Reflection at Inference-Time [17.3254565018168]
Large Language Models (LLMs) often struggle with complex reasoning scenarios. We introduce a contrastive reflection synthesis pipeline that enhances the accuracy and depth of LLM-generated reflections. We propose a dual-model reasoning framework within a verbal reinforcement learning paradigm.
arXiv Detail & Related papers (2025-02-26T15:41:41Z)
- Vision-Language Models Can Self-Improve Reasoning via Reflection [20.196406628954303]
Chain-of-thought (CoT) has been shown to improve the reasoning capability of large language models (LLMs).
We propose a self-training framework, R3V, which iteratively enhances the model's Vision-language Reasoning by Reflecting on CoT Rationales.
Our approach supports self-reflection on generated solutions, further boosting performance through test-time computation.
arXiv Detail & Related papers (2024-10-30T14:45:00Z)