Related papers: Reflective Confidence: Correcting Reasoning Flaws via Online Self-Correction

Reflective Confidence: Correcting Reasoning Flaws via Online Self-Correction

URL: http://arxiv.org/abs/2512.18605v1
Date: Sun, 21 Dec 2025 05:35:07 GMT
Title: Reflective Confidence: Correcting Reasoning Flaws via Online Self-Correction
Authors: Qinglin Zeng, Jing Yang, Keze Wang,
Abstract summary: Large language models (LLMs) have achieved strong performance on complex reasoning tasks using techniques such as chain-of-thought and self-consistency.<n>We propose reflective confidence, a novel reasoning framework that transforms low-confidence signals from termination indicators into reflection triggers.<n> Experiments on mathematical reasoning benchmarks, including AIME 2025, demonstrate significant accuracy improvements over advanced early-stopping baselines at comparable computational cost.
Score: 14.164508061248775
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) have achieved strong performance on complex reasoning tasks using techniques such as chain-of-thought and self-consistency. However, ensemble-based approaches, especially self-consistency which relies on multiple reasoning trajectories, often incur substantial computational overhead. To improve efficiency, prior work has leveraged internal confidence signals, where early stopping strategies such as DeepConf reduce cost by terminating low-confidence trajectories. However, this strategy discards incomplete reasoning paths and wastes partial computation. We propose reflective confidence, a novel reasoning framework that transforms low-confidence signals from termination indicators into reflection triggers. When confidence falls below a threshold, instead of stopping generation, the model produces a reflection prompt to analyze the current reasoning state, identify potential errors, and continue generation along a corrected trajectory. Experiments on mathematical reasoning benchmarks, including AIME 2025, demonstrate significant accuracy improvements over advanced early-stopping baselines at comparable computational cost, validating the effectiveness of proactive self-correction over passive discarding.

Related papers

Recursive Think-Answer Process for LLMs and VLMs [54.52289112197118]
We propose an efficient Recursive Think-Answer Process (R-TAP)<n>R-TAP enables models to engage in iterative reasoning cycles and generate more accurate answers.<n>We show that R-TAP-enhanced models consistently outperform conventional single-pass methods.
arXiv Detail & Related papers (2026-03-02T17:20:10Z)
Thinking by Subtraction: Confidence-Driven Contrastive Decoding for LLM Reasoning [58.331709210563616]
Thinking by Subtraction is a confidence-driven contrastive decoding approach.<n>A small subset of low-confidence tokens disproportionately contributes to reasoning errors and unnecessary output expansion.<n>Our method, Confidence-Driven Contrastive Decoding, detects low-confidence tokens during decoding and intervenes at these positions.
arXiv Detail & Related papers (2026-02-20T14:13:22Z)
ThinkRouter: Efficient Reasoning via Routing Thinking between Latent and Discrete Spaces [39.09794443825156]
latent reasoning improves reasoning efficiency by replacing explicit reasoning trajectories with continuous representations in a latent space.<n>We show that thinking trajectories ending in incorrect answers contain fewer low-confidence steps than those ending in correct answers.<n>We propose Think, an inference-time confidence-aware routing mechanism, to avoid high confidence and noise for efficient reasoning.
arXiv Detail & Related papers (2026-02-12T08:01:01Z)
Addressing Overthinking in Large Vision-Language Models via Gated Perception-Reasoning Optimization [56.59356959631999]
Gated Perception-Reasoning Optimization (GPRO) is a meta-reasoning controller that dynamically routes computation among three decision paths.<n>GPRO substantially improves both accuracy and efficiency, outperforming recent slow-thinking methods.
arXiv Detail & Related papers (2026-01-07T23:05:17Z)
Efficient Thought Space Exploration through Strategic Intervention [54.35208611253168]
We propose a novel Hint-Practice Reasoning (HPR) framework that operationalizes this insight through two synergistic components.<n>The framework's core innovation lies in Distributional Inconsistency Reduction (DIR), which dynamically identifies intervention points.<n> Experiments across arithmetic and commonsense reasoning benchmarks demonstrate HPR's state-of-the-art efficiency-accuracy tradeoffs.
arXiv Detail & Related papers (2025-11-13T07:26:01Z)
A Theoretical Study on Bridging Internal Probability and Self-Consistency for LLM Reasoning [31.861874030715953]
We provide the first theoretical framework for analyzing sampling-based test-time scaling methods.<n>We introduce RPC, a hybrid method that leverages our theoretical insights through two key components: Perplexity Consistency and Reasoning Pruning.<n>RPC achieves reasoning performance comparable to self-consistency while not only enhancing confidence reliability but also reducing sampling costs by 50%.
arXiv Detail & Related papers (2025-10-17T08:59:30Z)
Certainty-Guided Reasoning in Large Language Models: A Dynamic Thinking Budget Approach [0.15749416770494704]
We show that Certainty-Guided Reasoning (CGR) improves baseline accuracy while reducing token usage.<n>CGR can eliminate millions of tokens in aggregate, with tunable trade-offs between certainty thresholds and efficiency.<n>By integrating confidence into the reasoning process, CGR makes large reasoning language models more adaptive, trustworthy, and resource efficient.
arXiv Detail & Related papers (2025-09-09T14:57:15Z)
Lost at the Beginning of Reasoning [85.17612793300238]
We show that the first reasoning step exerts a disproportionately large influence on the final prediction.<n>We propose an efficient sampling strategy that leverages a reward model to identify and retain high-quality first reasoning steps.
arXiv Detail & Related papers (2025-06-27T09:53:57Z)
ConCISE: Confidence-guided Compression in Step-by-step Efficient Reasoning [64.93140713419561]
Large Reasoning Models (LRMs) perform strongly in complex reasoning tasks via Chain-of-Thought (CoT) prompting, but often suffer from verbose outputs.<n>Existing fine-tuning-based compression methods either operate post-hoc pruning, risking disruption to reasoning coherence, or rely on sampling-based selection.<n>We introduce ConCISE, a framework designed to generate concise reasoning chains, integrating Confidence Injection to boost reasoning confidence, and Early Stopping to terminate reasoning when confidence is sufficient.
arXiv Detail & Related papers (2025-05-08T01:40:40Z)
Enhancing LLM Reliability via Explicit Knowledge Boundary Modeling [41.19330514054401]
Large language models (LLMs) are prone to hallucination stemming from misaligned self-awareness.<n>We propose the Explicit Knowledge Boundary Modeling framework to integrate fast and slow reasoning systems to harmonize reliability and usability.
arXiv Detail & Related papers (2025-03-04T03:16:02Z)
Bridging Internal Probability and Self-Consistency for Effective and Efficient LLM Reasoning [53.25336975467293]
We present the first theoretical error decomposition analysis of methods such as perplexity and self-consistency.<n>Our analysis reveals a fundamental trade-off: perplexity methods suffer from substantial model error due to the absence of a proper consistency function.<n>We propose Reasoning-Pruning Perplexity Consistency (RPC), which integrates perplexity with self-consistency, and Reasoning Pruning, which eliminates low-probability reasoning paths.
arXiv Detail & Related papers (2025-02-01T18:09:49Z)

This list is automatically generated from the titles and abstracts of the papers in this site.