Entropy-Guided Loop: Achieving Reasoning through Uncertainty-Aware Generation
- URL: http://arxiv.org/abs/2509.00079v1
- Date: Tue, 26 Aug 2025 22:29:12 GMT
- Title: Entropy-Guided Loop: Achieving Reasoning through Uncertainty-Aware Generation
- Authors: Andrew G. A. Correa, Ana C. H. de Matos
- Abstract summary: Entropy-guided refinement is a lightweight, test-time loop that uses token-level uncertainty to trigger a single, targeted refinement pass. We demonstrate that this uncertainty-aware loop provides an effective middle ground between single-pass inference and expensive reasoning chains.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Reasoning models often outperform smaller models but at 3--5$\times$ higher cost and added latency. We present entropy-guided refinement: a lightweight, test-time loop that uses token-level uncertainty to trigger a single, targeted refinement pass. We extract logprobs, compute Shannon entropy on top-$k$ alternatives, and apply a simple OR-logic trigger over perplexity, maximum token entropy, and low-confidence-token count. Unlike approaches that use entropy only for measurement or decoding, we pass a compact uncertainty report (tokens, confidences, alternatives, context) back to the model to guide corrective edits. On representative technical queries across reasoning, mathematics, and code generation tasks, a small model with our loop approaches 95\% of a reference reasoning model's quality at approximately one-third of the cost. The method achieves selective refinement on ~31\% of responses while improving accuracy by 16 percentage points over single-pass inference. We demonstrate that this uncertainty-aware loop provides an effective middle ground between single-pass inference and expensive reasoning chains, making it practical for production deployments where both quality and cost matter.
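The OR-logic trigger described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' published implementation: the function names and threshold values are assumptions chosen for demonstration, and entropy is computed over renormalized top-$k$ alternatives as the abstract describes.

```python
import math

def token_entropy(topk_logprobs):
    """Shannon entropy (in nats) over the top-k alternatives,
    renormalized so the truncated distribution sums to 1."""
    probs = [math.exp(lp) for lp in topk_logprobs]
    total = sum(probs)
    probs = [p / total for p in probs]
    return -sum(p * math.log(p) for p in probs if p > 0)

def should_refine(chosen_logprobs, topk_logprobs_per_token,
                  ppl_threshold=1.5, entropy_threshold=1.0,
                  low_conf_prob=0.5, low_conf_count=3):
    """OR-logic trigger over three signals: sequence perplexity,
    maximum per-token entropy, and the number of low-confidence
    tokens. All thresholds are illustrative, not the paper's."""
    n = len(chosen_logprobs)
    perplexity = math.exp(-sum(chosen_logprobs) / n)
    max_entropy = max(token_entropy(tk) for tk in topk_logprobs_per_token)
    n_low_conf = sum(1 for lp in chosen_logprobs
                     if math.exp(lp) < low_conf_prob)
    return (perplexity > ppl_threshold
            or max_entropy > entropy_threshold
            or n_low_conf >= low_conf_count)
```

When the trigger fires, the loop would assemble the flagged tokens, their confidences, and their top-$k$ alternatives into the compact uncertainty report that is passed back to the model for a single corrective pass.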
Related papers
- ODAR: Principled Adaptive Routing for LLM Reasoning via Active Inference [60.958331943869126]
ODAR-Expert is an adaptive routing framework that optimizes the accuracy-efficiency trade-off via principled resource allocation. We show strong and consistent gains, including 98.2% accuracy on MATH and 54.8% on Humanity's Last Exam.
arXiv Detail & Related papers (2026-02-27T05:22:01Z)
- Towards Anytime-Valid Statistical Watermarking [63.02116925616554]
We develop the first e-value-based watermarking framework, Anchored E-Watermarking, that unifies optimal sampling with anytime-valid inference. Our framework can significantly enhance sample efficiency, reducing the average token budget required for detection by 13-15% relative to state-of-the-art baselines.
arXiv Detail & Related papers (2026-02-19T18:32:26Z)
- EntroCut: Entropy-Guided Adaptive Truncation for Efficient Chain-of-Thought Reasoning in Small-scale Large Reasoning Models [42.49934375597466]
Large Reasoning Models (LRMs) excel at complex reasoning tasks through extended chain-of-thought generation. We find that the entropy of the model's output distribution in early reasoning steps reliably distinguishes correct from incorrect reasoning. We propose EntroCut, a training-free method that dynamically truncates reasoning by identifying high-confidence states.
arXiv Detail & Related papers (2026-01-30T06:19:16Z)
- GlimpRouter: Efficient Collaborative Inference by Glimpsing One Token of Thoughts [10.808072653940263]
Collaborative inference offers a promising solution by selectively allocating work between lightweight and large models. We propose a novel perspective on step-wise collaboration: the difficulty of a reasoning step can be inferred from its very first token. GlimpRouter employs a lightweight model to generate only the first token of each reasoning step and routes the step to a larger model only when the initial token entropy exceeds a threshold.
arXiv Detail & Related papers (2026-01-08T16:58:07Z)
- Arbitrage: Efficient Reasoning via Advantage-Aware Speculation [71.45710345765528]
Speculative Decoding accelerates inference by employing a fast but inaccurate draft model to autoregressively propose tokens. However, due to unnecessary rejections caused by token mismatches in semantically equivalent steps, traditional token-level Speculative Decoding struggles on reasoning tasks. We propose Arbitrage, a novel step-level speculative generation framework that routes generation dynamically based on the relative advantage between draft and target models.
arXiv Detail & Related papers (2025-12-04T17:50:53Z)
- DiffAdapt: Difficulty-Adaptive Reasoning for Token-Efficient LLM Inference [68.05879215304641]
Recent reasoning Large Language Models (LLMs) demonstrate remarkable problem-solving abilities but often generate long thinking traces whose utility is unclear. Our work aims to improve their efficiency, enabling them to reach high performance without overthinking. We introduce DiffAdapt, a lightweight framework that selects Easy/Normal/Hard inference strategies per question based on their difficulty and reasoning trace entropy.
arXiv Detail & Related papers (2025-10-22T15:16:06Z)
- Think Just Enough: Sequence-Level Entropy as a Confidence Signal for LLM Reasoning [5.37133760455631]
We introduce a novel entropy-based framework to drive token efficiency in large language models during reasoning tasks. Our approach uses Shannon entropy from token-level logprobs as a confidence signal to enable early stopping. We show that entropy-based confidence calibration represents an emergent property of advanced post-training optimization.
arXiv Detail & Related papers (2025-10-09T12:33:16Z)
- LLMs are Bayesian, in Expectation, not in Realization [0.0]
Large language models adapt to new tasks without parameter updates. Recent empirical findings reveal a fundamental contradiction: transformers systematically violate the martingale property. This violation challenges the theoretical foundations underlying uncertainty quantification in critical applications.
arXiv Detail & Related papers (2025-07-15T22:20:11Z)
- Self-Route: Automatic Mode Switching via Capability Estimation for Efficient Reasoning [36.470695895695044]
Self-Route is a dynamic reasoning framework that automatically selects between general and reasoning modes. We show that Self-Route achieves comparable accuracy to reasoning models while reducing token consumption by 30-55%.
arXiv Detail & Related papers (2025-05-27T03:18:31Z)
- Fractured Chain-of-Thought Reasoning [61.647243580650446]
We introduce Fractured Sampling, a unified inference-time strategy that interpolates between full CoT and solution-only sampling. We show that Fractured Sampling consistently achieves superior accuracy-cost trade-offs, yielding steep log-linear scaling gains in Pass@k versus token budget.
arXiv Detail & Related papers (2025-05-19T11:30:41Z)
- O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning [98.3430004984531]
We propose Length-Harmonizing Fine-Tuning (O1-Pruner) to minimize reasoning overhead while maintaining accuracy. Our code is coming soon at https://github.com/StarDewXXX/O1-Pruner.
arXiv Detail & Related papers (2025-01-22T01:35:11Z)
- Improved Convergence of Score-Based Diffusion Models via Prediction-Correction [15.772322871598085]
Score-based generative models (SGMs) are powerful tools to sample from complex data distributions.
This paper addresses the issue by considering a version of the popular predictor-corrector scheme.
We first estimate the final distribution via an inexact Langevin dynamics and then revert the process.
arXiv Detail & Related papers (2023-05-23T15:29:09Z)
- Self-Evaluation Guided Beam Search for Reasoning [61.523627290397556]
We introduce a stepwise self-evaluation mechanism to guide and calibrate the reasoning process of Large Language Models (LLMs).
We propose a decoding algorithm integrating the self-evaluation guidance via beam search.
Our approach surpasses the corresponding Codex-backboned baselines in few-shot accuracy by 6.34%, 9.56%, and 5.46% on GSM8K, AQuA, and StrategyQA, respectively.
arXiv Detail & Related papers (2023-05-01T02:37:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.