Efficient Thought Space Exploration through Strategic Intervention
- URL: http://arxiv.org/abs/2511.10038v1
- Date: Fri, 14 Nov 2025 01:27:41 GMT
- Title: Efficient Thought Space Exploration through Strategic Intervention
- Authors: Ziheng Li, Hengyi Cai, Xiaochi Wei, Yuchen Li, Shuaiqiang Wang, Zhi-Hong Deng, Dawei Yin,
- Abstract summary: We propose a novel Hint-Practice Reasoning (HPR) framework that operationalizes this insight through two synergistic components.<n>The framework's core innovation lies in Distributional Inconsistency Reduction (DIR), which dynamically identifies intervention points.<n> Experiments across arithmetic and commonsense reasoning benchmarks demonstrate HPR's state-of-the-art efficiency-accuracy tradeoffs.
- Score: 54.35208611253168
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While large language models (LLMs) demonstrate emerging reasoning capabilities, current inference-time expansion methods incur prohibitive computational costs by exhaustive sampling. Through analyzing decoding trajectories, we observe that most next-token predictions align well with the golden output, except for a few critical tokens that lead to deviations. Inspired by this phenomenon, we propose a novel Hint-Practice Reasoning (HPR) framework that operationalizes this insight through two synergistic components: 1) a hinter (powerful LLM) that provides probabilistic guidance at critical decision points, and 2) a practitioner (efficient smaller model) that executes major reasoning steps. The framework's core innovation lies in Distributional Inconsistency Reduction (DIR), a theoretically-grounded metric that dynamically identifies intervention points by quantifying the divergence between practitioner's reasoning trajectory and hinter's expected distribution in a tree-structured probabilistic space. Through iterative tree updates guided by DIR, HPR reweights promising reasoning paths while deprioritizing low-probability branches. Experiments across arithmetic and commonsense reasoning benchmarks demonstrate HPR's state-of-the-art efficiency-accuracy tradeoffs: it achieves comparable performance to self-consistency and MCTS baselines while decoding only 1/5 tokens, and outperforms existing methods by at most 5.1% absolute accuracy while maintaining similar or lower FLOPs.
Related papers
- Thinking by Subtraction: Confidence-Driven Contrastive Decoding for LLM Reasoning [58.331709210563616]
Thinking by Subtraction is a confidence-driven contrastive decoding approach.<n>A small subset of low-confidence tokens disproportionately contributes to reasoning errors and unnecessary output expansion.<n>Our method, Confidence-Driven Contrastive Decoding, detects low-confidence tokens during decoding and intervenes at these positions.
arXiv Detail & Related papers (2026-02-20T14:13:22Z) - ENTRA: Entropy-Based Redundancy Avoidance in Large Language Model Reasoning [30.786062954495403]
Large Reasoning Models (LRMs) often suffer from overthinking, generating unnecessarily long reasoning chains even for simple tasks.<n>We propose ENTRA, an entropy-based training framework that suppresses redundant reasoning while preserving performance.
arXiv Detail & Related papers (2026-01-12T01:26:30Z) - Reflective Confidence: Correcting Reasoning Flaws via Online Self-Correction [14.164508061248775]
Large language models (LLMs) have achieved strong performance on complex reasoning tasks using techniques such as chain-of-thought and self-consistency.<n>We propose reflective confidence, a novel reasoning framework that transforms low-confidence signals from termination indicators into reflection triggers.<n> Experiments on mathematical reasoning benchmarks, including AIME 2025, demonstrate significant accuracy improvements over advanced early-stopping baselines at comparable computational cost.
arXiv Detail & Related papers (2025-12-21T05:35:07Z) - Latent Chain-of-Thought for Visual Reasoning [53.541579327424046]
Chain-of-thought (CoT) reasoning is critical for improving the interpretability and reliability of Large Vision-Language Models (LVLMs)<n>We reformulate reasoning in LVLMs as posterior inference and propose a scalable training algorithm based on amortized variational inference.<n>We empirically demonstrate that the proposed method enhances the state-of-the-art LVLMs on seven reasoning benchmarks.
arXiv Detail & Related papers (2025-10-27T23:10:06Z) - Scalable, Explainable and Provably Robust Anomaly Detection with One-Step Flow Matching [14.503330877000758]
Time-Conditioned Contraction Matching is a novel method for semi-supervised anomaly detection in tabular data.<n>It is inspired by flow matching, a recent generative modeling framework that learns velocity fields between probability distributions.<n>Extensive experiments on the ADBench benchmark show that TCCM strikes a favorable balance between detection accuracy and inference cost.
arXiv Detail & Related papers (2025-10-21T06:26:38Z) - A Theoretical Study on Bridging Internal Probability and Self-Consistency for LLM Reasoning [31.861874030715953]
We provide the first theoretical framework for analyzing sampling-based test-time scaling methods.<n>We introduce RPC, a hybrid method that leverages our theoretical insights through two key components: Perplexity Consistency and Reasoning Pruning.<n>RPC achieves reasoning performance comparable to self-consistency while not only enhancing confidence reliability but also reducing sampling costs by 50%.
arXiv Detail & Related papers (2025-10-17T08:59:30Z) - READER: Retrieval-Assisted Drafter for Efficient LLM Inference [0.0386965802948046]
Autoregressive Language Models instantiate a factorized likelihood over token sequences, yet their strictly sequential decoding process imposes an intrinsic lower bound on latency inference.<n>This bottleneck has emerged as a central obstacle to the scalable deployment of large-scale generative models.<n>We present READER, a speculative decoding framework that bypasses the training of the auxiliary draft model.
arXiv Detail & Related papers (2025-08-12T16:47:48Z) - ConCISE: Confidence-guided Compression in Step-by-step Efficient Reasoning [64.93140713419561]
Large Reasoning Models (LRMs) perform strongly in complex reasoning tasks via Chain-of-Thought (CoT) prompting, but often suffer from verbose outputs.<n>Existing fine-tuning-based compression methods either operate post-hoc pruning, risking disruption to reasoning coherence, or rely on sampling-based selection.<n>We introduce ConCISE, a framework designed to generate concise reasoning chains, integrating Confidence Injection to boost reasoning confidence, and Early Stopping to terminate reasoning when confidence is sufficient.
arXiv Detail & Related papers (2025-05-08T01:40:40Z) - Bridging Internal Probability and Self-Consistency for Effective and Efficient LLM Reasoning [53.25336975467293]
We present the first theoretical error decomposition analysis of methods such as perplexity and self-consistency.<n>Our analysis reveals a fundamental trade-off: perplexity methods suffer from substantial model error due to the absence of a proper consistency function.<n>We propose Reasoning-Pruning Perplexity Consistency (RPC), which integrates perplexity with self-consistency, and Reasoning Pruning, which eliminates low-probability reasoning paths.
arXiv Detail & Related papers (2025-02-01T18:09:49Z) - Ladder-of-Thought: Using Knowledge as Steps to Elevate Stance Detection [73.31406286956535]
We introduce the Ladder-of-Thought (LoT) for the stance detection task.
LoT directs the small LMs to assimilate high-quality external knowledge, refining the intermediate rationales produced.
Our empirical evaluations underscore LoT's efficacy, marking a 16% improvement over GPT-3.5 and a 10% enhancement compared to GPT-3.5 with CoT on stance detection task.
arXiv Detail & Related papers (2023-08-31T14:31:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.