Confidence-Weighted Token Set Cover for Early Hypothesis Pruning in Self-Consistency
- URL: http://arxiv.org/abs/2508.03979v1
- Date: Wed, 06 Aug 2025 00:14:18 GMT
- Title: Confidence-Weighted Token Set Cover for Early Hypothesis Pruning in Self-Consistency
- Authors: Md Arafat Sultan, Ramón Fernandez Astudillo,
- Abstract summary: We investigate if self-consistency can be made more token-efficient for long chain-of-thought reasoning tasks.<n>We generate all solutions in parallel, but periodically prune intermediate hypotheses.<n>Our evaluation of five LLMs on three math benchmarks shows that this method can improve token efficiency for all models, by 10-35% in many cases.
- Score: 15.463112285139157
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite its simplicity and efficacy, the high token expenditure of self-consistency can limit its practical utility. Here we investigate if self-consistency can be made more token-efficient for long chain-of-thought reasoning tasks, while preserving its parallelism, through early hypothesis pruning. Concretely, we generate all solutions in parallel, but periodically prune intermediate hypotheses that are deemed unnecessary based on two lightweight indicators: (a) the model's own confidence in individual hypotheses, and (b) lexical coverage of all current hypotheses by candidate subsets that are under consideration for continued retention. We design a fast weighted set cover algorithm that utilizes the two indicators; our evaluation of five LLMs on three math benchmarks shows that this method can improve token efficiency for all models, by 10-35% in many cases.
Related papers
- The Consistency Hypothesis in Uncertainty Quantification for Large Language Models [22.60039074743706]
Black-box uncertainty quantification (UQ) methods, relying solely on model API access, have gained popularity due to their practical benefits.<n>In this paper, we examine the implicit assumption behind several UQ methods, which use generation consistency as a proxy for confidence.<n>We propose data-free black-box UQ methods that aggregate similarities between generations for confidence estimation.
arXiv Detail & Related papers (2025-06-27T01:53:15Z) - Fractured Chain-of-Thought Reasoning [61.647243580650446]
We introduce Fractured Sampling, a unified inference-time strategy that interpolates between full CoT and solution-only sampling.<n>We show that Fractured Sampling consistently achieves superior accuracy-cost trade-offs, yielding steep log-linear scaling gains in Pass@k versus token budget.
arXiv Detail & Related papers (2025-05-19T11:30:41Z) - ConCISE: Confidence-guided Compression in Step-by-step Efficient Reasoning [75.1101108949743]
Large Reasoning Models (LRMs) perform strongly in complex reasoning tasks via Chain-of-Thought (CoT) prompting.<n>LRMs often suffer from verbose outputs caused by redundant content, increasing computational overhead, and degrading user experience.<n>We propose ConCISE, a framework that simplifies reasoning chains by reinforcing the model's confidence during inference.
arXiv Detail & Related papers (2025-05-08T01:40:40Z) - Saliency-driven Dynamic Token Pruning for Large Language Models [32.903622070917194]
Saliency-driven Dynamic Token Pruning (SDTP)<n>A lightweight saliency-driven prediction module is designed to estimate the importance score of each token with its hidden state.<n>A ranking-based optimization strategy is proposed to minimize the ranking divergence of the saliency score and the predicted importance score.
arXiv Detail & Related papers (2025-04-06T15:15:07Z) - Scalable Best-of-N Selection for Large Language Models via Self-Certainty [65.31658824274894]
Best-of-N selection is a key technique for improving the reasoning performance of Large Language Models.<n>We propose self-certainty, a novel and efficient metric to estimate response quality without requiring external reward models.<n>Our findings establish self-certainty as a practical and efficient way for improving LLM reasoning capabilities.
arXiv Detail & Related papers (2025-02-25T19:08:07Z) - Confidence Improves Self-Consistency in LLMs [9.764747744761085]
We introduce Confidence-Informed Self-Consistency (CISC)<n>CISC performs a weighted majority vote based on confidence scores obtained directly from the model.<n>When tested on nine models and four datasets, CISC outperforms self-consistency in nearly all configurations.
arXiv Detail & Related papers (2025-02-10T08:10:29Z) - Mirror-Consistency: Harnessing Inconsistency in Majority Voting [54.30719306011487]
We present Mirror-Consistency, an enhancement of the standard Self-Consistency approach.
Mirror-Consistency incorporates a'reflective mirror' into the self-ensemble decoding process.
We show that Mirror-Consistency yields superior performance in both reasoning accuracy and confidence calibration compared to Self-Consistency.
arXiv Detail & Related papers (2024-10-07T03:41:08Z) - Path-Consistency: Prefix Enhancement for Efficient Inference in LLM [3.309813585671485]
textitpath-consistency mitigates both the errors and redundancies from random or less useful sampling in self-consistency.<n>textitpath-consistency achieves significant acceleration in inference latency ranging from $7.8%$ to $40.5%$.
arXiv Detail & Related papers (2024-08-25T01:45:53Z) - Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode.
We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
arXiv Detail & Related papers (2024-06-05T16:35:30Z) - On Efficient and Robust Metrics for RANSAC Hypotheses and 3D Rigid
Registration [51.64236850960365]
This paper focuses on developing efficient and robust evaluation metrics for RANSAC hypotheses to achieve accurate 3D rigid registration.
We analyze the contributions of inliers and outliers, and then proposing several efficient and robust metrics with different designing motivations for RANSAC hypotheses.
arXiv Detail & Related papers (2020-11-10T02:22:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.