Reliability-Aware Adaptive Self-Consistency for Efficient Sampling in LLM Reasoning
- URL: http://arxiv.org/abs/2601.02970v1
- Date: Tue, 06 Jan 2026 12:27:53 GMT
- Title: Reliability-Aware Adaptive Self-Consistency for Efficient Sampling in LLM Reasoning
- Authors: Junseok Kim, Nakyeong Yang, Kyungmin Min, Kyomin Jung
- Abstract summary: Self-Consistency improves reasoning reliability through multi-sample aggregation, but incurs substantial inference cost. We propose Reliability-Aware Adaptive Self-Consistency (ReASC), which addresses this limitation by reframing adaptive sampling from response counting to evidence sufficiency. ReASC consistently achieves the best accuracy-cost trade-off compared to existing baselines, yielding improved inference efficiency across model scales from 3B to 27B parameters.
- Score: 20.371912257758634
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Self-Consistency improves reasoning reliability through multi-sample aggregation, but incurs substantial inference cost. Adaptive self-consistency methods mitigate this issue by adjusting the sampling budget; however, they rely on count-based stopping rules that treat all responses equally, often leading to unnecessary sampling. We propose Reliability-Aware Adaptive Self-Consistency (ReASC), which addresses this limitation by reframing adaptive sampling from response counting to evidence sufficiency, leveraging response-level confidence for principled information aggregation. ReASC operates in two stages: a single-sample decision stage that resolves instances confidently answerable from a single response, and a reliability-aware accumulation stage that aggregates responses by jointly leveraging their frequency and confidence. Across five models and four datasets, ReASC consistently achieves the best accuracy-cost trade-off compared to existing baselines, yielding improved inference efficiency across model scales from 3B to 27B parameters. As a concrete example, ReASC reduces inference cost by up to 70% relative to self-consistency while preserving accuracy on GSM8K using Gemma-3-4B-it.
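A minimal sketch of the two-stage idea described in the abstract is given below. The `sample_response` and `score_confidence` callables and all threshold values are illustrative placeholders, not the authors' implementation; the stopping rule shown (the leading answer's accumulated evidence dominating the rest by a margin) is one plausible reading of "evidence sufficiency", not the paper's exact criterion.

```python
from collections import defaultdict

def reliability_aware_sampling(sample_response, score_confidence,
                               single_tau=0.9, evidence_tau=2.5, max_samples=16):
    """Hedged sketch of reliability-aware adaptive self-consistency.

    sample_response(): draws one reasoning trace, returns (answer, trace).
    score_confidence(trace): returns a confidence in [0, 1] for that trace.
    All thresholds are illustrative placeholders, not values from the paper.
    """
    # Stage 1: single-sample decision for confidently answerable instances.
    answer, trace = sample_response()
    conf = score_confidence(trace)
    if conf >= single_tau:
        return answer, 1  # resolved from a single response

    # Stage 2: reliability-aware accumulation. Each candidate answer accrues
    # confidence-weighted evidence instead of a raw vote count.
    evidence = defaultdict(float)
    evidence[answer] += conf
    n = 1
    while n < max_samples:
        answer, trace = sample_response()
        evidence[answer] += score_confidence(trace)
        n += 1
        best, best_mass = max(evidence.items(), key=lambda kv: kv[1])
        rest = sum(evidence.values()) - best_mass
        # Stop once the leading answer's evidence clearly dominates the rest.
        if best_mass - rest >= evidence_tau:
            return best, n
    return max(evidence.items(), key=lambda kv: kv[1])[0], n
```

Because easy instances exit after one sample and harder ones stop as soon as the evidence margin is reached, the expected number of samples per query adapts to difficulty rather than being fixed in advance.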
Related papers
- On Calibration of Large Language Models: From Response To Capability [66.59139960234326]
Large language models (LLMs) are widely deployed as general-purpose problem solvers. We introduce capability calibration, which targets the model's expected accuracy on a query. Our results demonstrate that capability-calibrated confidence improves pass@k prediction and inference budget allocation.
arXiv Detail & Related papers (2026-02-14T01:07:45Z) - CoRefine: Confidence-Guided Self-Refinement for Adaptive Test-Time Compute [10.548368675645403]
CoRefine is a confidence-guided self-refinement method that achieves competitive accuracy using a fraction of the tokens. The controller consumes full-trace confidence to decide whether to halt, re-examine, or try a different approach. We extend this to CoRefine-Tree, a hybrid sequential-parallel variant that adaptively balances exploration and exploitation.
arXiv Detail & Related papers (2026-02-09T17:44:41Z) - Reliable LLM-Based Edge-Cloud-Expert Cascades for Telecom Knowledge Systems [54.916243942641444]
Large language models (LLMs) are emerging as key enablers of automation in domains such as telecommunications. We study an edge-cloud-expert cascaded LLM-based knowledge system that supports decision-making through a question-and-answer pipeline.
arXiv Detail & Related papers (2025-12-23T03:10:09Z) - Enhancing the Outcome Reward-based RL Training of MLLMs with Self-Consistency Sampling [90.87033586963828]
Outcome-reward reinforcement learning (RL) is a common and increasingly significant way to refine the step-by-step reasoning of multimodal large language models (MLLMs). We propose Self-Consistency Sampling (SCS) to correct this issue. Based on Qwen2.5-VL-7B-Instruct, SCS improves accuracy by up to 7.7 percentage points on six multimodal benchmarks with negligible extra computation.
arXiv Detail & Related papers (2025-11-13T18:59:57Z) - ConCISE: Confidence-guided Compression in Step-by-step Efficient Reasoning [64.93140713419561]
Large Reasoning Models (LRMs) perform strongly in complex reasoning tasks via Chain-of-Thought (CoT) prompting, but often suffer from verbose outputs. Existing fine-tuning-based compression methods either perform post-hoc pruning, risking disruption to reasoning coherence, or rely on sampling-based selection. We introduce ConCISE, a framework designed to generate concise reasoning chains, integrating Confidence Injection to boost reasoning confidence and Early Stopping to terminate reasoning when confidence is sufficient.
arXiv Detail & Related papers (2025-05-08T01:40:40Z) - Efficient Test-Time Scaling via Self-Calibration [18.32718448734639]
Best-of-N sampling and Self-Consistency with majority voting are simple and effective, but require a fixed number of sampled responses for each query. This can result in wasted computation on simpler questions and insufficient exploration on more challenging ones. We argue that the model's confidence in its responses can be used to improve the efficiency of test-time scaling.
arXiv Detail & Related papers (2025-02-25T00:21:14Z) - Confidence Improves Self-Consistency in LLMs [17.280967928501678]
We introduce Confidence-Informed Self-Consistency (CISC). CISC performs a weighted majority vote based on confidence scores obtained directly from the model. When tested on nine models and four datasets, CISC outperforms self-consistency in nearly all configurations.
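The CISC summary above amounts to swapping an unweighted vote for a confidence-weighted one. A minimal sketch, assuming answers and per-response confidence scores have already been extracted (the aggregation shown is the generic weighted-vote idea, not the paper's exact scoring scheme):

```python
from collections import defaultdict

def confidence_weighted_vote(answers, confidences):
    """Aggregate sampled answers by summing per-response confidence
    instead of counting votes (illustrative sketch of the idea)."""
    totals = defaultdict(float)
    for ans, conf in zip(answers, confidences):
        totals[ans] += conf
    return max(totals.items(), key=lambda kv: kv[1])[0]

# Example: a plain majority vote would pick "12" (two votes),
# but the single high-confidence response shifts the outcome to "15".
print(confidence_weighted_vote(["12", "12", "15"], [0.3, 0.3, 0.9]))
```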
arXiv Detail & Related papers (2025-02-10T08:10:29Z) - Fact-Level Confidence Calibration and Self-Correction [64.40105513819272]
We propose a Fact-Level framework that calibrates confidence to relevance-weighted correctness at the fact level.
We also develop Confidence-Guided Fact-level Self-Correction (ConFix), which uses high-confidence facts within a response as additional knowledge to improve low-confidence ones.
arXiv Detail & Related papers (2024-11-20T14:15:18Z) - Reasoning Aware Self-Consistency: Leveraging Reasoning Paths for Efficient LLM Sampling [9.44858963874474]
Self-Consistency mitigates hallucinations in Large Language Models (LLMs) by sampling multiple reasoning paths. We introduce Reasoning-Aware Self-Consistency (RASC), a novel framework that enhances sampling efficiency and reasoning faithfulness.
arXiv Detail & Related papers (2024-08-30T05:14:59Z) - Make Every Penny Count: Difficulty-Adaptive Self-Consistency for Cost-Efficient Reasoning [19.408941114068444]
Self-consistency (SC) is a widely used decoding strategy for chain-of-thought reasoning. Its variants, Adaptive Self-Consistency (ASC) and Early-Stopping Self-Consistency (ESC), dynamically adjust the number of samples based on the posterior distribution of a set of pre-samples. We propose Difficulty-Adaptive Self-Consistency (DSC), which leverages the difficulty information of batch queries to adaptively allocate inference resources.
arXiv Detail & Related papers (2024-08-24T04:03:35Z) - Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious to construct in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z) - Let's Sample Step by Step: Adaptive-Consistency for Efficient Reasoning and Coding with LLMs [60.58434523646137]
A popular approach for improving the correctness of output from large language models (LLMs) is Self-Consistency.
We introduce Adaptive-Consistency, a cost-efficient, model-agnostic technique that dynamically adjusts the number of samples per question.
Our experiments show that Adaptive-Consistency reduces the sample budget by up to 7.9 times with an average accuracy drop of less than 0.1%.
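For contrast with the confidence-weighted methods above, the sketch below shows a purely count-based adaptive stopping loop in the spirit of Adaptive-Consistency. The fixed agreement-ratio threshold is a simplified stand-in for the paper's actual stopping criterion, and `sample_answer` is a hypothetical callable that draws one answer per call.

```python
from collections import Counter

def adaptive_consistency(sample_answer, agreement=0.8, min_samples=3, max_samples=40):
    """Draw samples one at a time and stop once the current majority answer
    accounts for a large enough share of the samples drawn so far.
    The agreement threshold is an illustrative simplification."""
    counts = Counter()
    for n in range(1, max_samples + 1):
        counts[sample_answer()] += 1
        top_answer, top_count = counts.most_common(1)[0]
        if n >= min_samples and top_count / n >= agreement:
            return top_answer, n
    return counts.most_common(1)[0][0], max_samples
```

Note that every sample counts equally here; ReASC's argument is that weighting samples by reliability lets the loop stop earlier without sacrificing the final answer.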
arXiv Detail & Related papers (2023-05-19T17:49:25Z)