Related papers: Anytime Safe PAC Efficient Reasoning

URL: http://arxiv.org/abs/2601.22446v1
Date: Fri, 30 Jan 2026 01:30:17 GMT
Title: Anytime Safe PAC Efficient Reasoning
Authors: Chengyao Yu, Hao Zeng, Youxin Zhu, Jianguo Huang, Huajun Zeng, Bingyi Jing,
Abstract summary: Large Reasoning Models (LRMs) have demonstrated remarkable performance on complex tasks but suffer from high computational costs and latency. We propose Betting Probably Approximately Correct (B-PAC) reasoning, a principled method that enables anytime safe and efficient online reasoning under partial feedback.
Score: 8.618430092165498
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Reasoning Models (LRMs) have demonstrated remarkable performance on complex tasks but suffer from high computational costs and latency. While selective thinking strategies improve efficiency by routing easy queries to non-thinking models, existing approaches often incur uncontrollable errors, especially in online settings where the performance loss of a non-thinking model is only partially observed and data are non-stationary. To address this, we propose Betting Probably Approximately Correct (B-PAC) reasoning, a principled method that enables anytime safe and efficient online reasoning under partial feedback. Specifically, we utilize inverse propensity scoring estimators to construct test supermartingales for candidate thresholds, and then dynamically adjust the routing threshold based on the accumulated statistical evidence of safety. Theoretically, we establish the anytime-valid performance loss control and the efficiency of B-PAC reasoning. Extensive experiments demonstrate that B-PAC reasoning significantly reduces computational overhead, decreasing thinking model usage by up to 81.01%, while controlling the performance loss below the user-specified level.
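Illustrative sketch (not the paper's algorithm): the abstract mentions inverse propensity scoring (IPS) estimators and test supermartingales. The Python below shows one generic way these two pieces can fit together for a single candidate threshold, using a standard testing-by-betting wealth process and Ville's inequality. The names `ips_loss` and `BettingTest`, the fixed `bet_fraction`, and the assumption that losses lie in [0, 1] with feedback probability at least `min_propensity` are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


def ips_loss(observed: bool, loss: float, propensity: float) -> float:
    """IPS-weighted loss: unbiased for the true loss when feedback is
    revealed with probability `propensity` (assumed known)."""
    if not 0.0 < propensity <= 1.0:
        raise ValueError("propensity must lie in (0, 1]")
    return loss / propensity if observed else 0.0


@dataclass
class BettingTest:
    """Wealth process testing H0: expected loss >= epsilon (unsafe).

    Hypothetical helper for illustration only. Under H0 the wealth is a
    nonnegative supermartingale, so by Ville's inequality it reaches
    1/alpha with probability at most alpha.
    """
    epsilon: float          # tolerated expected loss, in (0, 1)
    alpha: float            # allowed probability of a false "safe" claim
    min_propensity: float   # lower bound on the feedback probability
    bet_fraction: float = 0.5   # in [0, 1]; keeps every factor nonnegative
    wealth: float = 1.0
    certified: bool = False

    def update(self, x: float) -> bool:
        # Largest IPS loss possible when raw losses lie in [0, 1].
        upper = 1.0 / self.min_propensity
        lam = self.bet_fraction / (upper - self.epsilon)
        # Wealth grows when the IPS loss falls below epsilon.
        self.wealth *= 1.0 + lam * (self.epsilon - x)
        if self.wealth >= 1.0 / self.alpha:
            self.certified = True
        return self.certified


if __name__ == "__main__":
    test = BettingTest(epsilon=0.1, alpha=0.1, min_propensity=0.5)
    for _ in range(200):
        # Feedback observed with probability 0.5, and no error each time.
        test.update(ips_loss(True, 0.0, 0.5))
    print(test.certified, round(test.wealth, 2))
```

In a routing setting, one such test could be kept per candidate threshold, with the router only using thresholds whose test has been certified. How the paper selects bets, propensities, and thresholds is not specified in the abstract, so those details are left out here.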

Related papers

Observationally Informed Adaptive Causal Experimental Design [55.998153710215654]
We propose Active Residual Learning, a new paradigm that leverages the observational model as a foundational prior. This approach shifts the experimental focus from learning target causal quantities from scratch to efficiently estimating the residuals required to correct observational bias. Experiments on synthetic and semi-synthetic benchmarks demonstrate that R-Design significantly outperforms baselines.
arXiv Detail & Related papers (2026-03-04T06:52:37Z)
ODAR: Principled Adaptive Routing for LLM Reasoning via Active Inference [60.958331943869126]
ODAR-Expert is an adaptive routing framework that optimizes the accuracy-efficiency trade-off via principled resource allocation. We show strong and consistent gains, including 98.2% accuracy on MATH and 54.8% on Humanity's Last Exam.
arXiv Detail & Related papers (2026-02-27T05:22:01Z)
Conformal Thinking: Risk Control for Reasoning on a Compute Budget [60.65072883773352]
Reasoning Large Language Models (LLMs) enable test-time scaling, with dataset-level accuracy improving as the token budget increases. We re-frame the budget setting problem as risk control, limiting the error rate while minimizing compute. Our framework introduces an upper threshold that stops reasoning when the model is confident and a novel lower threshold that preemptively stops unsolvable instances.
arXiv Detail & Related papers (2026-02-03T18:17:22Z)
EntroCut: Entropy-Guided Adaptive Truncation for Efficient Chain-of-Thought Reasoning in Small-scale Large Reasoning Models [42.49934375597466]
Large Reasoning Models (LRMs) excel at complex reasoning tasks through extended chain-of-thought generation. We find that the entropy of the model's output distribution in early reasoning steps reliably distinguishes correct from incorrect reasoning. We propose EntroCut, a training-free method that dynamically truncates reasoning by identifying high-confidence states.
arXiv Detail & Related papers (2026-01-30T06:19:16Z)
PAC Reasoning: Controlling the Performance Loss for Efficient Reasoning [33.71268958080582]
Large reasoning models (LRMs) have achieved remarkable progress in complex problem-solving tasks. LRMs typically suffer from high computational costs during deployment. We propose Probably Approximately Correct (PAC) reasoning that controls the performance loss under the user-specified performance loss tolerance.
arXiv Detail & Related papers (2025-10-10T08:33:47Z)
Pruning Long Chain-of-Thought of Large Reasoning Models via Small-Scale Preference Optimization [26.462701299259248]
Large Reasoning Models (LRMs) have demonstrated strong performance on complex tasks through long Chain-of-Thought (CoT) reasoning. Their lengthy outputs increase computational costs and may lead to overthinking, raising challenges in balancing reasoning effectiveness and efficiency. This paper investigates efficient methods to reduce the generation length of LRMs.
arXiv Detail & Related papers (2025-08-13T20:00:09Z)
LIMOPro: Reasoning Refinement for Efficient and Effective Test-time Scaling [39.61818305829112]
We introduce PIR (Perplexity-based Importance Refinement), a principled framework that quantitatively evaluates the importance of each reasoning step. PIR identifies and selectively prunes only low-importance functional steps while preserving progressive reasoning components. Our approach demonstrates strong generalizability across different model sizes, data sources, and token budgets.
arXiv Detail & Related papers (2025-05-25T15:17:57Z)
SelfBudgeter: Adaptive Token Allocation for Efficient LLM Reasoning [43.91094438704087]
SelfBudgeter is an adaptive controllable reasoning framework that incorporates a budget estimation mechanism prior to reasoning. We show that SelfBudgeter can dynamically allocate budgets according to problem complexity, yielding an average response length compression of 61%.
arXiv Detail & Related papers (2025-05-16T14:08:04Z)
The First Few Tokens Are All You Need: An Efficient and Effective Unsupervised Prefix Fine-Tuning Method for Reasoning Models [69.798277882245]
We introduce Unsupervised Prefix Fine-Tuning (UPFT) to enhance large language models' reasoning efficiency. UPFT removes the need for labeled data or exhaustive sampling. Experiments show that UPFT matches the performance of supervised methods.
arXiv Detail & Related papers (2025-03-04T18:56:03Z)
Bridging Internal Probability and Self-Consistency for Effective and Efficient LLM Reasoning [53.25336975467293]
We present the first theoretical error decomposition analysis of methods such as perplexity and self-consistency. Our analysis reveals a fundamental trade-off: perplexity methods suffer from substantial model error due to the absence of a proper consistency function. We propose Reasoning-Pruning Perplexity Consistency (RPC), which integrates perplexity with self-consistency, and Reasoning Pruning, which eliminates low-probability reasoning paths.
arXiv Detail & Related papers (2025-02-01T18:09:49Z)
Performance Control in Early Exiting to Deploy Large Models at the Same Cost of Smaller Ones [17.797465636040087]
Early Exiting (EE) is a promising technique for speeding up inference by adaptively allocating compute resources to data points based on their difficulty. We first present a novel perspective on the EE approach, showing that larger models deployed with EE can achieve higher performance than smaller models. We introduce Performance Control Early Exiting (PCEE), a method that enables accuracy thresholding by basing decisions not on a data point's confidence but on the average accuracy of samples.
arXiv Detail & Related papers (2024-12-26T18:54:32Z)
Fine-Tuning Language Models with Advantage-Induced Policy Alignment [80.96507425217472]
We propose a novel algorithm for aligning large language models to human preferences. We show that it consistently outperforms PPO in language tasks by a large margin. We also provide a theoretical justification supporting the design of our loss function.
arXiv Detail & Related papers (2023-06-04T01:59:40Z)

This list is automatically generated from the titles and abstracts of the papers on this site.