BudgetThinker: Empowering Budget-aware LLM Reasoning with Control Tokens
- URL: http://arxiv.org/abs/2508.17196v2
- Date: Fri, 29 Aug 2025 14:42:16 GMT
- Title: BudgetThinker: Empowering Budget-aware LLM Reasoning with Control Tokens
- Authors: Hao Wen, Xinrui Wu, Yi Sun, Feifei Zhang, Liye Chen, Jie Wang, Yunxin Liu, Yunhao Liu, Ya-Qin Zhang, Yuanchun Li
- Abstract summary: BudgetThinker is a framework designed to empower Large Language Models with budget-aware reasoning. We show that BudgetThinker significantly surpasses strong baselines in maintaining performance across a variety of reasoning budgets.
- Score: 33.607723102172194
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advancements in Large Language Models (LLMs) have leveraged increased test-time computation to enhance reasoning capabilities, a strategy that, while effective, incurs significant latency and resource costs, limiting their applicability in real-world time-constrained or cost-sensitive scenarios. This paper introduces BudgetThinker, a novel framework designed to empower LLMs with budget-aware reasoning, enabling precise control over the length of their thought processes. We propose a methodology that periodically inserts special control tokens during inference to continuously inform the model of its remaining token budget. This approach is coupled with a comprehensive two-stage training pipeline, beginning with Supervised Fine-Tuning (SFT) to familiarize the model with budget constraints, followed by a curriculum-based Reinforcement Learning (RL) phase that utilizes a length-aware reward function to optimize for both accuracy and budget adherence. We demonstrate that BudgetThinker significantly surpasses strong baselines in maintaining performance across a variety of reasoning budgets on challenging mathematical benchmarks. Our method provides a scalable and effective solution for developing efficient and controllable LLM reasoning, making advanced models more practical for deployment in resource-constrained and real-time environments.
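The abstract describes two mechanisms: periodically interleaving control tokens that report the remaining token budget, and a length-aware reward balancing accuracy against budget adherence. A minimal Python sketch of both ideas follows; the function names, the `<remaining:N>` token format, and the reward shape are illustrative assumptions, not the paper's exact implementation:

```python
def insert_control_tokens(reasoning_tokens, budget, interval=64):
    """Interleave a control marker every `interval` tokens telling the
    model how many tokens of its thinking budget remain."""
    out = []
    for i, tok in enumerate(reasoning_tokens):
        if i % interval == 0:
            remaining = max(budget - i, 0)
            out.append(f"<remaining:{remaining}>")  # hypothetical token format
        out.append(tok)
    return out

def length_aware_reward(correct, used, budget, alpha=0.5):
    """Combine answer correctness with budget adherence: exceeding the
    budget is penalized in proportion to the relative overshoot."""
    accuracy = 1.0 if correct else 0.0
    overshoot = max(used - budget, 0) / budget
    return accuracy - alpha * min(overshoot, 1.0)
```

In a curriculum-based RL phase such a reward would be computed per rollout, with the budget sampled from progressively tighter ranges as training advances.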
Related papers
- Budget-Aware Anytime Reasoning with LLM-Synthesized Preference Data [57.996437077411315]
We study the reasoning behavior of large language models (LLMs) under limited computation budgets. We introduce an anytime reasoning framework and the Anytime Index, a metric that quantifies how effectively solution quality improves as reasoning tokens increase. Experiments on NaturalPlan (Trip), AIME, and GPQA datasets show consistent gains across Grok-3, GPT-oss, GPT-4.1/4o, and LLaMA models.
arXiv Detail & Related papers (2026-01-16T07:09:30Z)
- ROI-Reasoning: Rational Optimization for Inference via Pre-Computation Meta-Cognition [11.094392304740134]
We study budgeted inference-time reasoning for multiple tasks under a strict global token constraint. This perspective highlights a meta-cognitive requirement: anticipating task difficulty and estimating return on investment. We propose ROI-Reasoning, a two-stage framework that endows LLMs with intrinsic, budget-aware rationality.
arXiv Detail & Related papers (2026-01-07T11:30:55Z)
- Controlling Performance and Budget of a Centralized Multi-agent LLM System with Reinforcement Learning [53.57360296655208]
Large language models (LLMs) exhibit complementary strengths across domains and come with varying inference costs. Existing approaches rely on decentralized frameworks, which invoke multiple LLMs for every input and thus lead to substantial and uncontrolled inference costs. We introduce a centralized multi-LLM framework, where a controller LLM selectively coordinates a pool of expert models in a cost-efficient and cost-controllable manner.
arXiv Detail & Related papers (2025-11-04T17:35:17Z)
- BARD: budget-aware reasoning distillation [25.725960386304646]
Long Chain-of-Thought (CoT) distillation effectively transfers reasoning capability to smaller language models. We propose Budget-Aware Reasoning Distillation (BARD), a novel framework that simultaneously distills reasoning capability and enables fine-grained control over the reasoning length.
arXiv Detail & Related papers (2025-11-03T11:30:18Z)
- Revisiting LLM Reasoning via Information Bottleneck [57.519119962528166]
Large language models (LLMs) have recently demonstrated remarkable progress in reasoning capabilities through reinforcement learning with verifiable rewards (RLVR). We present a theoretical characterization of LLM reasoning grounded in the information bottleneck (IB) principle. We propose IB-aware reasoning optimization (IBRO), a framework that encourages reasoning trajectories to be both informative about the final correct answer and generalizable.
arXiv Detail & Related papers (2025-07-24T13:14:25Z)
- LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization [48.91511514636768]
Length-Adaptive Policy Optimization transforms reasoning length control from an external constraint into an intrinsic model capability. LAPO enables models to internalize an understanding of appropriate reasoning depth through a two-stage reinforcement learning process. Experiments on mathematical reasoning benchmarks demonstrate that LAPO reduces token usage by up to 40.9% while improving accuracy by 2.3%.
arXiv Detail & Related papers (2025-07-21T16:14:41Z)
- Steering LLM Thinking with Budget Guidance [48.65894557568655]
Budget guidance is a method for steering the reasoning process of LLMs toward a target budget without requiring any fine-tuning. Our approach introduces a lightweight predictor that models a Gamma distribution over the remaining thinking length. This signal is then used to guide generation in a soft, token-level manner, ensuring that the overall reasoning trace adheres to the specified thinking budget.
arXiv Detail & Related papers (2025-06-16T17:57:05Z) - Optimizing Anytime Reasoning via Budget Relative Policy Optimization [38.57672572913099]
We present a novel framework, AnytimeReasoner, to optimize anytime reasoning performance. We truncate the complete thinking process to fit within sampled token budgets from a prior distribution. We then optimize the thinking and summary policies in a decoupled manner to maximize the cumulative reward.
arXiv Detail & Related papers (2025-05-19T17:58:44Z) - Scalable Chain of Thoughts via Elastic Reasoning [61.75753924952059]
Elastic Reasoning is a novel framework for scalable chain of thoughts. It separates reasoning into two phases, thinking and solution, with independently allocated budgets. Our approach produces more concise and efficient reasoning even in unconstrained settings.
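The two-phase budget split described above can be sketched in a few lines. This is a minimal illustration; `generate_fn` is a hypothetical stand-in for the model's decoding call, not an API from the paper:

```python
def elastic_generate(generate_fn, think_budget, solve_budget):
    """Run generation in two independently budgeted phases. The thinking
    trace is hard-truncated at its own budget before the solution phase
    starts, so long deliberation can never starve the final answer."""
    thinking = generate_fn("think", think_budget)[:think_budget]
    solution = generate_fn("solve", solve_budget)[:solve_budget]
    return thinking, solution
```

The design choice is that the solution phase's allocation is reserved up front rather than being whatever is left over, which is what makes the total length scale predictably with the combined budget.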
arXiv Detail & Related papers (2025-05-08T15:01:06Z) - Teaching LLMs According to Their Aptitude: Adaptive Reasoning for Mathematical Problem Solving [55.895917967408586]
Existing approaches to mathematical reasoning with large language models rely on Chain-of-Thought (CoT) for generalizability or Tool-Integrated Reasoning (TIR) for precise computation. We propose TATA (Teaching LLMs According to Their Aptitude), an adaptive framework that enables LLMs to personalize their reasoning strategy spontaneously.
arXiv Detail & Related papers (2025-02-17T16:56:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.