e1: Learning Adaptive Control of Reasoning Effort
- URL: http://arxiv.org/abs/2510.27042v1
- Date: Thu, 30 Oct 2025 23:12:21 GMT
- Title: e1: Learning Adaptive Control of Reasoning Effort
- Authors: Michael Kleinman, Matthew Trager, Alessandro Achille, Wei Xia, Stefano Soatto
- Abstract summary: Increasing the thinking budget of AI models can significantly improve accuracy, but not all questions warrant the same amount of reasoning. Users may prefer to allocate different amounts of reasoning effort depending on how they value output quality versus latency and cost. We propose Adaptive Effort Control, a self-adaptive reinforcement learning method that trains models to use a user-specified fraction of tokens.
- Score: 88.51897900019485
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Increasing the thinking budget of AI models can significantly improve accuracy, but not all questions warrant the same amount of reasoning. Users may prefer to allocate different amounts of reasoning effort depending on how they value output quality versus latency and cost. To leverage this tradeoff effectively, users need fine-grained control over the amount of thinking used for a particular query, but few approaches enable such control. Existing methods require users to specify the absolute number of desired tokens, but this requires knowing the difficulty of the problem beforehand to appropriately set the token budget for a query. To address these issues, we propose Adaptive Effort Control, a self-adaptive reinforcement learning method that trains models to use a user-specified fraction of tokens relative to the current average chain-of-thought length for each query. This approach eliminates dataset- and phase-specific tuning while producing better cost-accuracy tradeoff curves compared to standard methods. Users can dynamically adjust the cost-accuracy trade-off through a continuous effort parameter specified at inference time. We observe that the model automatically learns to allocate resources proportionally to the task difficulty and, across model scales ranging from 1.5B to 32B parameters, our approach enables approximately 3x reduction in chain-of-thought length while maintaining or improving performance relative to the base model used for RL training.
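The relative budgeting idea in the abstract can be illustrated with a short sketch. The abstract does not give the reward used in the paper, so everything below is an assumption made for illustration: the function names `target_length` and `shaped_reward`, the capped linear overshoot penalty, and the `penalty` weight are all hypothetical. The sketch only shows how a budget can be set as a fraction of the current average chain-of-thought length for a query, rather than as an absolute token count.

```python
from statistics import mean


def target_length(effort: float, rollout_lengths: list[int]) -> float:
    """Budget for one query: a fraction of the current mean CoT length.

    `effort` is the user-specified fraction; `rollout_lengths` are the
    token counts of the chains of thought currently sampled for the query.
    """
    if effort <= 0:
        raise ValueError("effort must be positive")
    return effort * mean(rollout_lengths)


def shaped_reward(correct: bool, length: int, target: float,
                  penalty: float = 0.5) -> float:
    """Hypothetical reward: correctness minus a capped overshoot penalty.

    This is NOT the paper's reward; it is one simple way to discourage
    responses that exceed the relative budget.
    """
    overshoot = max(0.0, (length - target) / target)
    return float(correct) - penalty * min(overshoot, 1.0)


# Three sampled chains of thought for one query (illustrative numbers).
rollout_lengths = [800, 1200, 1000]
target = target_length(0.5, rollout_lengths)  # half of the mean length
rewards = [shaped_reward(True, n, target) for n in rollout_lengths]
```

Because the target tracks the mean length per query, a harder question that currently needs longer chains receives a proportionally larger budget at the same effort value, which matches the abstract's point that users need not know the problem difficulty in advance.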
Related papers
- ODAR: Principled Adaptive Routing for LLM Reasoning via Active Inference [60.958331943869126]
ODAR-Expert is an adaptive routing framework that optimizes the accuracy-efficiency trade-off via principled resource allocation. We show strong and consistent gains, including 98.2% accuracy on MATH and 54.8% on Humanity's Last Exam.
arXiv Detail & Related papers (2026-02-27T05:22:01Z)
- Conformal Thinking: Risk Control for Reasoning on a Compute Budget [60.65072883773352]
Reasoning Large Language Models (LLMs) enable test-time scaling, with dataset-level accuracy improving as the token budget increases. We re-frame the budget setting problem as risk control, limiting the error rate while minimizing compute. Our framework introduces an upper threshold that stops reasoning when the model is confident and a novel lower threshold that preemptively stops unsolvable instances.
arXiv Detail & Related papers (2026-02-03T18:17:22Z)
- Models Under SCOPE: Scalable and Controllable Routing via Pre-hoc Reasoning [28.165465162107253]
We propose SCOPE, a routing framework that goes beyond model selection by predicting their cost and performance. SCOPE makes reasoning-based predictions by retrieving how models behave on similar problems, rather than relying on fixed model names. It can boost accuracy by up to 25.7% when performance is the priority, or cut costs by up to 95.1% when efficiency matters most.
arXiv Detail & Related papers (2026-01-29T21:09:36Z)
- Think Right: Learning to Mitigate Under-Over Thinking via Adaptive, Attentive Compression [68.69801176669843]
We propose an online post-training RL method that prunes redundant steps and estimates difficulty. TRAAC (Think Right with Adaptive, Attentive Compression) achieves an average absolute accuracy gain of 8.4%. Although our models are trained on math datasets, they show accuracy and efficiency gains on out-of-distribution non-math datasets.
arXiv Detail & Related papers (2025-10-02T02:00:20Z)
- R-Stitch: Dynamic Trajectory Stitching for Efficient Reasoning [80.104336426172]
Chain-of-thought (CoT) enhances problem-solving ability of large language models. CoT incurs substantial inference cost due to long autoregressive trajectories. We introduce R-Stitch, a training-free hybrid decoding framework.
arXiv Detail & Related papers (2025-07-23T08:14:36Z)
- Self-Route: Automatic Mode Switching via Capability Estimation for Efficient Reasoning [36.470695895695044]
Self-Route is a dynamic reasoning framework that automatically selects between general and reasoning modes. We show that Self-Route achieves comparable accuracy to reasoning models while reducing token consumption by 30-55%.
arXiv Detail & Related papers (2025-05-27T03:18:31Z)
- ARM: Adaptive Reasoning Model [52.955386431182724]
We propose Adaptive Reasoning Model (ARM), a reasoning model capable of adaptively selecting appropriate formats based on the task at hand. Ada-GRPO enables ARM to achieve high token efficiency, reducing tokens by an average of 30%, and up to 70%, while maintaining performance comparable to the model that relies solely on Long CoT.
arXiv Detail & Related papers (2025-05-26T17:38:50Z)
- AdaCtrl: Towards Adaptive and Controllable Reasoning via Difficulty-Aware Budgeting [23.004467211806467]
AdaCtrl is a novel framework to support difficulty-aware adaptive reasoning budget allocation. It dynamically adjusts its reasoning length based on self-assessed problem difficulty. AdaCtrl enables precise user control over the reasoning budget, allowing for tailored responses to meet specific needs.
arXiv Detail & Related papers (2025-05-24T18:46:50Z)
- SelfBudgeter: Adaptive Token Allocation for Efficient LLM Reasoning [43.91094438704087]
SelfBudgeter is an adaptive controllable reasoning framework that incorporates a budget estimation mechanism prior to reasoning. We show that SelfBudgeter can dynamically allocate budgets according to problem complexity, yielding an average response length compression of 61%.
arXiv Detail & Related papers (2025-05-16T14:08:04Z)
- Fast or Better? Balancing Accuracy and Cost in Retrieval-Augmented Generation with Flexible User Control [52.405085773954596]
Retrieval-Augmented Generation has emerged as a powerful approach to mitigate large language model hallucinations. Existing RAG frameworks often apply retrieval indiscriminately, leading to inefficiencies such as over-retrieving. We introduce a novel user-controllable RAG framework that enables dynamic adjustment of the accuracy-cost trade-off.
arXiv Detail & Related papers (2025-02-17T18:56:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.