Related papers: Every Rollout Counts: Optimal Resource Allocation for Efficient Test-Time Scaling

Every Rollout Counts: Optimal Resource Allocation for Efficient Test-Time Scaling

URL: http://arxiv.org/abs/2506.15707v1
Date: Fri, 30 May 2025 09:05:25 GMT
Title: Every Rollout Counts: Optimal Resource Allocation for Efficient Test-Time Scaling
Authors: Xinglin Wang, Yiwei Li, Shaoxiong Feng, Peiwen Yuan, Yueqi Zhang, Jiayi Shi, Chuyi Tan, Boyuan Pan, Yao Hu, Kan Li,
Abstract summary: Test-Time Scaling (TTS) improves the performance of Large Language Models (LLMs)<n>How to allocate a fixed rollout budget most effectively during search remains underexplored, often resulting in inefficient use of compute at test time.<n>We propose Direction-Oriented Resource Allocation (DORA), a provably optimal method that mitigates this bias.
Score: 19.673388630963807
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Test-Time Scaling (TTS) improves the performance of Large Language Models (LLMs) by using additional inference-time computation to explore multiple reasoning paths through search. Yet how to allocate a fixed rollout budget most effectively during search remains underexplored, often resulting in inefficient use of compute at test time. To bridge this gap, we formulate test-time search as a resource allocation problem and derive the optimal allocation strategy that maximizes the probability of obtaining a correct solution under a fixed rollout budget. Within this formulation, we reveal a core limitation of existing search methods: solution-level allocation tends to favor reasoning directions with more candidates, leading to theoretically suboptimal and inefficient use of compute. To address this, we propose Direction-Oriented Resource Allocation (DORA), a provably optimal method that mitigates this bias by decoupling direction quality from candidate count and allocating resources at the direction level. To demonstrate DORA's effectiveness, we conduct extensive experiments on challenging mathematical reasoning benchmarks including MATH500, AIME2024, and AIME2025. The empirical results show that DORA consistently outperforms strong baselines with comparable computational cost, achieving state-of-the-art accuracy. We hope our findings contribute to a broader understanding of optimal TTS for LLMs.

Related papers

$\texttt{SPECS}$: Faster Test-Time Scaling through Speculative Drafts [55.231201692232894]
$textttSPECS$ is a latency-aware test-time scaling method inspired by speculative decoding.<n>Our results show that $textttSPECS$matches or surpasses beam search accuracy while reducing latency by up to $sim$19.1%.
arXiv Detail & Related papers (2025-06-15T05:50:05Z)
Stepwise Reasoning Checkpoint Analysis: A Test Time Scaling Method to Enhance LLMs' Reasoning [81.50681925980135]
We propose Stepwise Reasoning Checkpoint Analysis (SRCA), a framework that introduces checkpoints between reasoning steps.<n>It incorporates two key strategies: (1) Answer-Clustered Search, which groups reasoning paths by their intermediate checkpoint answers to maintain diversity while ensuring quality, and (2) Checkpoint Candidate Augmentation, which leverages all intermediate answers for final decision-making.<n>Our approach effectively reduces path homogenization and creates a fault-tolerant mechanism by utilizing high-quality intermediate results.
arXiv Detail & Related papers (2025-05-23T12:42:50Z)
A*-Decoding: Token-Efficient Inference Scaling [0.0]
Inference-time scaling has emerged as a powerful alternative to parameter scaling for improving language model performance.<n>We introduce A*-decoding, a search-based inference-time strategy that builds on the A* search algorithm to optimally utilize a fixed compute budget.<n>Our work demonstrates how thoughtful inference-time strategies can enhance reasoning in SLMs, pointing toward future advances in more efficient and scalable language model deployment.
arXiv Detail & Related papers (2025-05-19T19:19:48Z)
ABoN: Adaptive Best-of-N Alignment [19.22348775001393]
We propose a prompt-adaptive strategy for Best-of-N alignment that allocates inference-time compute more efficiently.<n>Our method is simple, practical, and compatible with any LM/RM combination.
arXiv Detail & Related papers (2025-05-17T15:24:48Z)
$φ$-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation [22.607133083903125]
In-time optimization scales computation to derive deliberate reasoning steps for effective performance.<n>We frame the decoding strategy as foresight sampling, leveraging simulated future steps to obtain globally optimal step estimation.<n>Experiments show $phi$-Decoding outperforms strong baselines in both performance and efficiency.
arXiv Detail & Related papers (2025-03-17T15:38:33Z)
ATA: Adaptive Task Allocation for Efficient Resource Management in Distributed Machine Learning [54.08906841213777]
Asynchronous methods are fundamental for parallelizing computations in distributed machine learning.<n>We propose ATA (Adaptive Task Allocation), a method that adapts to heterogeneous and random distributions of computation times.<n>We show that ATA identifies the optimal task allocation and performs comparably to methods with prior knowledge of computation times.
arXiv Detail & Related papers (2025-02-02T12:22:26Z)
Truncating Trajectories in Monte Carlo Policy Evaluation: an Adaptive Approach [51.76826149868971]
Policy evaluation via Monte Carlo simulation is at the core of many MC Reinforcement Learning (RL) algorithms. We propose as a quality index a surrogate of the mean squared error of a return estimator that uses trajectories of different lengths. We present an adaptive algorithm called Robust and Iterative Data collection strategy Optimization (RIDO)
arXiv Detail & Related papers (2024-10-17T11:47:56Z)
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters [27.656263126925815]
We study the scaling of inference-time computation in LLMs. We find that in both cases, the effectiveness of different approaches to scaling test-time compute critically varies depending on the difficulty of the prompt.
arXiv Detail & Related papers (2024-08-06T17:35:05Z)
On Speeding Up Language Model Evaluation [48.51924035873411]
We propose an $textitadaptive$ approach to explore this space.<n>We lean on multi-armed bandits to sequentially identify the next (method, validation sample)-pair to evaluate.<n>We show that it can identify the top-performing method using only 5-15% of the typical resources.
arXiv Detail & Related papers (2024-07-08T17:48:42Z)
DRAUC: An Instance-wise Distributionally Robust AUC Optimization Framework [133.26230331320963]
Area Under the ROC Curve (AUC) is a widely employed metric in long-tailed classification scenarios. We propose an instance-wise surrogate loss of Distributionally Robust AUC (DRAUC) and build our optimization framework on top of it.
arXiv Detail & Related papers (2023-11-06T12:15:57Z)
BORA: Bayesian Optimization for Resource Allocation [0.19116784879310028]
We propose an extension of the optimal resource allocation to a more general class of problems, specifically with resources availability changing over time. Three algorithms for Bayesian Optimization for Resource Allocation are presented, working on allocation decisions represented as numerical vectors or distributions. Results on (i) the original SBF case study proposed in the literature, and (ii) a real-life application (i.e., the optimization of multi-channel marketing) empirically prove that BORA is a more efficient and effective learning-and-optimization framework than SBF.
arXiv Detail & Related papers (2022-10-12T07:39:05Z)
Outlier-Robust Sparse Estimation via Non-Convex Optimization [73.18654719887205]
We explore the connection between high-dimensional statistics and non-robust optimization in the presence of sparsity constraints. We develop novel and simple optimization formulations for these problems. As a corollary, we obtain that any first-order method that efficiently converges to station yields an efficient algorithm for these tasks.
arXiv Detail & Related papers (2021-09-23T17:38:24Z)

This list is automatically generated from the titles and abstracts of the papers in this site.