Related papers: Strategic Scaling of Test-Time Compute: A Bandit Learning Approach

Strategic Scaling of Test-Time Compute: A Bandit Learning Approach

URL: http://arxiv.org/abs/2506.12721v1
Date: Sun, 15 Jun 2025 04:55:49 GMT
Title: Strategic Scaling of Test-Time Compute: A Bandit Learning Approach
Authors: Bowen Zuo, Yinglun Zhu,
Abstract summary: Scaling test-time compute has emerged as an effective strategy for improving the performance of large language models.<n>We propose adaptive algorithms that estimate query difficulty on the fly and allocate compute accordingly.<n>Our algorithms achieve up to an 11.10% performance improvement on the MATH-500 dataset and up to a 7.41% performance improvement on LiveCodeBench.
Score: 13.735277588793995
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Scaling test-time compute has emerged as an effective strategy for improving the performance of large language models. However, existing methods typically allocate compute uniformly across all queries, overlooking variation in query difficulty. To address this inefficiency, we formulate test-time compute allocation as a novel bandit learning problem and propose adaptive algorithms that estimate query difficulty on the fly and allocate compute accordingly. Compared to uniform allocation, our algorithms allocate more compute to challenging queries while maintaining accuracy on easier ones. Among challenging queries, our algorithms further learn to prioritize solvable instances, effectively reducing excessive computing on unsolvable queries. We theoretically prove that our algorithms achieve better compute efficiency than uniform allocation and empirically validate their effectiveness on math and code benchmarks. Specifically, our algorithms achieve up to an 11.10% performance improvement (15.04% relative) on the MATH-500 dataset and up to a 7.41% performance improvement (14.40% relative) on LiveCodeBench.

Related papers

Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters [27.656263126925815]
We study the scaling of inference-time computation in LLMs. We find that in both cases, the effectiveness of different approaches to scaling test-time compute critically varies depending on the difficulty of the prompt.
arXiv Detail & Related papers (2024-08-06T17:35:05Z)
A General Online Algorithm for Optimizing Complex Performance Metrics [5.726378955570775]
We introduce and analyze a general online algorithm that can be used in a straightforward way with a variety of complex performance metrics in binary, multi-class, and multi-label classification problems. The algorithm's update and prediction rules are appealingly simple and computationally efficient without the need to store any past data.
arXiv Detail & Related papers (2024-06-20T21:24:47Z)
Large Language Model-Enhanced Algorithm Selection: Towards Comprehensive Algorithm Representation [27.378185644892984]
This paper introduces Large Language Models (LLMs) into algorithm selection for the first time. LLMs not only captures the structural and semantic aspects of the algorithm, but also demonstrates contextual awareness and library function understanding. The selected algorithm is determined by the matching degree between a given problem and different algorithms.
arXiv Detail & Related papers (2023-11-22T06:23:18Z)
Equitable and Fair Performance Evaluation of Whale Optimization Algorithm [4.0814527055582746]
It is essential that all algorithms are exhaustively, somewhat, and intelligently evaluated. evaluating the effectiveness of optimization algorithms equitably and fairly is not an easy process for various reasons.
arXiv Detail & Related papers (2023-09-04T06:32:02Z)
Machine Learning for Online Algorithm Selection under Censored Feedback [71.6879432974126]
In online algorithm selection (OAS), instances of an algorithmic problem class are presented to an agent one after another, and the agent has to quickly select a presumably best algorithm from a fixed set of candidate algorithms. For decision problems such as satisfiability (SAT), quality typically refers to the algorithm's runtime. In this work, we revisit multi-armed bandit algorithms for OAS and discuss their capability of dealing with the problem. We adapt them towards runtime-oriented losses, allowing for partially censored data while keeping a space- and time-complexity independent of the time horizon.
arXiv Detail & Related papers (2021-09-13T18:10:52Z)
A Boosting Approach to Reinforcement Learning [59.46285581748018]
We study efficient algorithms for reinforcement learning in decision processes whose complexity is independent of the number of states. We give an efficient algorithm that is capable of improving the accuracy of such weak learning methods.
arXiv Detail & Related papers (2021-08-22T16:00:45Z)
Double Coverage with Machine-Learned Advice [100.23487145400833]
We study the fundamental online $k$-server problem in a learning-augmented setting. We show that our algorithm achieves for any k an almost optimal consistency-robustness tradeoff.
arXiv Detail & Related papers (2021-03-02T11:04:33Z)
Benchmarking Simulation-Based Inference [5.3898004059026325]
Recent advances in probabilistic modelling have led to a large number of simulation-based inference algorithms which do not require numerical evaluation of likelihoods. We provide a benchmark with inference tasks and suitable performance metrics, with an initial selection of algorithms. We found that the choice of performance metric is critical, that even state-of-the-art algorithms have substantial room for improvement, and that sequential estimation improves sample efficiency.
arXiv Detail & Related papers (2021-01-12T18:31:22Z)
Evolving Reinforcement Learning Algorithms [186.62294652057062]
We propose a method for meta-learning reinforcement learning algorithms. The learned algorithms are domain-agnostic and can generalize to new environments not seen during training. We highlight two learned algorithms which obtain good generalization performance over other classical control tasks, gridworld type tasks, and Atari games.
arXiv Detail & Related papers (2021-01-08T18:55:07Z)
Towards Optimally Efficient Tree Search with Deep Learning [76.64632985696237]
This paper investigates the classical integer least-squares problem which estimates signals integer from linear models. The problem is NP-hard and often arises in diverse applications such as signal processing, bioinformatics, communications and machine learning. We propose a general hyper-accelerated tree search (HATS) algorithm by employing a deep neural network to estimate the optimal estimation for the underlying simplified memory-bounded A* algorithm.
arXiv Detail & Related papers (2021-01-07T08:00:02Z)
Learning to Actively Learn: A Robust Approach [14.46867518436922]
This work proposes a procedure for designing algorithms for adaptive data collection tasks like active learning and pure-exploration multi-armed bandits.<n>Our adaptive algorithm is learned via adversarial training over equivalence classes of problems derived from information theoretic lower bounds.<n>We perform synthetic experiments to justify the stability and effectiveness of the training procedure, and then evaluate the method on tasks derived from real data.
arXiv Detail & Related papers (2020-10-29T06:48:22Z)
Run2Survive: A Decision-theoretic Approach to Algorithm Selection based on Survival Analysis [75.64261155172856]
survival analysis (SA) naturally supports censored data and offers appropriate ways to use such data for learning distributional models of algorithm runtime. We leverage such models as a basis of a sophisticated decision-theoretic approach to algorithm selection, which we dub Run2Survive. In an extensive experimental study with the standard benchmark ASlib, our approach is shown to be highly competitive and in many cases even superior to state-of-the-art AS approaches.
arXiv Detail & Related papers (2020-07-06T15:20:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.