Related papers: Best-of-$\infty$ -- Asymptotic Performance of Test-Time Compute

URL: http://arxiv.org/abs/2509.21091v1
Date: Thu, 25 Sep 2025 12:41:05 GMT
Title: Best-of-$\infty$ -- Asymptotic Performance of Test-Time Compute
Authors: Junpei Komiyama, Daisuke Oba, Masafumi Oyamada
Abstract summary: We study best-of-$N$ for large language models (LLMs) where the selection is based on majority voting. We propose an adaptive generation scheme that selects $N$ based on answer agreement. We extend the framework to weighted ensembles of multiple LLMs, showing that such mixtures can outperform any individual model.
Score: 10.167365483866663
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We study best-of-$N$ for large language models (LLMs) where the selection is based on majority voting. In particular, we analyze the limit $N \to \infty$, which we denote as Best-of-$\infty$. While this approach achieves impressive performance in the limit, it requires an infinite test-time budget. To address this, we propose an adaptive generation scheme that selects $N$ based on answer agreement, thereby efficiently allocating inference-time computation. Beyond adaptivity, we extend the framework to weighted ensembles of multiple LLMs, showing that such mixtures can outperform any individual model. The optimal ensemble weighting is formulated and efficiently computed as a mixed-integer linear program. Extensive experiments demonstrate the effectiveness of our approach.
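The two mechanisms in the abstract, majority voting with an agreement-based choice of $N$ and a weighted vote over several LLMs, can be illustrated with a small sketch. This is not the paper's method: the stopping rule here (stop once the leading answer holds at least an `agreement` share after `min_samples` draws) is a simple hypothetical heuristic, and the ensemble weights are taken as given rather than computed by the paper's mixed-integer linear program. The names `adaptive_majority_vote`, `weighted_ensemble_vote`, `sample`, `min_samples`, `max_samples`, and `agreement` are all illustrative assumptions.

```python
from __future__ import annotations

from collections import Counter
from typing import Callable, Hashable, Sequence


def adaptive_majority_vote(
    sample: Callable[[], Hashable],
    min_samples: int = 5,
    max_samples: int = 100,
    agreement: float = 0.6,
) -> tuple[Hashable, int]:
    """Draw answers until the leading answer's share reaches `agreement`
    (after at least `min_samples` draws) or `max_samples` is exhausted.

    Hypothetical stopping heuristic, not the paper's rule.
    Returns (majority answer, number of samples used). Ties are broken in
    favour of the answer seen first, as Counter.most_common does.
    """
    counts: Counter = Counter()
    n = 0
    while n < max_samples:
        counts[sample()] += 1
        n += 1
        _, top = counts.most_common(1)[0]
        if n >= min_samples and top / n >= agreement:
            break
    return counts.most_common(1)[0][0], n


def weighted_ensemble_vote(
    answers_per_model: Sequence[Sequence[Hashable]],
    weights: Sequence[float],
) -> Hashable:
    """Weighted majority vote over several models.

    Assumption: model i contributes weights[i] times its empirical answer
    frequency, i.e. a weighted mixture of per-model answer distributions.
    The weights are inputs here; how to choose them optimally is not shown.
    """
    scores: dict = {}
    for model_answers, w in zip(answers_per_model, weights):
        if not model_answers:
            continue  # a model with no samples contributes nothing
        share = w / len(model_answers)
        for a in model_answers:
            scores[a] = scores.get(a, 0.0) + share
    return max(scores, key=scores.get)


if __name__ == "__main__":
    stream = iter(["42", "42", "17", "42", "42", "42"])
    print(adaptive_majority_vote(lambda: next(stream)))
    print(weighted_ensemble_vote([["A", "A", "B"], ["B", "B", "B"]], [0.3, 0.7]))
```

With the stream above, the vote stops after five draws because "42" already holds 4 of 5 answers; the weighted vote picks "B" because the second model's large weight outweighs the first model's preference for "A". Normalizing by each model's sample count keeps a model from gaining influence merely by being sampled more often.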

Related papers

FraPPE: Fast and Efficient Preference-based Pure Exploration [17.53646399595373]
We propose an efficient algorithm to optimally track the existing lower bound for arbitrary preference cones. We prove that our proposed PrePEx algorithm, FraPPE, achieves the optimal sample complexity.
arXiv Detail & Related papers (2025-08-22T16:02:06Z)
Is Best-of-N the Best of Them? Coverage, Scaling, and Optimality in Inference-Time Alignment [54.787826863212146]
Inference-time computation offers a powerful axis for scaling the performance of language models. We analyze the performance of inference-time alignment algorithms in terms of (i) response quality, and (ii) compute. We introduce $\texttt{InferenceTimePessimism}$, a new algorithm which mitigates reward hacking through deliberate use of inference-time compute.
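Unlike majority voting, the Best-of-N studied in this entry selects among candidates using a reward model, which is where reward hacking can enter. A minimal sketch of that standard selection rule follows; it does not implement $\texttt{InferenceTimePessimism}$, and `generate` and `reward` are hypothetical stand-ins for an LLM sampler and a reward model.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def best_of_n(generate: Callable[[], T], reward: Callable[[T], float], n: int) -> T:
    """Standard reward-based Best-of-N: draw n candidates and return the one
    with the highest reward score. Ties go to the earliest candidate,
    since max() returns the first maximal element.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates: List[T] = [generate() for _ in range(n)]
    return max(candidates, key=reward)


if __name__ == "__main__":
    pool = iter(["a", "bb", "c", "dddd", "ee"])
    # Toy "reward model": longer strings score higher.
    print(best_of_n(lambda: next(pool), reward=len, n=5))
```

Because the selected candidate is whatever maximizes the proxy reward, any systematic flaw in `reward` (here, favouring length) is amplified as `n` grows, which is the failure mode the entry describes.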
arXiv Detail & Related papers (2025-03-27T18:00:08Z)
AMPO: Active Multi-Preference Optimization for Self-play Preference Selection [16.230186347702737]
Multi-preference optimization enriches language-model alignment beyond pairwise preferences by contrasting entire sets of helpful and undesired responses. We propose $\textit{Active Multi-Preference Optimization}$ (AMPO), a novel approach that combines on-policy generation, a multi-preference group-contrastive loss, and active subset selection. AMPO achieves state-of-the-art results on $\textit{AlpacaEval}$ using Llama 8B and Mistral 7B.
arXiv Detail & Related papers (2025-02-25T15:29:51Z)
Scaling LLM Inference with Optimized Sample Compute Allocation [56.524278187351925]
We propose OSCA, an algorithm to find an optimal mix of different inference configurations. Our experiments show that with our learned mixed allocation, we can achieve accuracy better than the best single configuration. OSCA is also shown to be effective in agentic workflows beyond single-turn tasks, achieving better accuracy on SWE-Bench with 3x less compute than the default configuration.
arXiv Detail & Related papers (2024-10-29T19:17:55Z)
$f$-PO: Generalizing Preference Optimization with $f$-divergence Minimization [54.94545757220999]
$f$-PO is a novel framework that generalizes and extends existing approaches. We conduct experiments on state-of-the-art language models using benchmark datasets.
arXiv Detail & Related papers (2024-10-29T02:11:45Z)
Combinatorial Stochastic-Greedy Bandit [79.1700188160944]
We propose a novel stochastic-greedy bandit (SGB) algorithm for combinatorial multi-armed bandit problems in which no information other than the joint reward of the selected set of $n$ arms at each time $t \in [T]$ is observed. SGB adopts an optimized explore-then-commit approach and is specifically designed for scenarios with a large set of base arms.
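The explore-then-commit pattern that SGB builds on can be shown in its plainest form: a standard $K$-armed bandit where each arm is tried $m$ times and the empirically best arm is then played for the rest of the horizon. This sketch is not SGB; it has no combinatorial set selection and no stochastic-greedy step, and the function name and parameters are illustrative assumptions.

```python
import random
from typing import Callable, List, Tuple


def explore_then_commit(
    pull: Callable[[int], float], n_arms: int, m: int, horizon: int
) -> Tuple[int, float]:
    """Plain explore-then-commit for a K-armed bandit (not SGB).

    Pull every arm m times, then commit to the arm with the highest
    empirical mean for the remaining rounds.
    Returns (committed arm, total reward collected over the horizon).
    """
    if m < 1 or m * n_arms > horizon:
        raise ValueError("need m >= 1 and m * n_arms <= horizon")
    sums: List[float] = [0.0] * n_arms
    total = 0.0
    for arm in range(n_arms):  # exploration phase
        for _ in range(m):
            r = pull(arm)
            sums[arm] += r
            total += r
    # Every arm has exactly m samples, so ranking by sum equals ranking by mean.
    best = max(range(n_arms), key=lambda a: sums[a])
    for _ in range(horizon - m * n_arms):  # commitment phase
        total += pull(best)
    return best, total


if __name__ == "__main__":
    rng = random.Random(0)
    means = [0.2, 0.5, 0.8]  # hypothetical Bernoulli arm means
    print(explore_then_commit(lambda a: float(rng.random() < means[a]), 3, m=50, horizon=1000))
```

The exploration length `m` trades off the risk of committing to a suboptimal arm against the reward lost while sampling poor arms; with a large set of base arms, exploring every arm (or every subset) becomes costly, which is the setting the entry targets.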
arXiv Detail & Related papers (2023-12-13T11:08:25Z)
qPOTS: Efficient batch multiobjective Bayesian optimization via Pareto optimal Thompson sampling [0.0]
A sample-efficient approach to solving multiobjective optimization is via Gaussian process (GP) surrogates and multiobjective Bayesian optimization (MOBO). We propose a Thompson sampling (TS) based approach ($q\texttt{POTS}$). $q\texttt{POTS}$ solves a cheap multiobjective optimization on the GP posteriors with evolutionary approaches.
arXiv Detail & Related papers (2023-10-24T12:35:15Z)
A distribution-free mixed-integer optimization approach to hierarchical modelling of clustered and longitudinal data [0.0]
We introduce an innovative algorithm that evaluates cluster effects for new data points, thereby increasing the robustness and precision of this model. The inferential and predictive efficacy of this approach is further illustrated through its application in student scoring and protein expression.
arXiv Detail & Related papers (2023-02-06T23:34:51Z)
Minimax Optimization with Smooth Algorithmic Adversaries [59.47122537182611]
We propose a new algorithm for the min-player against smooth algorithms deployed by an adversary. Our algorithm is guaranteed to make monotonic progress (having no limit cycles) and to find an appropriate number of gradient ascent steps.
arXiv Detail & Related papers (2021-06-02T22:03:36Z)
Online Model Selection for Reinforcement Learning with Function Approximation [50.008542459050155]
We present a meta-algorithm that adapts to the optimal complexity with $\tilde{O}(L^{5/6} T^{2/3})$ regret. We also show that the meta-algorithm automatically admits significantly improved instance-dependent regret bounds.
arXiv Detail & Related papers (2020-11-19T10:00:54Z)

This list is automatically generated from the titles and abstracts of the papers on this site.