Test-Time Compute Games
- URL: http://arxiv.org/abs/2601.21839v1
- Date: Thu, 29 Jan 2026 15:18:01 GMT
- Title: Test-Time Compute Games
- Authors: Ander Artola Velasco, Dimitrios Rontogiannis, Stratis Tsirtsis, Manuel Gomez-Rodriguez
- Abstract summary: Test-time compute has emerged as a promising strategy to enhance the reasoning abilities of large language models. We show that the market of LLM-as-a-service is socially inefficient, since providers have a financial incentive to increase the amount of test-time compute. We introduce a reverse second-price auction mechanism where providers bid their offered price and (expected) quality for the opportunity to serve a user.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Test-time compute has emerged as a promising strategy to enhance the reasoning abilities of large language models (LLMs). However, this strategy has in turn increased how much users pay cloud-based providers offering LLM-as-a-service, since providers charge users for the amount of test-time compute they use to generate an output. In our work, we show that the market of LLM-as-a-service is socially inefficient: providers have a financial incentive to increase the amount of test-time compute, even if this increase contributes little to the quality of the outputs. To address this inefficiency, we introduce a reverse second-price auction mechanism where providers bid their offered price and (expected) quality for the opportunity to serve a user, and users pay proportionally to the marginal value generated by the winning provider relative to the second-highest bidder. To illustrate and complement our theoretical results, we conduct experiments with multiple instruct models from the $\texttt{Llama}$ and $\texttt{Qwen}$ families, as well as reasoning models distilled from $\texttt{DeepSeek-R1}$, on math and science benchmark datasets.
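The proposed mechanism can be sketched in code. This is a minimal illustration, not the paper's exact formulation: the bid structure, the surplus-based scoring rule, and the payment normalization below are assumptions made for the sketch.

```python
from dataclasses import dataclass

@dataclass
class Bid:
    provider: str
    price: float    # price the provider asks to serve the request
    quality: float  # provider's (expected) output quality

def run_auction(bids, value_per_quality):
    """Reverse second-price auction sketch: rank bids by the user's
    surplus (value of quality minus price). The winner serves the user,
    who pays according to the winner's marginal value over the runner-up."""
    assert len(bids) >= 2, "need at least two bidders"
    surplus = lambda b: value_per_quality * b.quality - b.price
    ranked = sorted(bids, key=surplus, reverse=True)
    winner, runner_up = ranked[0], ranked[1]
    # Payment proportional to the marginal value generated by the winning
    # provider relative to the second-highest bidder (clipped at zero).
    payment = max(0.0, surplus(winner) - surplus(runner_up))
    return winner.provider, payment
```

For example, `run_auction([Bid("A", 2.0, 5.0), Bid("B", 1.0, 3.0)], value_per_quality=1.0)` selects provider A (surplus 3.0 vs. 2.0) and charges the marginal surplus of 1.0. Because a provider's payment depends on how much better it is than the runner-up, inflating test-time compute that adds little quality no longer pays off.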
Related papers
- Reliable LLM-Based Edge-Cloud-Expert Cascades for Telecom Knowledge Systems [54.916243942641444]
Large language models (LLMs) are emerging as key enablers of automation in domains such as telecommunications. We study an edge-cloud-expert cascaded LLM-based knowledge system that supports decision-making through a question-and-answer pipeline.
arXiv Detail & Related papers (2025-12-23T03:10:09Z)
- Learning Personalized Ad Impact via Contextual Reinforcement Learning under Delayed Rewards [36.029144318322686]
We model ad bidding as a Contextual Markov Decision Process (CMDP) with delayed Poisson rewards. For efficient estimation, we propose a two-stage maximum likelihood estimator combined with data-splitting strategies. We design a reinforcement learning algorithm to derive efficient personalized bidding strategies.
arXiv Detail & Related papers (2025-10-22T22:08:36Z)
- EconProver: Towards More Economical Test-Time Scaling for Automated Theorem Proving [64.15371139980802]
Large Language Models (LLMs) have recently advanced the field of Automated Theorem Proving (ATP). We show that different test-time scaling strategies for ATP models introduce significant computational overhead for inference. We propose two complementary methods that can be integrated into a unified EconRL pipeline for amplified benefits.
arXiv Detail & Related papers (2025-09-16T03:00:13Z)
- Learning from Synthetic Labs: Language Models as Auction Participants [12.007281866970485]
This paper introduces a novel synthetic data-generating process to help facilitate the study and design of auctions. We find that simulated AI agents (large language models) agree with the experimental literature in auctions across a variety of classic formats.
arXiv Detail & Related papers (2025-07-12T00:00:30Z)
- Is Your LLM Overcharging You? Tokenization, Transparency, and Incentives [13.91198481393699]
We develop an efficient algorithm that allows providers to significantly overcharge users without raising suspicion. We show that, to eliminate the financial incentive to strategize, a pricing mechanism must price tokens linearly in their character count.
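Why character-linear pricing removes the incentive can be seen in a toy sketch. The tokenizations and prices below are invented for illustration and are not from the paper.

```python
def per_token_charge(tokens, price_per_token=1.0):
    # Revenue grows with the number of tokens reported, so a
    # provider gains by splitting the same output more finely.
    return price_per_token * len(tokens)

def per_character_charge(tokens, price_per_char=0.25):
    # Revenue depends only on total characters, which is
    # invariant to how the same string is tokenized.
    return price_per_char * sum(len(t) for t in tokens)

honest = ["hello", "world"]            # 2 tokens, 10 characters
inflated = ["he", "llo", "wor", "ld"]  # same 10 characters, 4 tokens

assert per_token_charge(inflated) > per_token_charge(honest)
assert per_character_charge(inflated) == per_character_charge(honest)
```

Under per-token billing, the inflated tokenization of the identical string earns more; under character-linear billing, every tokenization of the same string costs the user the same.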
arXiv Detail & Related papers (2025-05-27T18:02:12Z)
- Supervised Optimism Correction: Be Confident When LLMs Are Sure [91.7459076316849]
We establish a novel theoretical connection between supervised fine-tuning and offline reinforcement learning. We show that the widely used beam search method suffers from unacceptable over-optimism. We propose Supervised Optimism Correction, which introduces a simple yet effective auxiliary loss for token-level $Q$-value estimations.
arXiv Detail & Related papers (2025-04-10T07:50:03Z)
- Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning [60.67176246634741]
We formalize the problem of optimizing test-time compute as a meta-reinforcement learning (RL) problem. We show that state-of-the-art models do not minimize regret, but one can do so by maximizing a dense reward bonus in conjunction with the outcome 0/1 reward during RL.
arXiv Detail & Related papers (2025-03-10T17:40:43Z)
- Scalable Best-of-N Selection for Large Language Models via Self-Certainty [75.1351701045874]
Best-of-N selection is a key technique for improving the reasoning performance of Large Language Models (LLMs). We propose self-certainty, a novel and efficient metric that leverages the inherent probability distribution of LLM outputs to estimate response quality without requiring external reward models. Our findings establish self-certainty as a practical and efficient way to improve LLM reasoning capabilities.
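The idea behind confidence-based best-of-N can be sketched as follows. Here "self-certainty" is stood in for by the mean token log-probability of each sampled response; the paper's exact metric may differ, and the candidate data is invented for illustration.

```python
def self_certainty(token_logprobs):
    """Average log-probability of a response's tokens; higher means
    the model was more confident in what it generated. (A simplified
    stand-in for the paper's metric.)"""
    return sum(token_logprobs) / len(token_logprobs)

def best_of_n(candidates):
    """Pick the candidate the model is most certain about, using only
    its own output distribution -- no external reward model needed."""
    return max(candidates, key=lambda c: self_certainty(c["logprobs"]))

candidates = [
    {"text": "answer A", "logprobs": [-0.9, -1.2, -0.8]},
    {"text": "answer B", "logprobs": [-0.1, -0.3, -0.2]},
]
```

With these toy log-probabilities, `best_of_n(candidates)` returns answer B, whose tokens were generated with much higher confidence. The appeal of such a metric is that it reuses quantities the model already computes during decoding.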
arXiv Detail & Related papers (2025-02-25T19:08:07Z)
- Fairshare Data Pricing via Data Valuation for Large Language Models [22.96743502195587]
This paper introduces a theoretical framework for data markets for large language models (LLMs). We show how exploitative pricing drives high-quality sellers out of the market. We then introduce fairshare, a pricing mechanism grounded in data valuation.
arXiv Detail & Related papers (2025-01-31T22:27:34Z)
- Self-Refinement Strategies for LLM-based Product Attribute Value Extraction [51.45146101802871]
This paper investigates applying two self-refinement techniques to the product attribute value extraction task. The experiments show that both self-refinement techniques fail to significantly improve the extraction performance while substantially increasing processing costs. For scenarios with development data, fine-tuning yields the highest performance, while the ramp-up costs of fine-tuning are balanced out as the amount of product descriptions increases.
arXiv Detail & Related papers (2025-01-02T12:55:27Z)
- SMART: Automatically Scaling Down Language Models with Accuracy Guarantees for Reduced Processing Fees [21.801053526411415]
Large Language Models (LLMs) have significantly boosted performance in natural language processing (NLP) tasks.
The deployment of high-performance LLMs incurs substantial costs, primarily due to the increased number of parameters aimed at enhancing model performance.
We introduce SMART, a novel framework designed to minimize the inference costs of NLP tasks while ensuring sufficient result quality.
arXiv Detail & Related papers (2024-03-11T17:45:47Z)
- Enhancing User's Income Estimation with Super-App Alternative Data [59.60094442546867]
The paper compares the performance of these alternative data sources with that of industry-accepted bureau income estimators.
Ultimately, this paper shows the incentive for financial institutions to seek to incorporate alternative data into constructing their risk profiles.
arXiv Detail & Related papers (2021-04-12T21:34:44Z)
- A Game-Theoretic Analysis of the Empirical Revenue Maximization Algorithm with Endogenous Sampling [19.453243313852557]
Empirical Revenue Maximization (ERM) is one of the most important price learning algorithms in auction design.
We generalize the definition of an incentive-awareness measure proposed by Lavi et al. to quantify the reduction of ERM's outputted price due to a change of $m \ge 1$ out of $N$ input samples.
We construct an efficient, approximately incentive-compatible, and revenue-optimal learning algorithm using ERM in repeated auctions against non-myopic bidders, and show approximate group incentive-compatibility in uniform-price auctions.
arXiv Detail & Related papers (2020-10-12T08:20:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.