Test-Time Compute Games
- URL: http://arxiv.org/abs/2601.21839v1
- Date: Thu, 29 Jan 2026 15:18:01 GMT
- Title: Test-Time Compute Games
- Authors: Ander Artola Velasco, Dimitrios Rontogiannis, Stratis Tsirtsis, Manuel Gomez-Rodriguez
- Abstract summary: Test-time compute has emerged as a promising strategy to enhance the reasoning abilities of large language models. We show that the market of LLM-as-a-service is socially inefficient, since providers have a financial incentive to increase the amount of test-time compute. We introduce a reverse second-price auction mechanism where providers bid their offered price and (expected) quality for the opportunity to serve a user.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Test-time compute has emerged as a promising strategy to enhance the reasoning abilities of large language models (LLMs). However, this strategy has in turn increased how much users pay cloud-based providers offering LLM-as-a-service, since providers charge users for the amount of test-time compute they use to generate an output. In our work, we show that the market of LLM-as-a-service is socially inefficient: providers have a financial incentive to increase the amount of test-time compute, even if this increase contributes little to the quality of the outputs. To address this inefficiency, we introduce a reverse second-price auction mechanism where providers bid their offered price and (expected) quality for the opportunity to serve a user, and users pay proportionally to the marginal value generated by the winning provider relative to the second-highest bidder. To illustrate and complement our theoretical results, we conduct experiments with multiple instruct models from the $\texttt{Llama}$ and $\texttt{Qwen}$ families, as well as reasoning models distilled from $\texttt{DeepSeek-R1}$, on math and science benchmark datasets.
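The proposed mechanism can be sketched in code. This is a minimal illustration, not the paper's exact formulation: the bid structure, the surplus-based scoring rule, and the payment normalization below are assumptions made for the sketch.

```python
from dataclasses import dataclass

@dataclass
class Bid:
    provider: str
    price: float    # price the provider asks to serve the request
    quality: float  # provider's (expected) output quality

def run_auction(bids, value_per_quality):
    """Reverse second-price auction sketch: rank bids by the user's
    surplus (value of quality minus price). The winner serves the user,
    who pays according to the winner's marginal value over the runner-up."""
    assert len(bids) >= 2, "need at least two bidders"
    surplus = lambda b: value_per_quality * b.quality - b.price
    ranked = sorted(bids, key=surplus, reverse=True)
    winner, runner_up = ranked[0], ranked[1]
    # Payment proportional to the marginal value generated by the winning
    # provider relative to the second-highest bidder (clipped at zero).
    payment = max(0.0, surplus(winner) - surplus(runner_up))
    return winner.provider, payment
```

For example, `run_auction([Bid("A", 2.0, 5.0), Bid("B", 1.0, 3.0)], value_per_quality=1.0)` selects provider A (surplus 3.0 vs. 2.0) and charges the marginal surplus of 1.0. Because a provider's payment depends on how much better it is than the runner-up, inflating test-time compute that adds little quality no longer pays off.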
Related papers
- Reliable LLM-Based Edge-Cloud-Expert Cascades for Telecom Knowledge Systems [54.916243942641444]
Large language models (LLMs) are emerging as key enablers of automation in domains such as telecommunications. We study an edge-cloud-expert cascaded LLM-based knowledge system that supports decision-making through a question-and-answer pipeline.
arXiv Detail & Related papers (2025-12-23T03:10:09Z)
- Learning Personalized Ad Impact via Contextual Reinforcement Learning under Delayed Rewards [36.029144318322686]
We model ad bidding as a Contextual Markov Decision Process (CMDP) with delayed Poisson rewards. For efficient estimation, we propose a two-stage maximum likelihood estimator combined with data-splitting strategies. We design a reinforcement learning algorithm to derive efficient personalized bidding strategies.
arXiv Detail & Related papers (2025-10-22T22:08:36Z)
- EconProver: Towards More Economical Test-Time Scaling for Automated Theorem Proving [64.15371139980802]
Large Language Models (LLMs) have recently advanced the field of Automated Theorem Proving (ATP). We show that different test-time scaling strategies for ATP models introduce significant computational overhead for inference. We propose two complementary methods that can be integrated into a unified EconRL pipeline for amplified benefits.
arXiv Detail & Related papers (2025-09-16T03:00:13Z)
- Learning from Synthetic Labs: Language Models as Auction Participants [12.007281866970485]
This paper introduces a novel synthetic data-generating process to help facilitate the study and design of auctions. We find that simulated AI agents (large language models) agree with the experimental literature in auctions across a variety of classic formats.
arXiv Detail & Related papers (2025-07-12T00:00:30Z)
- Is Your LLM Overcharging You? Tokenization, Transparency, and Incentives [13.91198481393699]
We develop an efficient algorithm that allows providers to significantly overcharge users without raising suspicion. We show that, to eliminate the financial incentive to strategize, a pricing mechanism must price tokens linearly in their character count.
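Why character-linear pricing removes the incentive can be seen in a toy sketch. The tokenizations and prices below are invented for illustration and are not from the paper.

```python
def per_token_charge(tokens, price_per_token=1.0):
    # Revenue grows with the number of tokens reported, so a
    # provider gains by splitting the same output more finely.
    return price_per_token * len(tokens)

def per_character_charge(tokens, price_per_char=0.25):
    # Revenue depends only on total characters, which is
    # invariant to how the same string is tokenized.
    return price_per_char * sum(len(t) for t in tokens)

honest = ["hello", "world"]            # 2 tokens, 10 characters
inflated = ["he", "llo", "wor", "ld"]  # same 10 characters, 4 tokens

assert per_token_charge(inflated) > per_token_charge(honest)
assert per_character_charge(inflated) == per_character_charge(honest)
```

Under per-token billing, the inflated tokenization of the identical string earns more; under character-linear billing, every tokenization of the same string costs the user the same.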
arXiv Detail & Related papers (2025-05-27T18:02:12Z)
- Supervised Optimism Correction: Be Confident When LLMs Are Sure [91.7459076316849]
We establish a novel theoretical connection between supervised fine-tuning and offline reinforcement learning. We show that the widely used beam search method suffers from unacceptable over-optimism. We propose Supervised Optimism Correction, which introduces a simple yet effective auxiliary loss for token-level $Q$-value estimations.
arXiv Detail & Related papers (2025-04-10T07:50:03Z)
- Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning [60.67176246634741]
We formalize the problem of optimizing test-time compute as a meta-reinforcement learning (RL) problem. We show that state-of-the-art models do not minimize regret, but one can do so by maximizing a dense reward bonus in conjunction with the outcome 0/1 reward during RL.
arXiv Detail & Related papers (2025-03-10T17:40:43Z)
- Scalable Best-of-N Selection for Large Language Models via Self-Certainty [75.1351701045874]
Best-of-N selection is a key technique for improving the reasoning performance of Large Language Models (LLMs). We propose self-certainty, a novel and efficient metric that leverages the inherent probability distribution of LLM outputs to estimate response quality without requiring external reward models. Our findings establish self-certainty as a practical and efficient way to improve LLM reasoning capabilities.
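The idea behind confidence-based best-of-N can be sketched as follows. Here "self-certainty" is stood in for by the mean token log-probability of each sampled response; the paper's exact metric may differ, and the candidate data is invented for illustration.

```python
def self_certainty(token_logprobs):
    """Average log-probability of a response's tokens; higher means
    the model was more confident in what it generated. (A simplified
    stand-in for the paper's metric.)"""
    return sum(token_logprobs) / len(token_logprobs)

def best_of_n(candidates):
    """Pick the candidate the model is most certain about, using only
    its own output distribution -- no external reward model needed."""
    return max(candidates, key=lambda c: self_certainty(c["logprobs"]))

candidates = [
    {"text": "answer A", "logprobs": [-0.9, -1.2, -0.8]},
    {"text": "answer B", "logprobs": [-0.1, -0.3, -0.2]},
]
```

With these toy log-probabilities, `best_of_n(candidates)` returns answer B, whose tokens were generated with much higher confidence. The appeal of such a metric is that it reuses quantities the model already computes during decoding.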
arXiv Detail & Related papers (2025-02-25T19:08:07Z)
- Fairshare Data Pricing via Data Valuation for Large Language Models [22.96743502195587]
This paper introduces a theoretical framework for data markets for large language models (LLMs). We show how exploitative pricing drives high-quality sellers out of the market. We then introduce fairshare, a pricing mechanism grounded in data valuation.
arXiv Detail & Related papers (2025-01-31T22:27:34Z)
- Self-Refinement Strategies for LLM-based Product Attribute Value Extraction [51.45146101802871]
This paper investigates applying two self-refinement techniques to the product attribute value extraction task. The experiments show that both self-refinement techniques fail to significantly improve the extraction performance while substantially increasing processing costs. For scenarios with development data, fine-tuning yields the highest performance, while the ramp-up costs of fine-tuning are balanced out as the amount of product descriptions increases.
arXiv Detail & Related papers (2025-01-02T12:55:27Z)
- SMART: Automatically Scaling Down Language Models with Accuracy Guarantees for Reduced Processing Fees [21.801053526411415]
Large Language Models (LLMs) have significantly boosted performance in natural language processing (NLP) tasks.
The deployment of high-performance LLMs incurs substantial costs, primarily due to the increased number of parameters aimed at enhancing model performance.
We introduce SMART, a novel framework designed to minimize the inference costs of NLP tasks while ensuring sufficient result quality.
arXiv Detail & Related papers (2024-03-11T17:45:47Z)
- Enhancing User's Income Estimation with Super-App Alternative Data [59.60094442546867]
The paper compares the performance of these alternative data sources with that of industry-accepted bureau income estimators.
Ultimately, this paper shows the incentive for financial institutions to seek to incorporate alternative data into constructing their risk profiles.
arXiv Detail & Related papers (2021-04-12T21:34:44Z)
- A Game-Theoretic Analysis of the Empirical Revenue Maximization Algorithm with Endogenous Sampling [19.453243313852557]
Empirical Revenue Maximization (ERM) is one of the most important price learning algorithms in auction design.
We generalize the definition of an incentive-awareness measure proposed by Lavi et al. to quantify the reduction of ERM's outputted price due to a change of $m \ge 1$ out of $N$ input samples.
We construct an efficient, approximately incentive-compatible, and revenue-optimal learning algorithm using ERM in repeated auctions against non-myopic bidders, and show approximate group incentive-compatibility in uniform-price auctions.
arXiv Detail & Related papers (2020-10-12T08:20:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.