Is Your LLM Overcharging You? Tokenization, Transparency, and Incentives
- URL: http://arxiv.org/abs/2505.21627v2
- Date: Tue, 09 Sep 2025 17:37:26 GMT
- Title: Is Your LLM Overcharging You? Tokenization, Transparency, and Incentives
- Authors: Ander Artola Velasco, Stratis Tsirtsis, Nastaran Okati, Manuel Gomez-Rodriguez
- Abstract summary: We develop an efficient heuristic algorithm that allows providers to significantly overcharge users without raising suspicion. We show that, to eliminate the financial incentive to strategize, a pricing mechanism must price tokens linearly in their character count.
- Score: 13.91198481393699
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: State-of-the-art large language models require specialized hardware and substantial energy to operate. As a consequence, cloud-based services that provide access to large language models have become very popular. In these services, the price users pay for an output provided by a model depends on the number of tokens the model uses to generate it -- they pay a fixed price per token. In this work, we show that this pricing mechanism creates a financial incentive for providers to strategize and misreport the (number of) tokens a model used to generate an output, and users cannot prove, or even know, whether a provider is overcharging them. However, we also show that, if an unfaithful provider is obliged to be transparent about the generative process used by the model, misreporting optimally without raising suspicion is hard. Nevertheless, as a proof-of-concept, we develop an efficient heuristic algorithm that allows providers to significantly overcharge users without raising suspicion. Crucially, we demonstrate that the cost of running the algorithm is lower than the additional revenue from overcharging users, highlighting the vulnerability of users under the current pay-per-token pricing mechanism. Further, we show that, to eliminate the financial incentive to strategize, a pricing mechanism must price tokens linearly in their character count. While this makes a provider's profit margin vary across tokens, we introduce a simple prescription under which the provider who adopts such an incentive-compatible pricing mechanism can maintain the average profit margin they had under the pay-per-token pricing mechanism. Along the way, to illustrate and complement our theoretical results, we conduct experiments with several large language models from the $\texttt{Llama}$, $\texttt{Gemma}$ and $\texttt{Ministral}$ families, and input prompts from the LMSYS Chatbot Arena platform.
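To make the incentive concrete, below is a minimal Python sketch, with a toy vocabulary and made-up prices rather than the paper's heuristic algorithm or any real tokenizer. Because users only observe the output string, a provider can report any tokenization that decodes to that string; under pay-per-token pricing the longer report earns more, while character-linear pricing makes every tokenization of a given string cost the same.

```python
# Toy illustration (made-up tokens and prices, not the paper's heuristic
# algorithm or a real tokenizer). Users only see the output string, so a
# provider can bill for any tokenization that decodes to it.

def cost_per_token(tokens, price_per_token=2.0):
    """Pay-per-token pricing: every token costs the same, whatever its length."""
    return price_per_token * len(tokens)

def cost_per_char(tokens, price_per_char=0.5):
    """Character-linear pricing: a token's price scales with its character count."""
    return price_per_char * sum(len(t) for t in tokens)

canonical = ["hello", " world"]           # what the model actually generated
inflated = ["hel", "lo", " wor", "ld"]    # same string, reported as more tokens
assert "".join(canonical) == "".join(inflated)

print(cost_per_token(canonical), cost_per_token(inflated))  # 4.0 vs 8.0
print(cost_per_char(canonical), cost_per_char(inflated))    # 5.5 vs 5.5

# One natural calibration (an assumption, not necessarily the paper's exact
# prescription) for a provider switching to character pricing while keeping
# its old average margin: set price_per_char to price_per_token divided by
# the average characters per token observed over past traffic.
```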
Related papers
- IMMACULATE: A Practical LLM Auditing Framework via Verifiable Computation [49.796717294455796]
We present IMMACULATE, a practical auditing framework that detects economically motivated deviations.
IMMACULATE selectively audits a small fraction of requests using verifiable computation, achieving strong detection guarantees while amortizing cryptographic overhead.
arXiv Detail & Related papers (2026-02-26T07:21:02Z) - CREDIT: Certified Ownership Verification of Deep Neural Networks Against Model Extraction Attacks [54.04030169323115]
We introduce CREDIT, a certified ownership verification method against Model Extraction Attacks (MEAs).
We quantify the similarity between DNN models, propose a practical verification threshold, and provide rigorous theoretical guarantees for ownership verification based on this threshold.
We extensively evaluate our approach on several mainstream datasets across different domains and tasks, achieving state-of-the-art performance.
arXiv Detail & Related papers (2026-02-23T23:36:25Z) - Test-Time Compute Games [11.108199754300772]
Test-time compute has emerged as a promising strategy to enhance the reasoning abilities of large language models.
We show that the market for LLM-as-a-service is socially inefficient, since providers have a financial incentive to increase the amount of test-time compute.
We introduce a reverse second-price auction mechanism where providers bid their offered price and (expected) quality for the opportunity to serve a user, as sketched below.
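As a rough illustration of how a procurement-style mechanism of this kind can work, here is a generic reverse second-price auction in Python; the scoring rule, bids, and quality weight are invented for the sketch and need not match the paper's exact mechanism.

```python
# Generic reverse second-price (procurement) auction sketch. Each provider
# bids a price and an (expected) quality; the user picks the provider with
# the best quality-adjusted score, but the payment is pinned down by the
# runner-up, so shading one's own bid cannot help (standard second-price logic).

def run_reverse_auction(bids, quality_weight=1.0):
    """bids: provider -> (price, expected_quality). Returns (winner, payment)."""
    score = lambda p: quality_weight * bids[p][1] - bids[p][0]  # user surplus
    ranked = sorted(bids, key=score, reverse=True)
    winner, runner_up = ranked[0], ranked[1]
    # Pay the winner the highest price at which they still (weakly)
    # match the runner-up's score, given the winner's own quality.
    payment = quality_weight * bids[winner][1] - score(runner_up)
    return winner, payment

bids = {"A": (5.0, 8.0), "B": (4.0, 6.0), "C": (7.0, 9.5)}
print(run_reverse_auction(bids))  # ('A', 5.5): A wins with score 3.0, paid 8.0 - 2.5
```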
arXiv Detail & Related papers (2026-01-29T15:18:01Z) - Cost Transparency of Enterprise AI Adoption [0.0]
This study shows that subtle shifts in linguistic style can alter the number of output tokens without impacting response quality.
Non-polite prompts significantly increase output tokens, leading to higher enterprise costs and additional revenue for OpenAI.
arXiv Detail & Related papers (2025-11-14T01:51:31Z) - Pay for The Second-Best Service: A Game-Theoretic Approach Against Dishonest LLM Providers [18.350214149130267]
This work tackles the issue through the lens of game theory and mechanism design.
We are the first to propose a formal economic model for a realistic user-provider ecosystem.
arXiv Detail & Related papers (2025-11-02T08:18:20Z) - Auditing Pay-Per-Token in Large Language Models [11.795056270534287]
We develop an auditing framework based on martingale theory to detect token misreporting.
Our framework is guaranteed to always detect token misreporting, regardless of the provider's (mis-)reporting policy.
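As a simplified caricature of this style of sequential audit, assuming the provider's tokenizer is public and an honest bill matches re-encoding the returned text, consider the following sketch; the running-sum statistic and fixed threshold are stand-ins, not the paper's martingale construction.

```python
# Caricature of a sequential token-count audit (NOT the paper's martingale
# test): re-encode each returned string with the public tokenizer and track
# the cumulative excess of reported tokens over recomputed tokens. Under the
# honesty assumption above the excess stays at zero, so persistent positive
# drift is evidence of overcharging.

def audit_stream(responses, encode, threshold=20):
    """responses: iterable of (output_text, reported_token_count);
    encode: public tokenizer mapping text to a list of tokens."""
    excess = 0
    for i, (text, reported) in enumerate(responses):
        excess += reported - len(encode(text))
        if excess > threshold:  # threshold trades detection speed vs. false alarms
            return f"alarm at response {i}: cumulative excess {excess}"
    return f"no alarm: cumulative excess {excess}"

# Toy run with whitespace 'tokenization' and a provider padding every bill.
honest = [("a b c", 3)] * 50
padded = [("a b c", 4)] * 50            # one phantom token per response
print(audit_stream(honest, str.split))  # no alarm: cumulative excess 0
print(audit_stream(padded, str.split))  # alarm at response 20: ... excess 21
```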
arXiv Detail & Related papers (2025-10-05T17:47:16Z) - Interactive Reasoning: Visualizing and Controlling Chain-of-Thought Reasoning in Large Language Models [54.85405423240165]
We introduce Interactive Reasoning, an interaction design that visualizes chain-of-thought outputs as a hierarchy of topics.
We implement Interactive Reasoning in Hippo, a prototype for AI-assisted decision making in the face of uncertain trade-offs.
arXiv Detail & Related papers (2025-06-30T10:00:43Z) - Invisible Tokens, Visible Bills: The Urgent Need to Audit Hidden Operations in Opaque LLM Services [22.700907666937177]
This position paper highlights emerging accountability challenges in commercial Opaque LLM Services (COLS).
We formalize two key risks: quantity inflation, where token and call counts may be artificially inflated, and quality downgrade, where providers might quietly substitute lower-cost models or tools.
We propose a modular three-layer auditing framework for COLS and users that enables trustworthy verification across execution, secure logging, and user-facing auditability without exposing proprietary internals.
arXiv Detail & Related papers (2025-05-24T02:26:49Z) - CoIn: Counting the Invisible Reasoning Tokens in Commercial Opaque LLM APIs [13.31195673556853]
We propose CoIn, a verification framework that audits both the quantity and semantic validity of hidden tokens.
Experiments demonstrate that CoIn, when deployed as a trusted third-party auditor, can effectively detect token count inflation with a success rate reaching up to 94.7%.
arXiv Detail & Related papers (2025-05-19T23:39:23Z) - Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs [60.881609323604685]
Large Language Models (LLMs) accessed via black-box APIs introduce a trust challenge.
Users pay for services based on advertised model capabilities, but providers may covertly substitute the specified model with a cheaper, lower-quality alternative to reduce operational costs.
This lack of transparency undermines fairness, erodes trust, and complicates reliable benchmarking.
arXiv Detail & Related papers (2025-04-07T03:57:41Z) - The Invisible Hand: Unveiling Provider Bias in Large Language Models for Code Generation [37.66613667849016]
Large Language Models (LLMs) have emerged as the new recommendation engines.
We show that, without explicit directives, these models show systematic preferences for services from specific providers in their recommendations.
We conduct the first comprehensive empirical study of provider bias in LLM code generation across seven state-of-the-art LLMs.
arXiv Detail & Related papers (2025-01-14T05:21:27Z) - Inference Optimal VLMs Need Fewer Visual Tokens and More Parameters [54.01228554126122]
Vision Language Models (VLMs) have demonstrated strong capabilities across various visual understanding and reasoning tasks.
To reduce inference costs, one can either downsize the Large Language Model (LLM) or reduce the number of input tokens needed to represent the image.
We take the first steps toward designing token compression algorithms tailored for high-compression settings.
arXiv Detail & Related papers (2024-11-05T18:54:21Z) - SVIP: Towards Verifiable Inference of Open-source Large Language Models [33.910670775972335]
Open-source Large Language Models (LLMs) have recently demonstrated remarkable capabilities in natural language understanding and generation, leading to widespread adoption across various domains.
Their increasing model sizes render local deployment impractical for individual users, pushing many to rely on computing service providers for inference through a black-box API.
This reliance introduces a new risk: a computing provider may stealthily substitute the requested LLM with a smaller, less capable model without consent from users, thereby delivering inferior outputs while benefiting from cost savings.
arXiv Detail & Related papers (2024-10-29T17:52:45Z) - Beyond Labeling Oracles: What does it mean to steal ML models? [52.63413852460003]
Model extraction (ME) attacks are designed to steal trained models with only query access.
We investigate factors influencing the success of model extraction attacks.
Our findings urge the community to redefine the adversarial goals of ME attacks.
arXiv Detail & Related papers (2023-10-03T11:10:21Z) - Generative AI for End-to-End Limit Order Book Modelling: A Token-Level Autoregressive Generative Model of Message Flow Using a Deep State Space Network [7.54290390842336]
We propose an end-to-end autoregressive generative model that generates tokenized limit order book (LOB) messages.
Using NASDAQ equity LOBs, we develop a custom tokenizer for message data, converting groups of successive digits to tokens.
Results show promising performance in approximating the data distribution, as evidenced by low model perplexity.
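To illustrate what digit-group tokenization of message data can look like, here is a small Python sketch; the regex scheme, group size, and message format are invented for illustration and are not the paper's actual tokenizer.

```python
import re

# Illustrative digit-grouping tokenizer (invented for this sketch): runs of
# digits are cut into groups of at most `group` digits, while non-digit
# message fields pass through unchanged.
def tokenize_message(msg: str, group: int = 2):
    tokens = []
    for run in re.findall(r"\d+|\D+", msg):  # alternating digit/non-digit runs
        if run.isdigit():
            tokens += [run[i:i + group] for i in range(0, len(run), group)]
        else:
            tokens.append(run)
    return tokens

# A mock limit order book message with side, price, and size fields.
print(tokenize_message("BUY,10450,300"))
# ['BUY,', '10', '45', '0', ',', '30', '0']
```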
arXiv Detail & Related papers (2023-08-23T09:37:22Z) - Do All Languages Cost the Same? Tokenization in the Era of Commercial Language Models [68.29126169579132]
API vendors charge their users based on usage, more specifically on the number of "tokens" processed or generated by the underlying language models.
What constitutes a token, however, depends on the training data and the model, with large variance in the number of tokens required to convey the same information across languages.
We conduct a systematic analysis of the cost and utility of OpenAI's language model API on multilingual benchmarks in 22 typologically diverse languages.
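The measurement itself is easy to reproduce in miniature. The sketch below uses the open tiktoken library as a stand-in for a commercial API's tokenizer; the sample sentences are ad hoc rather than drawn from the paper's benchmarks, and counts vary across tokenizer versions.

```python
import tiktoken  # open-source BPE tokenizers used by OpenAI models

enc = tiktoken.get_encoding("cl100k_base")

# Ad-hoc, roughly parallel sentences; real analyses use parallel corpora.
parallel = {
    "English": "The weather is very nice today.",
    "Spanish": "El clima está muy agradable hoy.",
    "Greek": "Ο καιρός είναι πολύ καλός σήμερα.",
}

for lang, text in parallel.items():
    n_tokens = len(enc.encode(text))
    # Tokens per character: a rough proxy for how much each language pays
    # to convey the same content under per-token pricing.
    print(f"{lang}: {n_tokens} tokens, {n_tokens / len(text):.2f} tokens/char")
```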
arXiv Detail & Related papers (2023-05-23T05:46:45Z) - Cheaply Evaluating Inference Efficiency Metrics for Autoregressive Transformer APIs [66.30706841821123]
Large language models (LLMs) power many state-of-the-art systems in natural language processing.
LLMs are extremely computationally expensive, even at inference time.
We propose a new metric for comparing inference efficiency across models.
arXiv Detail & Related papers (2023-05-03T21:51:42Z) - On the Difficulty of Defending Self-Supervised Learning against Model Extraction [23.497838165711983]
Self-Supervised Learning (SSL) is an increasingly popular ML paradigm that trains models to transform complex inputs into representations without relying on explicit labels.
This paper explores model stealing attacks against SSL.
We construct several novel attacks and find that approaches that train directly on a victim's stolen representations are query efficient and enable high accuracy for downstream models.
arXiv Detail & Related papers (2022-05-16T17:20:44Z) - LAVA NAT: A Non-Autoregressive Translation Model with Look-Around Decoding and Vocabulary Attention [54.18121922040521]
Non-autoregressive translation (NAT) models generate multiple tokens in one forward pass.
These NAT models often suffer from the multimodality problem, generating duplicated tokens or missing tokens.
We propose two novel methods to address this issue, the Look-Around (LA) strategy and the Vocabulary Attention (VA) mechanism.
arXiv Detail & Related papers (2020-02-08T04:11:03Z)