Cost Transparency of Enterprise AI Adoption
- URL: http://arxiv.org/abs/2511.11761v1
- Date: Fri, 14 Nov 2025 01:51:31 GMT
- Title: Cost Transparency of Enterprise AI Adoption
- Authors: Soogand Alavi, Salar Nozari, Andrea Luangrath
- Abstract summary: This study shows that subtle shifts in linguistic style can alter the number of output tokens without impacting response quality. Non-polite prompts significantly increase output tokens, leading to higher enterprise costs and additional revenue for OpenAI.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in large language models (LLMs) have dramatically improved performance on a wide range of tasks, driving rapid enterprise adoption. Yet the cost of adopting these AI services is understudied. Unlike traditional software licensing, in which costs are predictable before usage, commercial LLM services charge per token of input text in addition to generated output tokens. Crucially, while firms can control the input, they have limited control over output tokens, which are effectively set by generation dynamics outside of business control. This research shows that subtle shifts in linguistic style can systematically alter the number of output tokens without impacting response quality. Using an experiment with OpenAI's API, this study reveals that non-polite prompts significantly increase output tokens, leading to higher enterprise costs and additional revenue for OpenAI. Politeness is merely one instance of a broader phenomenon in which linguistic structure can drive unpredictable cost variation. For enterprises integrating LLMs into applications, this unpredictability complicates budgeting and undermines transparency in business-to-business contexts. By demonstrating how end-user behavior links to enterprise costs through output token counts, this work highlights the opacity of current pricing models and calls for new approaches to ensure predictable and transparent adoption of LLM services.
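To make the pricing asymmetry concrete, below is a minimal sketch of per-token billing. The prices (`INPUT_PRICE_PER_1K`, `OUTPUT_PRICE_PER_1K`) and token counts are hypothetical placeholders, not OpenAI's published rates or the paper's measurements.

```python
# Minimal sketch of per-token billing; all numbers are assumed for
# illustration, not OpenAI's actual rates or the paper's data.

INPUT_PRICE_PER_1K = 0.005   # USD per 1K input tokens (assumed)
OUTPUT_PRICE_PER_1K = 0.015  # USD per 1K output tokens (assumed)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one API call under per-token pricing."""
    return (input_tokens / 1000) * INPUT_PRICE_PER_1K \
         + (output_tokens / 1000) * OUTPUT_PRICE_PER_1K

# The firm fixes the input, but the model decides the output length.
# A stylistic shift that lengthens responses by ~120 tokens per call
# raises the bill even though the prompt is essentially unchanged.
polite = request_cost(input_tokens=200, output_tokens=380)
non_polite = request_cost(input_tokens=200, output_tokens=500)
print(f"polite: ${polite:.4f}  non-polite: ${non_polite:.4f}  "
      f"delta per call: ${non_polite - polite:.4f}")
```

At scale, that per-call delta multiplies across every end-user interaction, which is exactly the budgeting problem the abstract describes: the cost driver sits in output tokens the enterprise does not control.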
Related papers
- Visualizing token importance for black-box language models [48.747801442240565]
We consider the problem of auditing black-box large language models (LLMs) to ensure they behave reliably when deployed in production settings. We propose Distribution-Based Sensitivity Analysis (DBSA) to evaluate the sensitivity of a language model's output to each input token.
arXiv Detail & Related papers (2025-12-12T14:01:43Z)
- Budget-Aware Tool-Use Enables Effective Agent Scaling [82.6942342482552]
Scaling test-time computation improves the performance of large language models (LLMs) across different tasks. We study how to scale tool-augmented agents effectively under explicit tool-call budgets, focusing on web search agents. We introduce the Budget Tracker, a lightweight plug-in that provides the agent with continuous budget awareness; a minimal sketch of the idea follows this entry.
arXiv Detail & Related papers (2025-11-21T07:18:55Z)
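A minimal sketch of the budget-awareness idea, assuming a simple call-count interface; the `BudgetTracker` class and its methods here are a hypothetical reconstruction from the abstract, not the paper's actual plug-in API.

```python
# Hypothetical budget-tracking plug-in (assumed interface, not the paper's).
class BudgetTracker:
    """Keeps a running tool-call budget the agent can consult."""

    def __init__(self, max_calls: int):
        self.max_calls = max_calls
        self.used = 0

    def remaining(self) -> int:
        return self.max_calls - self.used

    def charge(self) -> None:
        """Record one tool call, refusing once the budget is exhausted."""
        if self.remaining() <= 0:
            raise RuntimeError("tool-call budget exhausted")
        self.used += 1

# An agent would check remaining() before each web-search call and adapt
# its strategy (e.g., stop searching, answer from context) as it shrinks.
tracker = BudgetTracker(max_calls=5)
tracker.charge()
print(tracker.remaining())  # 4
```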
- Thinking Augmented Pre-training [88.04395622064708]
This paper introduces a simple and scalable approach to improve the data efficiency of large language model (LLM) training by augmenting existing text data with thinking trajectories. Thinking-augmented pre-training is a universal methodology that augments text with automatically generated thinking trajectories.
arXiv Detail & Related papers (2025-09-24T14:45:13Z)
- Is Your LLM Overcharging You? Tokenization, Transparency, and Incentives [13.91198481393699]
We develop an efficient algorithm that allows providers to significantly overcharge users without raising suspicion. We show that to eliminate the financial incentive to strategize, a pricing mechanism must price tokens linearly in their character count; the sketch after this entry illustrates why.
arXiv Detail & Related papers (2025-05-27T18:02:12Z)
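A toy illustration of the incentive argument, assuming a provider may report any valid tokenization of the same output; the prices are hypothetical and the example is not the paper's algorithm.

```python
# Two valid tokenizations of the same text: same characters, different counts.
honest = ["transparency"]            # one token
inflated = ["trans", "par", "ency"]  # three tokens, same characters

PER_TOKEN = 0.002   # USD per token (assumed)
PER_CHAR = 0.0002   # USD per character (assumed)

def token_priced(tokens: list[str]) -> float:
    return len(tokens) * PER_TOKEN

def char_priced(tokens: list[str]) -> float:
    return sum(len(t) for t in tokens) * PER_CHAR

# Per-token pricing pays more for the inflated split; pricing linearly in
# character count makes every tokenization of the same text cost the same,
# removing the incentive to strategize.
print(token_priced(honest), token_priced(inflated))  # 0.002 vs 0.006
assert char_priced(honest) == char_priced(inflated)
```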
- Invisible Tokens, Visible Bills: The Urgent Need to Audit Hidden Operations in Opaque LLM Services [22.700907666937177]
This position paper highlights emerging accountability challenges in commercial Opaque LLM Services (COLS). We formalize two key risks: *quantity inflation*, where token and call counts may be artificially inflated, and *quality downgrade*, where providers might quietly substitute lower-cost models or tools. We propose a modular three-layer auditing framework for COLS and users that enables trustworthy verification across execution, secure logging, and user-facing auditability without exposing proprietary internals.
arXiv Detail & Related papers (2025-05-24T02:26:49Z)
- CoIn: Counting the Invisible Reasoning Tokens in Commercial Opaque LLM APIs [13.31195673556853]
We propose CoIn, a verification framework that audits both the quantity and semantic validity of hidden tokens. Experiments demonstrate that CoIn, when deployed as a trusted third-party auditor, can effectively detect token count inflation with a success rate reaching up to 94.7%.
arXiv Detail & Related papers (2025-05-19T23:39:23Z)
- Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs [71.7892165868749]
Commercial Large Language Model (LLM) APIs create a fundamental trust problem: users pay for specific models but have no guarantee that providers deliver them faithfully. We formalize this model substitution problem and evaluate detection methods under realistic adversarial conditions. We propose and evaluate the use of Trusted Execution Environments (TEEs) as one practical and robust solution.
arXiv Detail & Related papers (2025-04-07T03:57:41Z)
- CITER: Collaborative Inference for Efficient Large Language Model Decoding with Token-Level Routing [74.14816777318033]
Collaborative Inference with Token-level Routing (CITER) is a framework that enables efficient collaboration between small and large language models. We formulate router training as a policy optimization problem, where the router receives rewards based on both the quality of predictions and the inference costs of generation; a toy reward sketch follows this entry. Our experiments show that CITER reduces inference costs while preserving high-quality generation, offering a promising solution for real-time and resource-constrained applications.
arXiv Detail & Related papers (2025-02-04T03:36:44Z)
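A toy sketch of a quality-versus-cost routing reward; the reward form, the cost weight `LAMBDA`, and the numbers are assumptions for illustration, not CITER's actual formulation.

```python
# Hypothetical routing reward: quality minus weighted inference cost.
LAMBDA = 0.5  # cost weight (assumed)

def routing_reward(quality: float, cost: float, lam: float = LAMBDA) -> float:
    """Reward the router for accurate yet cheap token generation."""
    return quality - lam * cost

# Routing a token to the small model is cheap but may cost some quality;
# the router is trained to prefer whichever action earns the higher reward.
r_small = routing_reward(quality=0.82, cost=0.1)  # 0.77
r_large = routing_reward(quality=0.90, cost=0.6)  # 0.60
print("route to:", "small" if r_small >= r_large else "large")
```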
- Large Language Models for Supply Chain Optimization [4.554094815136834]
We study how Large Language Models (LLMs) can help bridge the gap between supply chain automation and human comprehension and trust thereof.
We design OptiGuide -- a framework that accepts as input queries in plain text, and outputs insights about the underlying outcomes.
We demonstrate the effectiveness of our framework on a real server placement scenario within Microsoft's cloud supply chain.
arXiv Detail & Related papers (2023-07-08T01:42:22Z)
- OverPrompt: Enhancing ChatGPT through Efficient In-Context Learning [49.38867353135258]
We propose OverPrompt, leveraging the in-context learning capability of LLMs to handle multiple task inputs.
Our experiments show that OverPrompt can achieve cost-efficient zero-shot classification without causing significant detriment to task performance; a batching sketch follows this entry.
arXiv Detail & Related papers (2023-05-24T10:08:04Z)
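A sketch of the batching idea, assuming a single instruction shared across several inputs; the prompt wording and the helper name `build_batched_prompt` are hypothetical, not OverPrompt's actual implementation.

```python
# Pack several task inputs into one prompt so the fixed instruction tokens
# are paid for once rather than once per input (assumed prompt format).
def build_batched_prompt(instruction: str, inputs: list[str]) -> str:
    numbered = "\n".join(f"{i + 1}. {text}" for i, text in enumerate(inputs))
    return f"{instruction}\nAnswer for each numbered item:\n{numbered}"

prompt = build_batched_prompt(
    "Classify the sentiment of each review as positive or negative.",
    ["Great battery life.", "Screen died in a week.", "Does the job."],
)
print(prompt)  # one API call now covers three classification inputs
```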
- Can ChatGPT Forecast Stock Price Movements? Return Predictability and Large Language Models [48.87381259980254]
We document the capability of large language models (LLMs) like ChatGPT to predict stock market reactions from news headlines without direct financial training. Using post-knowledge-cutoff headlines, GPT-4 captures initial market responses, achieving approximately 90% portfolio-day hit rates for the non-tradable initial reaction.
arXiv Detail & Related papers (2023-04-15T19:22:37Z)
- Accelerating Vision-Language Pretraining with Free Language Modeling [62.30042851111692]
Free language modeling (FLM) enables a 100% prediction rate with arbitrary corruption rates.
FLM successfully frees the prediction rate from the tie-up with the corruption rate.
Experiments show FLM could achieve an impressive 2.5x pretraining time reduction.
arXiv Detail & Related papers (2023-03-24T14:49:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.