CoIn: Counting the Invisible Reasoning Tokens in Commercial Opaque LLM APIs
- URL: http://arxiv.org/abs/2505.13778v1
- Date: Mon, 19 May 2025 23:39:23 GMT
- Title: CoIn: Counting the Invisible Reasoning Tokens in Commercial Opaque LLM APIs
- Authors: Guoheng Sun, Ziyao Wang, Bowei Tian, Meng Liu, Zheyu Shen, Shwai He, Yexiao He, Wanghao Ye, Yiting Wang, Ang Li
- Abstract summary: We propose CoIn, a verification framework that audits both the quantity and semantic validity of hidden tokens. Experiments demonstrate that CoIn, when deployed as a trusted third-party auditor, can effectively detect token count inflation with a success rate reaching up to 94.7%.
- Score: 13.31195673556853
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As post-training techniques evolve, large language models (LLMs) are increasingly augmented with structured multi-step reasoning abilities, often optimized through reinforcement learning. These reasoning-enhanced models outperform standard LLMs on complex tasks and now underpin many commercial LLM APIs. However, to protect proprietary behavior and reduce verbosity, providers typically conceal the reasoning traces while returning only the final answer. This opacity introduces a critical transparency gap: users are billed for invisible reasoning tokens, which often account for the majority of the cost, yet have no means to verify their authenticity. This opens the door to token count inflation, where providers may overreport token usage or inject synthetic, low-effort tokens to inflate charges. To address this issue, we propose CoIn, a verification framework that audits both the quantity and semantic validity of hidden tokens. CoIn constructs a verifiable hash tree from token embedding fingerprints to check token counts, and uses embedding-based relevance matching to detect fabricated reasoning content. Experiments demonstrate that CoIn, when deployed as a trusted third-party auditor, can effectively detect token count inflation with a success rate reaching up to 94.7%, demonstrating its ability to restore billing transparency in opaque LLM services. The dataset and code are available at https://github.com/CASE-Lab-UMD/LLM-Auditing-CoIn.
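The core of the verification protocol is a commitment to per-token fingerprints. A minimal sketch of the hash-tree idea, in Python with hypothetical helper names (the paper's actual fingerprinting and verification protocol are more involved):

```python
# Minimal Merkle-tree sketch of CoIn's token-count commitment
# (hypothetical helpers; not the paper's exact construction).
import hashlib

def leaf_hash(fingerprint: bytes) -> bytes:
    """Hash one token's embedding fingerprint into a Merkle leaf."""
    return hashlib.sha256(b"leaf:" + fingerprint).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    """Fold leaf hashes pairwise up to a single root commitment."""
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:                 # duplicate last node on odd levels
            level.append(level[-1])
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]

# The provider commits one fingerprint per hidden reasoning token; the
# committed leaf count is the billable count an auditor can spot-check.
fingerprints = [f"embedding-fp-{i}".encode() for i in range(7)]
root = merkle_root([leaf_hash(fp) for fp in fingerprints])
print(len(fingerprints), root.hex()[:16])
```

An auditor holding the root can then challenge random leaves, so inflating the token count after the fact would require forging hash preimages.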
Related papers
- Predictive Auditing of Hidden Tokens in LLM APIs via Reasoning Length Estimation [7.928002407828304]
Commercial LLM services often conceal internal reasoning traces while still charging users for every generated token. PALACE estimates hidden reasoning token counts from prompt-answer pairs without access to internal traces. Experiments on math, coding, medical, and general reasoning benchmarks show that PALACE achieves low relative error and strong prediction accuracy.
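As a rough picture of the estimation task, the toy sketch below fits a least-squares predictor of hidden token counts from visible prompt and answer lengths; the data is synthetic and PALACE itself trains a learned estimator, so this only illustrates the input-output shape of the problem.

```python
# Toy stand-in for hidden-token-count estimation from prompt-answer pairs.
# All data here is synthetic; PALACE uses a trained estimator instead.
import numpy as np

rng = np.random.default_rng(0)
prompt_len = rng.integers(20, 400, size=200)
answer_len = rng.integers(10, 200, size=200)
# Invented ground truth: hidden reasoning grows with both visible lengths.
hidden_len = np.maximum(
    3.0 * prompt_len + 1.5 * answer_len + rng.normal(0, 50, size=200), 1.0)

X = np.column_stack([prompt_len, answer_len, np.ones_like(prompt_len)])
coef, *_ = np.linalg.lstsq(X, hidden_len, rcond=None)

rel_err = np.mean(np.abs(X @ coef - hidden_len) / hidden_len)
print(f"mean relative error on synthetic data: {rel_err:.2%}")
```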
arXiv Detail & Related papers (2025-07-29T19:50:55Z)
- Verifying the Verifiers: Unveiling Pitfalls and Potentials in Fact Verifiers [59.168391398830515]
We evaluate 12 pre-trained LLMs and one specialized fact-verifier, using a collection of examples from 14 fact-checking benchmarks. We highlight the importance of addressing annotation errors and ambiguity in datasets. Frontier LLMs with few-shot in-context examples, often overlooked in previous works, achieve top-tier performance.
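The few-shot in-context setup the paper highlights can be pictured with a minimal prompt builder like the sketch below; the claims and the two-way label set are invented for illustration, and real benchmarks pair each claim with retrieved evidence.

```python
# Minimal few-shot fact-verification prompt (invented examples and labels).
FEW_SHOT = [
    ("The Eiffel Tower is in Berlin.", "REFUTED"),
    ("Water boils at 100 degrees Celsius at sea level.", "SUPPORTED"),
]

def build_prompt(claim: str) -> str:
    lines = ["Label each claim as SUPPORTED or REFUTED.", ""]
    for ex_claim, label in FEW_SHOT:
        lines += [f"Claim: {ex_claim}", f"Label: {label}", ""]
    lines += [f"Claim: {claim}", "Label:"]
    return "\n".join(lines)

# The assembled prompt is sent to an LLM and the generated label parsed.
print(build_prompt("The moon orbits the Earth."))
```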
arXiv Detail & Related papers (2025-06-16T10:32:10Z)
- TRAPDOC: Deceiving LLM Users by Injecting Imperceptible Phantom Tokens into Documents [4.753535328327316]
Over-reliance on large language models (LLMs) is emerging as a significant social issue. We propose a method that injects imperceptible phantom tokens into documents, causing LLMs to generate outputs that appear plausible to users but are in fact incorrect. Based on this technique, we introduce TRAPDOC, a framework designed to deceive over-reliant LLM users.
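One way tokens can be made imperceptible, sketched below, is to carry them in zero-width Unicode characters that render invisibly yet still reach the tokenizer; this carrier is an assumption for illustration, and TRAPDOC's actual injection scheme may differ.

```python
# Illustrative phantom-token carrier: zero-width characters encode a hidden
# bitstring inside the text. Invisible when rendered, visible to tokenizers.
ZERO = "\u200b"  # zero-width space
ONE = "\u200c"   # zero-width non-joiner

def inject_phantoms(text: str, bits: str) -> str:
    """Append one invisible bit-character to each of the first len(bits) words."""
    words = text.split(" ")
    for i in range(min(len(bits), len(words))):
        words[i] += ZERO if bits[i] == "0" else ONE
    return " ".join(words)

doc = inject_phantoms("The quarterly revenue grew by four percent", "1011")
print(doc == "The quarterly revenue grew by four percent")  # False
print(doc)  # renders identically to a human reader
```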
arXiv Detail & Related papers (2025-05-30T07:16:53Z)
- Invisible Tokens, Visible Bills: The Urgent Need to Audit Hidden Operations in Opaque LLM Services [22.700907666937177]
This position paper highlights emerging accountability challenges in commercial Opaque LLM Services (COLS). We formalize two key risks: quantity inflation, where token and call counts may be artificially inflated, and quality downgrade, where providers might quietly substitute lower-cost models or tools. We propose a modular three-layer auditing framework for COLS and users that enables trustworthy verification across execution, secure logging, and user-facing auditability without exposing proprietary internals.
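The secure-logging layer of such a framework presumably rests on tamper-evident primitives; the sketch below shows one minimal candidate, a hash-chained usage log (an illustrative stand-in, not the paper's design), in which any retroactive edit to a billed record breaks verification.

```python
# Hash-chained usage log: a minimal tamper-evident primitive for auditing.
import hashlib
import json

def append_entry(log: list[dict], record: dict) -> None:
    prev = log[-1]["digest"] if log else "0" * 64
    body = json.dumps(record, sort_keys=True)
    log.append({"record": record,
                "digest": hashlib.sha256((prev + body).encode()).hexdigest()})

def verify_chain(log: list[dict]) -> bool:
    prev = "0" * 64
    for entry in log:
        body = json.dumps(entry["record"], sort_keys=True)
        if hashlib.sha256((prev + body).encode()).hexdigest() != entry["digest"]:
            return False
        prev = entry["digest"]
    return True

log: list[dict] = []
append_entry(log, {"call": 1, "billed_tokens": 812})
append_entry(log, {"call": 2, "billed_tokens": 455})
log[0]["record"]["billed_tokens"] = 300   # retroactive inflation attempt
print(verify_chain(log))                  # False: tampering is detected
```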
arXiv Detail & Related papers (2025-05-24T02:26:49Z)
- Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs [60.881609323604685]
Large Language Models (LLMs) accessed via black-box APIs introduce a trust challenge: users pay for services based on advertised model capabilities, yet providers may covertly substitute the specified model with a cheaper, lower-quality alternative to reduce operational costs. This lack of transparency undermines fairness, erodes trust, and complicates reliable benchmarking.
arXiv Detail & Related papers (2025-04-07T03:57:41Z)
- Learning on LLM Output Signatures for gray-box LLM Behavior Analysis [52.81120759532526]
Large Language Models (LLMs) have achieved widespread adoption, yet our understanding of their behavior remains limited. We develop a transformer-based approach to process LLM Output Signatures that theoretically guarantees approximation of existing techniques. Our approach achieves superior performance on hallucination and data contamination detection in gray-box settings.
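In the gray-box setting, the output signature is essentially the per-token probability sequence an API can expose alongside its text. The stub below shows what that object looks like and one crude confidence summary; the paper instead trains a transformer over such sequences, so these hand-picked features are illustrative only.

```python
# What an "output signature" input looks like, plus crude summary features.
# The paper learns over these sequences with a transformer instead.
import numpy as np

def output_signature(token_probs: list[float]) -> np.ndarray:
    """Probabilities the model assigned to its own generated tokens."""
    return np.asarray(token_probs, dtype=np.float32)

def confidence_features(sig: np.ndarray) -> dict:
    # Low mean log-probability and many low-confidence tokens often
    # correlate with hallucination in gray-box detectors.
    return {"mean_logprob": float(np.log(sig).mean()),
            "min_prob": float(sig.min()),
            "frac_low_conf": float((sig < 0.3).mean())}

sig = output_signature([0.91, 0.85, 0.12, 0.08, 0.77])
print(confidence_features(sig))
```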
arXiv Detail & Related papers (2025-03-18T09:04:37Z)
- Information-Guided Identification of Training Data Imprint in (Proprietary) Large Language Models [52.439289085318634]
We show how to identify training data known to proprietary large language models (LLMs) by using information-guided probes. Our work builds on a key observation: text passages with high surprisal are good search material for memorization probes.
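A minimal sketch of surprisal-guided selection, assuming GPT-2 via Hugging Face transformers as a stand-in reference model: score candidate passages by their average per-token surprisal and keep the most surprising ones as probe material.

```python
# Rank candidate passages by mean per-token surprisal under a reference LM.
# GPT-2 is only a convenient stand-in for whichever scorer the probe uses.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()

@torch.no_grad()
def mean_surprisal(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    # Cross-entropy of the LM on its own input = mean surprisal in nats.
    return lm(ids, labels=ids).loss.item()

passages = ["the cat sat on the mat",
            "zygomorphic corollas perplex apiarists nightly"]
ranked = sorted(passages, key=mean_surprisal, reverse=True)
print(ranked[0])  # the high-surprisal passage becomes probe material
```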
arXiv Detail & Related papers (2025-03-15T10:19:15Z)
- ExLM: Rethinking the Impact of [MASK] Tokens in Masked Language Models [11.997499811414837]
Masked Language Models (MLMs) are trained by randomly masking portions of the input sequences with [MASK] tokens and learning to reconstruct the original content based on the remaining context.
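The corruption step reads roughly as in the sketch below: mask a random fraction of positions (conventionally 15%) and keep the originals as reconstruction targets.

```python
# Standard [MASK] corruption used in MLM pre-training.
import random

def mask_tokens(tokens: list[str], mask_rate: float = 0.15,
                seed: int = 0) -> tuple[list[str], dict[int, str]]:
    rng = random.Random(seed)
    corrupted, targets = list(tokens), {}
    for i, token in enumerate(tokens):
        if rng.random() < mask_rate:
            targets[i] = token            # reconstruction target
            corrupted[i] = "[MASK]"
    return corrupted, targets

corrupted, targets = mask_tokens(
    "the model learns to fill in missing words from context".split())
print(corrupted)
print(targets)
```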
arXiv Detail & Related papers (2025-01-23T05:46:50Z)
- Token Highlighter: Inspecting and Mitigating Jailbreak Prompts for Large Language Models [61.916827858666906]
Large Language Models (LLMs) are increasingly being integrated into services such as ChatGPT to provide responses to user queries. This paper proposes a method called Token Highlighter to inspect and mitigate the potential jailbreak threats in the user query.
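The summary leaves the mechanism implicit, but gradient-based token inspection in this spirit can be sketched as below, with a toy model and loss standing in for the real LLM and its objective: rank query tokens by how strongly the loss responds to their embeddings, then damp the most influential ones.

```python
# Toy gradient-based token influence scoring; the linear "model" and loss
# are placeholders for the real LLM and its actual objective.
import torch

torch.manual_seed(0)
emb = torch.randn(6, 16, requires_grad=True)   # one embedding per query token
w = torch.randn(16)

loss = torch.sigmoid(emb @ w).sum()            # stand-in scalar loss
loss.backward()

influence = emb.grad.norm(dim=1)               # per-token gradient norm
suspects = influence.topk(2).indices           # most influential tokens
softened = emb.detach().clone()
softened[suspects] *= 0.5                      # "soft removal" by down-scaling
print(suspects.tolist())
```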
arXiv Detail & Related papers (2024-12-24T05:10:02Z)
- Inference Optimal VLMs Need Fewer Visual Tokens and More Parameters [54.01228554126122]
Vision Language Models (VLMs) have demonstrated strong capabilities across various visual understanding and reasoning tasks. To reduce inference costs, one can either downsize the Large Language Model (LLM) or reduce the number of input tokens needed to represent the image. We take the first steps toward designing token compression algorithms tailored for high-compression settings.
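A toy version of token compression, shown below, simply average-pools contiguous groups of image tokens before they reach the LLM; real compressors are learned, so this only illustrates the cost trade-off being tuned.

```python
# Fixed average-pooling as a stand-in for learned visual-token compression.
import torch

def compress_tokens(vis_tokens: torch.Tensor, keep: int) -> torch.Tensor:
    """(num_tokens, dim) -> (keep, dim) by pooling contiguous groups."""
    groups = torch.chunk(vis_tokens, keep, dim=0)
    return torch.stack([g.mean(dim=0) for g in groups])

vis = torch.randn(576, 1024)                 # e.g. a 24x24 grid of image tokens
print(compress_tokens(vis, keep=36).shape)   # 16x fewer tokens enter the LLM
```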
arXiv Detail & Related papers (2024-11-05T18:54:21Z)
- Large Language Models as Carriers of Hidden Messages [0.0]
Simple fine-tuning can embed hidden text into large language models (LLMs), which is revealed only when triggered by a specific query. Our work demonstrates that embedding hidden text via fine-tuning, although seemingly secure due to the vast number of potential triggers, is vulnerable to extraction. We introduce an extraction attack called Unconditional Token Forcing (UTF), which iteratively feeds tokens from the LLM's vocabulary to reveal sequences with high token probabilities, indicating hidden text candidates.
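A hedged sketch of the forcing loop, with GPT-2 via transformers as a stand-in model: feed a candidate token with no other context, greedily decode a few steps, and flag continuations with unusually high average log-probability as hidden-text candidates. The three seed tokens below stand in for a sweep over the full vocabulary.

```python
# Greedy "token forcing" probe: high average log-probability continuations
# are hidden-text candidates. GPT-2 stands in for the fine-tuned target LLM.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()

@torch.no_grad()
def force_token(seed: str, steps: int = 5) -> tuple[str, float]:
    ids = tok(seed, return_tensors="pt").input_ids
    total = 0.0
    for _ in range(steps):
        logprobs = torch.log_softmax(lm(ids).logits[0, -1], dim=-1)
        nxt = logprobs.argmax()
        total += logprobs[nxt].item()
        ids = torch.cat([ids, nxt.view(1, 1)], dim=1)
    return tok.decode(ids[0]), total / steps

for seed in [" The", " Once", " import"]:   # stand-in for a vocab sweep
    text, avg_lp = force_token(seed)
    print(f"{avg_lp:6.2f}  {text!r}")
```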
arXiv Detail & Related papers (2024-06-04T16:49:06Z)
- Detecting DeFi Securities Violations from Token Smart Contract Code [0.4263043028086136]
Decentralized Finance (DeFi) is a system of financial products and services built and delivered through smart contracts on various blockchains.
This study aims to uncover whether we can identify DeFi projects potentially engaging in securities violations based on their tokens' smart contract code.
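The setup can be pictured with the toy sketch below: featurize each token's contract code (here, invented opcode strings) with TF-IDF n-grams and train a classifier to flag projects resembling known violations; the data, labels, and features are fabricated for illustration and are not the study's actual pipeline.

```python
# Fabricated toy data: classify contracts by TF-IDF over opcode sequences.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

contracts = [
    "PUSH1 MSTORE CALLER SSTORE TRANSFER DIVIDEND PAYOUT",   # promises returns
    "PUSH1 MSTORE BALANCEOF TRANSFER APPROVE TRANSFERFROM",  # plain token logic
    "CALLER SSTORE DIVIDEND PROFIT PAYOUT TRANSFER",
    "BALANCEOF APPROVE TRANSFERFROM TRANSFER MSTORE",
]
labels = [1, 0, 1, 0]   # 1 = potential securities violation (invented)

vec = TfidfVectorizer(ngram_range=(1, 2))
X = vec.fit_transform(contracts)
clf = LogisticRegression().fit(X, labels)

new = vec.transform(["SSTORE DIVIDEND PAYOUT TRANSFER CALLER"])
print(clf.predict_proba(new)[0, 1])   # probability of a violation flag
```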
arXiv Detail & Related papers (2021-12-06T01:44:08Z)