Real Money, Fake Models: Deceptive Model Claims in Shadow APIs
- URL: http://arxiv.org/abs/2603.01919v2
- Date: Thu, 05 Mar 2026 00:42:02 GMT
- Title: Real Money, Fake Models: Deceptive Model Claims in Shadow APIs
- Authors: Yage Zhang, Yukun Jiang, Zeyuan Chen, Michael Backes, Xinyue Shen, Yang Zhang,
- Abstract summary: Third-party services claim to provide access to official model services without regional limitations via indirect access.<n>Despite their widespread use, it remains unclear whether shadow APIs deliver outputs consistent with those of the official APIs.<n>These practices undermine the validity of scientific research, harm the interests of shadow API users, and damage the reputation of official model providers.
- Score: 26.860718016839126
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Access to frontier large language models (LLMs), such as GPT-5 and Gemini-2.5, is often hindered by high pricing, payment barriers, and regional restrictions. These limitations drive the proliferation of $\textit{shadow APIs}$, third-party services that claim to provide access to official model services without regional limitations via indirect access. Despite their widespread use, it remains unclear whether shadow APIs deliver outputs consistent with those of the official APIs, raising concerns about the reliability of downstream applications and the validity of research findings that depend on them. In this paper, we present the first systematic audit between official LLM APIs and corresponding shadow APIs. We first identify 17 shadow APIs that have been utilized in 187 academic papers, with the most popular one reaching 5,966 citations and 58,639 GitHub stars by December 6, 2025. Through multidimensional auditing of three representative shadow APIs across utility, safety, and model verification, we uncover both indirect and direct evidence of deception practices in shadow APIs. Specifically, we reveal performance divergence reaching up to $47.21\%$, significant unpredictability in safety behaviors, and identity verification failures in $45.83\%$ of fingerprint tests. These deceptive practices critically undermine the reproducibility and validity of scientific research, harm the interests of shadow API users, and damage the reputation of official model providers.
Related papers
- IMMACULATE: A Practical LLM Auditing Framework via Verifiable Computation [49.796717294455796]
We present IMMACULATE, a practical auditing framework that detects economically motivated deviations.<n>IMMACULATE selectively audits a small fraction of requests using verifiable computation, achieving strong detection guarantees while amortizing cryptographic overhead.
arXiv Detail & Related papers (2026-02-26T07:21:02Z) - TikTok's Research API: Problems Without Explanations [2.06242362470764]
TikTok augmented its Research API access within Europe in July 2023.<n>Despite this expansion, notable limitations and inconsistencies persist within the data provided.<n>The API data is incomplete, making it unreliable when working with data donations.
arXiv Detail & Related papers (2025-06-11T13:50:06Z) - CoIn: Counting the Invisible Reasoning Tokens in Commercial Opaque LLM APIs [13.31195673556853]
We propose CoIn, a verification framework that audits both the quantity and semantic validity of hidden tokens.<n>Experiments demonstrate that CoIn, when deployed as a trusted third-party auditor, can effectively detect token count inflation with a success rate reaching up to 94.7%.
arXiv Detail & Related papers (2025-05-19T23:39:23Z) - Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs [71.7892165868749]
Commercial Large Language Model (LLM) APIs create a fundamental trust problem.<n>Users pay for specific models but have no guarantee that providers deliver them faithfully.<n>We formalize this model substitution problem and evaluate detection methods under realistic adversarial conditions.<n>We propose and evaluate the use of Trusted Execution Environments (TEEs) as one practical and robust solution.
arXiv Detail & Related papers (2025-04-07T03:57:41Z) - Your Fix Is My Exploit: Enabling Comprehensive DL Library API Fuzzing with Large Language Models [49.214291813478695]
Deep learning (DL) libraries, widely used in AI applications, often contain vulnerabilities like overflows and use buffer-free errors.<n>Traditional fuzzing struggles with the complexity and API diversity of DL libraries.<n>We propose DFUZZ, an LLM-driven fuzzing approach for DL libraries.
arXiv Detail & Related papers (2025-01-08T07:07:22Z) - SEAL: Suite for Evaluating API-use of LLMs [1.2528321519119252]
SEAL is an end-to-end testbed designed to evaluate large language models in real-world API usage.
It standardizes existing benchmarks, integrates an agent system for testing API retrieval and planning, and addresses the instability of real-time APIs.
arXiv Detail & Related papers (2024-09-23T20:16:49Z) - FANTAstic SEquences and Where to Find Them: Faithful and Efficient API Call Generation through State-tracked Constrained Decoding and Reranking [57.53742155914176]
API call generation is the cornerstone of large language models' tool-using ability.
Existing supervised and in-context learning approaches suffer from high training costs, poor data efficiency, and generated API calls that can be unfaithful to the API documentation and the user's request.
We propose an output-side optimization approach called FANTASE to address these limitations.
arXiv Detail & Related papers (2024-07-18T23:44:02Z) - A Classification-by-Retrieval Framework for Few-Shot Anomaly Detection to Detect API Injection Attacks [9.693391036125908]
We propose a novel unsupervised few-shot anomaly detection framework composed of two main parts.
First, we train a dedicated generic language model for API based on FastText embedding.
Next, we use Approximate Nearest Neighbor search in a classification-by-retrieval approach.
arXiv Detail & Related papers (2024-05-18T10:15:31Z) - Evaluating Embedding APIs for Information Retrieval [51.24236853841468]
We evaluate the capabilities of existing semantic embedding APIs on domain generalization and multilingual retrieval.
We find that re-ranking BM25 results using the APIs is a budget-friendly approach and is most effective in English.
For non-English retrieval, re-ranking still improves the results, but a hybrid model with BM25 works best, albeit at a higher cost.
arXiv Detail & Related papers (2023-05-10T16:40:52Z) - Simple Transparent Adversarial Examples [65.65977217108659]
We introduce secret embedding and transparent adversarial examples as a simpler way to evaluate robustness.
As a result, they pose a serious threat where APIs are used for high-stakes applications.
arXiv Detail & Related papers (2021-05-20T11:54:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.