IMMACULATE: A Practical LLM Auditing Framework via Verifiable Computation
- URL: http://arxiv.org/abs/2602.22700v1
- Date: Thu, 26 Feb 2026 07:21:02 GMT
- Title: IMMACULATE: A Practical LLM Auditing Framework via Verifiable Computation
- Authors: Yanpei Guo, Wenjie Qu, Linyu Wu, Shengfang Zhai, Lionel Z. Wang, Ming Xu, Yue Liu, Binhang Yuan, Dawn Song, Jiaheng Zhang
- Abstract summary: We present IMMACULATE, a practical auditing framework that detects economically motivated deviations. IMMACULATE selectively audits a small fraction of requests using verifiable computation, achieving strong detection guarantees while amortizing cryptographic overhead.
- Score: 49.796717294455796
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Commercial large language models are typically deployed as black-box API services, requiring users to trust providers to execute inference correctly and report token usage honestly. We present IMMACULATE, a practical auditing framework that detects economically motivated deviations, such as model substitution, quantization abuse, and token overbilling, without trusted hardware or access to model internals. IMMACULATE selectively audits a small fraction of requests using verifiable computation, achieving strong detection guarantees while amortizing cryptographic overhead. Experiments on dense and MoE models show that IMMACULATE reliably distinguishes benign and malicious executions with under 1% throughput overhead. Our code is published at https://github.com/guo-yanpei/Immaculate.
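The selective-auditing economics are easy to see in simulation: if a fraction p of requests is audited and the provider deviates on a fraction f, a deviation escapes N requests only with probability (1 - pf)^N. Below is a minimal sketch of that idea, assuming only what the abstract states; `verify_inference`, `AUDIT_RATE`, and the simulation harness are hypothetical stand-ins, not code from the IMMACULATE repository.

```python
# A minimal sketch of selective auditing: audit each request
# independently with probability AUDIT_RATE via a (mocked)
# verifiable-computation check. All names here are hypothetical.
import random

AUDIT_RATE = 0.02  # fraction of requests re-checked cryptographically

def verify_inference(request_id: int, provider_cheated: bool) -> bool:
    """Stand-in for a verifiable-computation proof check.

    Returns True if the audited execution verifies (honest), False if
    the proof fails (deviation detected).
    """
    return not provider_cheated

def run_audit(num_requests: int, cheat_rate: float, seed: int = 0) -> bool:
    """Simulate a request stream; return True if a deviation is caught."""
    rng = random.Random(seed)
    for request_id in range(num_requests):
        cheated = rng.random() < cheat_rate   # provider deviates sometimes
        if rng.random() < AUDIT_RATE:         # auditor samples this request
            if not verify_inference(request_id, cheated):
                return True                   # cryptographic check failed
    return False

# Even a 2% audit rate catches a 5% deviation rate with high
# probability over 10k requests, which is what lets the per-request
# cryptographic cost amortize.
caught = sum(run_audit(10_000, cheat_rate=0.05, seed=s) for s in range(100))
print(f"detected in {caught}/100 trials")
```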
Related papers
- CREDIT: Certified Ownership Verification of Deep Neural Networks Against Model Extraction Attacks [54.04030169323115]
We introduce CREDIT, a certified ownership verification method for deep neural networks against model extraction attacks (MEAs). We quantify the similarity between DNN models, propose a practical verification threshold, and provide rigorous theoretical guarantees for ownership verification based on this threshold. We extensively evaluate our approach on several mainstream datasets across different domains and tasks, achieving state-of-the-art performance.
arXiv Detail & Related papers (2026-02-23T23:36:25Z)
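A minimal sketch of how a threshold-based ownership check of this shape could look; the agreement metric, probe set, and 0.9 threshold are illustrative assumptions, not CREDIT's certified construction.

```python
# Quantify model similarity on probe inputs and compare it against a
# verification threshold. Metric and threshold are assumptions.
from typing import Callable, Sequence

def agreement(owner: Callable[[str], int],
              suspect: Callable[[str], int],
              probes: Sequence[str]) -> float:
    """Fraction of probe inputs on which the two models agree."""
    return sum(owner(x) == suspect(x) for x in probes) / len(probes)

def verify_ownership(owner, suspect, probes, threshold: float = 0.9) -> bool:
    """Flag the suspect as a likely extraction if similarity exceeds threshold."""
    return agreement(owner, suspect, probes) >= threshold

# Toy usage: a stolen copy agrees everywhere; an independent model does not.
probes = [f"probe-{i}" for i in range(100)]
owner_model = lambda x: hash(x) % 10
stolen_copy = owner_model
independent = lambda x: hash(x + "!") % 10
print(verify_ownership(owner_model, stolen_copy, probes))   # True
print(verify_ownership(owner_model, independent, probes))   # False (almost surely)
```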
- Visualizing token importance for black-box language models [48.747801442240565]
We consider the problem of auditing black-box large language models (LLMs) to ensure they behave reliably when deployed in production settings. We propose Distribution-Based Sensitivity Analysis (DBSA) to evaluate the sensitivity of a language model's output to each input token.
arXiv Detail & Related papers (2025-12-12T14:01:43Z)
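A minimal sketch of distribution-based token sensitivity, assuming the natural reading of the abstract: resample each token position from a replacement distribution and measure how often the black-box output changes. The replacement pool and the 0/1 change score are assumptions, not DBSA's exact estimator.

```python
# Perturb each input token with draws from a replacement distribution
# and measure how much the black-box output moves per position.
import random

def sensitivity(model, tokens, replacement_pool, n_samples=20, seed=0):
    """Return one sensitivity score per input token position."""
    rng = random.Random(seed)
    baseline = model(tokens)
    scores = []
    for i in range(len(tokens)):
        diffs = 0
        for _ in range(n_samples):
            perturbed = list(tokens)
            perturbed[i] = rng.choice(replacement_pool)  # swap token i only
            diffs += model(perturbed) != baseline        # 0/1 output change
        scores.append(diffs / n_samples)
    return scores

# Toy black box that only looks at the word "refund":
toy_model = lambda toks: "refund" in toks
tokens = ["please", "issue", "a", "refund", "now"]
print(sensitivity(toy_model, tokens, ["the", "a", "ok", "later"]))
# The "refund" position gets the highest score.
```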
- ImpossibleBench: Measuring LLMs' Propensity of Exploiting Test Cases [58.411135609139855]
"Shortcuts" to complete tasks pose significant risks for reliable assessment and deployment of large language models.<n>We introduce ImpossibleBench, a benchmark framework that measures LLM agents' propensity to exploit test cases.<n>As a practical framework, ImpossibleBench is not just an evaluation but a versatile tool.
- Auditing Pay-Per-Token in Large Language Models [11.795056270534287]
We develop an auditing framework based on martingale theory to detect token misreporting. Our framework is guaranteed to always detect token misreporting, regardless of the provider's (mis-)reporting policy.
arXiv Detail & Related papers (2025-10-05T17:47:16Z)
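A minimal sketch of a martingale-style sequential audit, assuming the standard test-martingale recipe: wealth grows on evidence of overbilling, and a flag is raised once it crosses 1/alpha, so Ville's inequality caps the false-alarm rate at alpha. The specific betting rule below is an assumption, not the paper's construction.

```python
# Maintain a test martingale over requests; flag misreporting once
# the wealth crosses 1/alpha. The betting rule is illustrative.
def audit_stream(reported_counts, true_counts, alpha=0.01, bet=0.5):
    """Return the 1-indexed round where misreporting is flagged, else None."""
    wealth = 1.0
    for t, (reported, true) in enumerate(zip(reported_counts, true_counts), 1):
        if reported > true:
            wealth *= 1.0 + bet   # evidence of overbilling: bet pays off
        else:
            wealth *= 1.0 - bet   # honest round: bet loses
        if wealth >= 1.0 / alpha:
            return t              # reject the "honest provider" hypothesis
    return None

true = [100] * 200
honest = list(true)
inflated = [110] * 200               # provider pads every bill by 10 tokens
print(audit_stream(honest, true))    # None: never flags an honest provider
print(audit_stream(inflated, true))  # flags within a dozen rounds
```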
- Predictive Auditing of Hidden Tokens in LLM APIs via Reasoning Length Estimation [7.928002407828304]
Commercial LLM services often conceal internal reasoning traces while still charging users for every generated token. PALACE estimates hidden reasoning token counts from prompt-answer pairs without access to internal traces. Experiments on math, coding, medical, and general reasoning benchmarks show that PALACE achieves low relative error and strong prediction accuracy.
arXiv Detail & Related papers (2025-07-29T19:50:55Z)
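A minimal sketch of the predictive-auditing pattern the abstract describes: fit a predictor of hidden reasoning-token counts from visible features, then flag bills far above the prediction. The single-feature linear model and the tolerance band are assumptions, not PALACE's architecture.

```python
# Fit hidden-token counts against a visible feature on a calibration
# set, then compare the provider's bill to the prediction.
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (one feature)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

# Calibration set: (prompt+answer length, observed reasoning tokens).
lengths = [50, 120, 200, 310, 400]
hidden = [160, 390, 610, 940, 1220]
a, b = fit_line(lengths, hidden)

billed = 2600                       # tokens the provider charged for
predicted = a * 330 + b             # expected hidden tokens for this request
if billed > 1.5 * predicted:        # tolerance band is an assumption
    print(f"suspicious: billed {billed}, expected ~{predicted:.0f}")
```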
- Auditing Black-Box LLM APIs with a Rank-Based Uniformity Test [24.393978712663618]
API providers may discreetly serve quantized or fine-tuned variants to reduce costs or maliciously alter model behaviors. We propose a rank-based uniformity test that can verify the behavioral equality of a black-box LLM to a locally deployed authentic model. We evaluate the approach across diverse threat scenarios, including quantization, harmful fine-tuning, jailbreak prompts, and full model substitution.
arXiv Detail & Related papers (2025-06-08T03:00:31Z)
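A minimal sketch of a rank-based uniformity test, assuming the natural reading: if the API serves the authentic model, a scalar statistic of its response should rank uniformly among N draws from the local copy, and any substitution skews the rank histogram. The Gaussian statistic and the chi-square summary are illustrative assumptions.

```python
# Rank an API statistic among N local reference draws per prompt, then
# check the rank histogram for uniformity across many prompts.
import random

def rank_among_references(api_value: float, reference_values) -> int:
    """Rank of the API statistic among N local reference draws (0..N)."""
    return sum(r < api_value for r in reference_values)

def chi_square_stat(ranks, n_refs, n_bins=10):
    """Chi-square statistic of the rank histogram; large => non-uniform."""
    counts = [0] * n_bins
    for r in ranks:
        counts[min(r * n_bins // (n_refs + 1), n_bins - 1)] += 1
    expected = len(ranks) / n_bins
    return sum((c - expected) ** 2 / expected for c in counts)

rng = random.Random(0)
n_refs, n_prompts = 19, 500
# Same model: the API statistic is drawn from the reference distribution.
same = [rank_among_references(rng.gauss(0, 1),
                              [rng.gauss(0, 1) for _ in range(n_refs)])
        for _ in range(n_prompts)]
# Substituted model: the statistic is shifted, so ranks pile up at the top.
subst = [rank_among_references(rng.gauss(0.8, 1),
                               [rng.gauss(0, 1) for _ in range(n_refs)])
         for _ in range(n_prompts)]
print(chi_square_stat(same, n_refs))   # near 9 (chi-square, 9 dof)
print(chi_square_stat(subst, n_refs))  # much larger: flags substitution
```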
- CoIn: Counting the Invisible Reasoning Tokens in Commercial Opaque LLM APIs [13.31195673556853]
We propose CoIn, a verification framework that audits both the quantity and semantic validity of hidden tokens. Experiments demonstrate that CoIn, when deployed as a trusted third-party auditor, can effectively detect token count inflation with a success rate reaching up to 94.7%.
arXiv Detail & Related papers (2025-05-19T23:39:23Z)
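A minimal sketch of commitment-based token counting in this spirit: the provider publishes one hiding commitment per hidden token, the bill equals the number of commitments, and an auditor spot-checks random positions. This only illustrates the counting side; CoIn's actual construction is more involved, and the abstract also mentions semantic validity checks.

```python
# The provider commits to each hidden token; the auditor challenges
# random positions and checks the revealed token against the commitment.
import hashlib, random

def commit(token: str, salt: str) -> str:
    return hashlib.sha256(f"{salt}|{token}".encode()).hexdigest()

# Provider side: hidden reasoning tokens and their public commitments.
hidden_tokens = ["step", "1", ":", "consider", "the", "limit", "..."]
salts = [f"salt-{i}" for i in range(len(hidden_tokens))]
commitments = [commit(t, s) for t, s in zip(hidden_tokens, salts)]
billed_count = len(commitments)        # what the user is charged for

# Auditor side: challenge a few random positions.
rng = random.Random(42)
for idx in rng.sample(range(billed_count), k=3):
    token, salt = hidden_tokens[idx], salts[idx]   # provider's response
    assert commit(token, salt) == commitments[idx], "commitment mismatch"
print(f"spot-checked 3 of {billed_count} billed tokens: OK")
```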
- Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs [71.7892165868749]
Commercial Large Language Model (LLM) APIs create a fundamental trust problem: users pay for specific models but have no guarantee that providers deliver them faithfully. We formalize this model substitution problem and evaluate detection methods under realistic adversarial conditions. We propose and evaluate the use of Trusted Execution Environments (TEEs) as one practical and robust solution.
arXiv Detail & Related papers (2025-04-07T03:57:41Z)
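For intuition, here is a minimal sketch of the kind of simple detection baseline such evaluations consider: compare greedy API answers against reference answers precomputed from the promised model. The threshold and data are assumptions; the paper's point is that adaptive providers can evade such tests, which motivates TEEs.

```python
# Flag substitution when temperature-0 API answers diverge from
# reference answers beyond a tolerance. All values are illustrative.
def substitution_suspected(api_answers, reference_answers, tol=0.05) -> bool:
    """Flag if the fraction of mismatching greedy answers exceeds tol."""
    mismatches = sum(a != r for a, r in zip(api_answers, reference_answers))
    return mismatches / len(reference_answers) > tol

reference = ["Paris", "4", "blue"] * 20
faithful = list(reference)
swapped = ["Paris", "5", "blue"] * 20      # a cheaper model slips on math
print(substitution_suspected(faithful, reference))  # False
print(substitution_suspected(swapped, reference))   # True
```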
- OpenFactCheck: Building, Benchmarking Customized Fact-Checking Systems and Evaluating the Factuality of Claims and LLMs [59.836774258359945]
OpenFactCheck is a framework for building customized automatic fact-checking systems. It allows users to easily customize an automatic fact-checker and verify the factual correctness of documents and claims. CheckerEVAL is a solution for gauging the reliability of automatic fact-checkers' verification results using human-annotated datasets.
arXiv Detail & Related papers (2024-05-09T07:15:19Z)
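A minimal sketch of the pluggable pipeline shape such a framework implies, with claim extraction, evidence retrieval, and verdict as swappable components; the interfaces and toy components below are assumptions, not OpenFactCheck's actual API.

```python
# A fact-checking pipeline built from three interchangeable stages.
from typing import Callable, List

def run_pipeline(document: str,
                 extract: Callable[[str], List[str]],
                 retrieve: Callable[[str], List[str]],
                 verdict: Callable[[str, List[str]], bool]) -> dict:
    """Check every extracted claim and report which are supported."""
    return {c: verdict(c, retrieve(c)) for c in extract(document)}

# Toy components: sentence-level claims, a fixed evidence store, and
# substring-matching "verification".
evidence_store = ["The Eiffel Tower is in Paris.", "Water boils at 100 C."]
extract = lambda doc: [s.strip() for s in doc.split(".") if s.strip()]
retrieve = lambda claim: [e for e in evidence_store
                          if any(w in e for w in claim.split())]
verdict = lambda claim, ev: any(claim.rstrip(".") in e for e in ev)

print(run_pipeline("The Eiffel Tower is in Paris. Cats can fly",
                   extract, retrieve, verdict))
# {'The Eiffel Tower is in Paris': True, 'Cats can fly': False}
```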