Privacy in Theory, Bugs in Practice: Grey-Box Auditing of Differential Privacy Libraries
- URL: http://arxiv.org/abs/2602.17454v1
- Date: Thu, 19 Feb 2026 15:18:00 GMT
- Title: Privacy in Theory, Bugs in Practice: Grey-Box Auditing of Differential Privacy Libraries
- Authors: Tudor Cebere, David Erb, Damien Desfontaines, Aurélien Bellet, Jack Fitzsimons,
- Abstract summary: We introduce Re:cord-play, a gray-box auditing paradigm that inspects the internal state of DP algorithms. By running an instrumented algorithm on neighboring datasets with identical randomness, Re:cord-play directly checks for data-dependent control flow. We show that our novel testing approach is both effective and necessary by auditing 12 open-source libraries.
- Score: 11.924357290256374
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Differential privacy (DP) implementations are notoriously prone to errors, with subtle bugs frequently invalidating theoretical guarantees. Existing verification methods are often impractical: formal tools are too restrictive, while black-box statistical auditing is intractable for complex pipelines and fails to pinpoint the source of the bug. This paper introduces Re:cord-play, a gray-box auditing paradigm that inspects the internal state of DP algorithms. By running an instrumented algorithm on neighboring datasets with identical randomness, Re:cord-play directly checks for data-dependent control flow and provides concrete falsification of sensitivity violations by comparing declared sensitivity against the empirically measured distance between internal inputs. We generalize this to Re:cord-play-sample, a full statistical audit that isolates and tests each component, including untrusted ones. We show that our novel testing approach is both effective and necessary by auditing 12 open-source libraries, including SmartNoise SDK, Opacus, and Diffprivlib, and uncovering 13 privacy violations that impact their theoretical guarantees. We release our framework as an open-source Python package, thereby making it easy for DP developers to integrate effective, computationally inexpensive, and seamless privacy testing as part of their software development lifecycle.
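The record-and-compare check described in the abstract can be illustrated with a toy sketch. Everything below is hypothetical: `Recorder`, its `branch` and `laplace` hooks, and `audit` are illustrative names, not the API of the released Re:cord-play package. The sketch assumes add/remove-one neighboring datasets, models "identical randomness" by seeding two `random.Random` instances with the same seed, and only sees what the hooks record. A real tool would instrument the library's internals.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Recorder:
    """Hypothetical instrumentation hook (not the Re:cord-play API).

    Logs every branch decision and every value handed to a noise primitive,
    together with the sensitivity the code declares for it.
    """
    branches: list = field(default_factory=list)
    noise_inputs: list = field(default_factory=list)

    def branch(self, label, cond):
        # Record the outcome of a branch so two runs can be compared.
        self.branches.append((label, bool(cond)))
        return cond

    def laplace(self, rng, value, sensitivity, epsilon):
        # Record the pre-noise value and its declared sensitivity.
        self.noise_inputs.append((value, sensitivity))
        rate = epsilon / sensitivity
        # The difference of two i.i.d. exponentials is Laplace(0, sensitivity/epsilon).
        return value + rng.expovariate(rate) - rng.expovariate(rate)


def clipped_sum(data, rec, rng, epsilon=1.0, bound=1.0):
    """Correct: every record is clipped to [-bound, bound], so adding or
    removing one record moves the sum by at most `bound`."""
    total = sum(max(-bound, min(bound, x)) for x in data)
    return rec.laplace(rng, total, sensitivity=bound, epsilon=epsilon)


def buggy_sum(data, rec, rng, epsilon=1.0, bound=1.0):
    """Deliberately buggy variant used to show both checks firing."""
    # Bug 1: whether clipping happens depends on the private data.
    if rec.branch("has_outlier", any(abs(x) > 100 for x in data)):
        data = [max(-bound, min(bound, x)) for x in data]
    # Bug 2: the declared sensitivity assumes clipping always happened.
    return rec.laplace(rng, sum(data), sensitivity=bound, epsilon=epsilon)


def audit(mechanism, d1, d2, seed=0, tol=1e-9):
    """Run `mechanism` on neighboring datasets with identical randomness
    and compare the recorded internal state."""
    rec1, rec2 = Recorder(), Recorder()
    mechanism(d1, rec1, random.Random(seed))
    mechanism(d2, rec2, random.Random(seed))
    findings = []
    if rec1.branches != rec2.branches or len(rec1.noise_inputs) != len(rec2.noise_inputs):
        findings.append("data-dependent control flow")
    for (v1, s1), (v2, _) in zip(rec1.noise_inputs, rec2.noise_inputs):
        # Falsify the declared sensitivity with the measured distance.
        if abs(v1 - v2) > s1 + tol:
            findings.append(f"sensitivity violation: |{v1:.3g} - {v2:.3g}| > {s1}")
    return findings


if __name__ == "__main__":
    d = [0.5, 0.2, 0.9]
    print(audit(clipped_sum, d, d + [5.0]))  # []
    print(audit(buggy_sum, d, d + [5.0]))    # ['sensitivity violation: |1.6 - 6.6| > 1.0']
    print(audit(buggy_sum, d, d + [500.0]))  # ['data-dependent control flow']
```

Because both runs share a seed, a difference in the recorded branches points at data-dependent control flow rather than at the noise, and a single neighboring pair whose measured distance exceeds the declared sensitivity is already a concrete counterexample, with no repeated sampling needed.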
Related papers
- IMMACULATE: A Practical LLM Auditing Framework via Verifiable Computation [49.796717294455796]
We present IMMACULATE, a practical auditing framework that detects economically motivated deviations.
IMMACULATE selectively audits a small fraction of requests using verifiable computation, achieving strong detection guarantees while amortizing cryptographic overhead.
(2026-02-26T07:21:02Z)
- Sequential Auditing for f-Differential Privacy [5.7992233755396505]
We present new auditors that assess the differential privacy (DP) of an algorithm from its output samples.
We shift the focus to the highly expressive privacy concept of $f$-DP, in which the entire privacy behavior is captured by a single tradeoff curve.
(2026-02-06T09:22:24Z)
- Tight and Practical Privacy Auditing for Differentially Private In-Context Learning [11.394805414546903]
Large language models (LLMs) perform in-context learning (ICL) by adapting to tasks from prompt demonstrations, which in practice often contain private or proprietary data.
We present a tight and efficient privacy auditing framework for DP-ICL systems that runs membership inference attacks and translates their success rates into empirical privacy guarantees using Gaussian DP.
Experiments on standard text classification and generation benchmarks show that our empirical leakage estimates closely match theoretical DP budgets on classification tasks and are consistently lower on generation tasks due to conservative embedding-sensitivity bounds.
(2025-11-17T15:39:54Z)
- Reconstructing Trust Embeddings from Siamese Trust Scores: A Direct-Sum Approach with Fixed-Point Semantics [0.0]
We study the inverse problem of reconstructing high-dimensional trust embeddings from the one-dimensional Siamese trust scores that many distributed-security frameworks expose.
A suite of synthetic benchmarks confirms that, even in the presence of Gaussian noise, the recovered embeddings preserve inter-device geometry as measured by Euclidean and cosine metrics.
The paper demonstrates a practical privacy risk: publishing granular trust scores can leak latent behavioural information about both devices and evaluation models.
(2025-08-02T20:19:22Z)
- Benchmarking Fraud Detectors on Private Graph Data [70.4654745317714]
Currently, many types of fraud are managed in part by automated detection algorithms that operate over graphs.
We consider the scenario where a data holder wishes to outsource development of fraud detectors to third parties.
Third parties submit their fraud detectors to the data holder, who evaluates these algorithms on a private dataset and then publicly communicates the results.
We propose a realistic privacy attack on this system that allows an adversary to de-anonymize individuals' data based only on the evaluation results.
(2025-07-30T03:20:15Z)
- Privacy Audit as Bits Transmission: (Im)possibilities for Audit by One Run [7.850976675388593]
We introduce a unifying framework for privacy audits based on information-theoretic principles.
We demystify privacy auditing in a single run, identifying the conditions under which single-run audits are feasible or infeasible.
(2025-01-29T16:38:51Z)
- "What do you want from theory alone?" Experimenting with Tight Auditing of Differentially Private Synthetic Data Generation [11.900523702759598]
We audit differentially private synthetic data generation (DP-SDG) algorithms.
We compute the information leakage via an adversary playing a distinguishing game and running membership inference attacks (MIAs).
We find that, currently, we need not only white-box MIAs but also worst-case datasets to tightly estimate the privacy leakage from DP-SDGs.
(2024-05-16T14:23:48Z)
- Fact Checking Beyond Training Set [64.88575826304024]
We show that the retriever-reader suffers from performance deterioration when it is trained on labeled data from one domain and used in another domain.
We propose an adversarial algorithm to make the retriever component robust against distribution shift.
We then construct eight fact checking scenarios from these datasets, and compare our model to a set of strong baseline models.
(2024-03-27T15:15:14Z)
- Tight Auditing of Differentially Private Machine Learning [77.38590306275877]
For private machine learning, existing auditing mechanisms are tight, but they only give tight estimates under implausible worst-case assumptions.
We design an improved auditing scheme that yields tight privacy estimates for natural (not adversarially crafted) datasets.
(2023-02-15T21:40:33Z)
- Sequential Kernelized Independence Testing [77.237958592189]
We design sequential kernelized independence tests inspired by kernelized dependence measures.
We demonstrate the power of our approaches on both simulated and real data.
(2022-12-14T18:08:42Z)
- Debugging Differential Privacy: A Case Study for Privacy Auditing [60.87570714269048]
We show that auditing can also be used to find flaws in (purportedly) differentially private schemes.
In this case study, we audit a recent open source implementation of a differentially private deep learning algorithm and find, with 99.99999999% confidence, that the implementation does not satisfy the claimed differential privacy guarantee.
(2022-02-24T17:31:08Z)
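Statistical audits like the one in the last entry commonly turn a distinguishing attack's error rates into an empirical lower bound on ε. The sketch below assumes n trials on each of two neighboring datasets D and D′ and counts false positives (ran on D, guessed D′) and false negatives (ran on D′, guessed D). It uses a Hoeffding bound as a simpler, looser stand-in for the Clopper-Pearson intervals often used in the auditing literature; the function names are hypothetical and not taken from any listed paper's code.

```python
import math


def hoeffding_upper(k, n, alpha):
    """One-sided (1 - alpha) upper confidence bound on a Bernoulli rate k/n."""
    return min(1.0, k / n + math.sqrt(math.log(1 / alpha) / (2 * n)))


def epsilon_lower_bound(fp, fn, n, delta=0.0, alpha=0.05):
    """Empirical lower bound on epsilon, valid with probability >= 1 - alpha.

    Any (epsilon, delta)-DP mechanism satisfies
        FPR + exp(epsilon) * FNR >= 1 - delta  and  FNR + exp(epsilon) * FPR >= 1 - delta,
    so plugging in upper confidence bounds on FPR and FNR yields a lower bound.
    """
    # Split alpha across the two estimated rates (union bound).
    fpr_hi = hoeffding_upper(fp, n, alpha / 2)
    fnr_hi = hoeffding_upper(fn, n, alpha / 2)
    best = 0.0
    for a, b in ((fpr_hi, fnr_hi), (fnr_hi, fpr_hi)):
        numerator = 1 - delta - b
        if numerator > 0:
            best = max(best, math.log(numerator / a))
    return best
```

For example, 100 false positives and 100 false negatives out of 100,000 trials each give `epsilon_lower_bound(100, 100, 100_000)` of roughly 5.2 at 95% confidence, which would falsify a claimed ε = 1. Reaching confidence levels like the 99.99999999% quoted above requires a far smaller `alpha` and correspondingly more trials.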