Sensitivity, Specificity, and Consistency: A Tripartite Evaluation of Privacy Filters for Synthetic Data Generation
- URL: http://arxiv.org/abs/2510.01793v1
- Date: Thu, 02 Oct 2025 08:32:20 GMT
- Title: Sensitivity, Specificity, and Consistency: A Tripartite Evaluation of Privacy Filters for Synthetic Data Generation
- Authors: Adil Koeken, Alexander Ziller, Moritz Knolle, Daniel Rueckert
- Abstract summary: Post-hoc privacy filtering techniques have been proposed to remove samples containing personally identifiable information. This work presents a rigorous evaluation of a filtering pipeline applied to chest X-ray synthesis. We conclude that substantial advances in filter design are needed before these methods can be confidently deployed in sensitive applications.
- Score: 57.13635002340272
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The generation of privacy-preserving synthetic datasets is a promising avenue for overcoming data scarcity in medical AI research. Post-hoc privacy filtering techniques, designed to remove samples containing personally identifiable information, have recently been proposed as a solution. However, their effectiveness remains largely unverified. This work presents a rigorous evaluation of a filtering pipeline applied to chest X-ray synthesis. Contrary to claims from the original publications, our results demonstrate that current filters exhibit limited specificity and consistency, achieving high sensitivity only for real images while failing to reliably detect near-duplicates generated from training data. These results demonstrate a critical limitation of post-hoc filtering: rather than effectively safeguarding patient privacy, these methods may provide a false sense of security while leaving unacceptable levels of patient information exposed. We conclude that substantial advances in filter design are needed before these methods can be confidently deployed in sensitive applications.
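The tripartite evaluation rests on standard confusion-matrix quantities. As a minimal, hypothetical sketch (illustrative names, not the authors' pipeline), the sensitivity and specificity of a post-hoc filter can be computed from binary ground-truth labels (1 = the sample is a near-duplicate of a training image) and the filter's decisions (1 = flagged for removal):

```python
def filter_metrics(labels, flagged):
    """Sensitivity/specificity of a binary privacy filter.

    labels:  1 if the sample truly is a near-duplicate, else 0.
    flagged: 1 if the filter removed the sample, else 0.
    """
    tp = sum(1 for y, f in zip(labels, flagged) if y == 1 and f == 1)
    fn = sum(1 for y, f in zip(labels, flagged) if y == 1 and f == 0)
    tn = sum(1 for y, f in zip(labels, flagged) if y == 0 and f == 0)
    fp = sum(1 for y, f in zip(labels, flagged) if y == 0 and f == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity
```

Consistency, the third criterion, could then be measured as the agreement of `flagged` across repeated filter runs on the same samples.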
Related papers
- How to DP-fy Your Data: A Practical Guide to Generating Synthetic Data With Differential Privacy [52.00934156883483]
Differential Privacy (DP) is a framework for reasoning about and limiting information leakage. Differentially private synthetic data refers to synthetic data that preserves the overall trends of source data.
arXiv Detail & Related papers (2025-12-02T21:14:39Z) - PRIVET: Privacy Metric Based on Extreme Value Theory [8.447463478355845]
Deep generative models are often trained on sensitive data, such as genetic sequences, health data, or more broadly, any copyrighted, licensed or protected content. This raises critical concerns around privacy-preserving synthetic data, and more specifically around privacy leakage. We propose PRIVET, a generic sample-based, modality-agnostic algorithm that assigns an individual privacy leak score to each synthetic sample.
arXiv Detail & Related papers (2025-10-28T09:42:03Z) - On the MIA Vulnerability Gap Between Private GANs and Diffusion Models [51.53790101362898]
Generative Adversarial Networks (GANs) and diffusion models have emerged as leading approaches for high-quality image synthesis. We present the first unified theoretical and empirical analysis of the privacy risks faced by differentially private generative models.
arXiv Detail & Related papers (2025-09-03T14:18:22Z) - A False Sense of Privacy: Evaluating Textual Data Sanitization Beyond Surface-level Privacy Leakage [77.83757117924995]
We propose a new framework that evaluates re-identification attacks to quantify individual privacy risks upon data release. Our approach shows that seemingly innocuous auxiliary information can be used to infer sensitive attributes like age or substance use history from sanitized data.
arXiv Detail & Related papers (2025-04-28T01:16:27Z) - Enhancing the Utility of Privacy-Preserving Cancer Classification using Synthetic Data [5.448470199971472]
Deep learning holds immense promise for aiding radiologists in breast cancer detection, but achieving optimal model performance is hampered by limited availability and sharing of data.
Traditional deep learning models can inadvertently leak sensitive training information.
This work addresses these challenges by quantifying the utility of privacy-preserving deep learning techniques.
arXiv Detail & Related papers (2024-07-17T15:52:45Z) - Defining 'Good': Evaluation Framework for Synthetic Smart Meter Data [14.779917834583577]
We show that standard privacy attack methods are inadequate for assessing privacy risks of smart meter datasets.
We propose an improved method that injects implausible outliers into the training data, then launches privacy attacks directly on these outliers.
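That outlier-canary idea can be sketched as follows. This is a hedged illustration with hypothetical names (`inject_outliers` and `outlier_leakage` are not from the paper): plant implausible records in the training data, then test whether the generator reproduces any of them within a distance threshold.

```python
import random

def inject_outliers(train, n_outliers, scale=10.0, seed=0):
    """Append implausible records (existing rows scaled far outside
    the normal range) to serve as canaries for a later attack."""
    rng = random.Random(seed)
    outliers = [[v * scale for v in rng.choice(train)]
                for _ in range(n_outliers)]
    return train + outliers, outliers

def outlier_leakage(outliers, synthetic, threshold):
    """Fraction of planted outliers that the synthetic data
    nearly reproduces (nearest neighbour closer than threshold)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    leaked = sum(1 for o in outliers
                 if min(dist(o, s) for s in synthetic) < threshold)
    return leaked / len(outliers)
```

A high leakage fraction on the canaries would indicate the generator memorises rare training records, even when standard attacks on typical samples report low risk.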
arXiv Detail & Related papers (2024-07-16T14:41:27Z) - TernaryVote: Differentially Private, Communication Efficient, and Byzantine Resilient Distributed Optimization on Heterogeneous Data [50.797729676285876]
We propose TernaryVote, which combines a ternary compressor and the majority vote mechanism to realize differential privacy, gradient compression, and Byzantine resilience simultaneously.
We theoretically quantify the privacy guarantee through the lens of the emerging f-differential privacy (DP) and the Byzantine resilience of the proposed algorithm.
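The two ingredients can be sketched roughly as follows. This is a hypothetical illustration of stochastic ternary compression plus a coordinate-wise majority vote over workers, not the paper's exact mechanism or its DP noise calibration:

```python
import random

def ternary_compress(grad, rng=None):
    """Stochastically quantise each coordinate to -1, 0, or +1,
    keeping the sign with probability proportional to its magnitude."""
    rng = rng or random.Random(0)
    m = max(abs(g) for g in grad) or 1.0
    return [(int(g > 0) - int(g < 0)) if rng.random() < abs(g) / m else 0
            for g in grad]

def majority_vote(votes):
    """Coordinate-wise sign of the summed ternary votes from all workers;
    a minority of Byzantine workers cannot flip an honest majority."""
    sums = [sum(col) for col in zip(*votes)]
    return [int(s > 0) - int(s < 0) for s in sums]
```

The compression step sends only a ternary symbol per coordinate, and the vote aggregates signs rather than magnitudes, which is where the communication saving and the Byzantine resilience come from in this sketch.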
arXiv Detail & Related papers (2024-02-16T16:41:14Z) - No Free Lunch in "Privacy for Free: How does Dataset Condensation Help Privacy" [75.98836424725437]
New methods designed to preserve data privacy require careful scrutiny.
Failure to preserve privacy is hard to detect, and yet can lead to catastrophic results when a system implementing a "privacy-preserving" method is attacked.
arXiv Detail & Related papers (2022-09-29T17:50:23Z) - Decouple-and-Sample: Protecting sensitive information in task agnostic data release [17.398889291769986]
Sanitizer is a framework for secure and task-agnostic data release.
We show that a better privacy-utility trade-off is achieved if sensitive information can be synthesized privately.
arXiv Detail & Related papers (2022-03-17T19:15:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.