RAPID: Risk of Attribute Prediction-Induced Disclosure in Synthetic Microdata
- URL: http://arxiv.org/abs/2602.09235v1
- Date: Mon, 09 Feb 2026 22:03:11 GMT
- Title: RAPID: Risk of Attribute Prediction-Induced Disclosure in Synthetic Microdata
- Authors: Matthias Templ, Oscar Thees, Roman Müller
- Abstract summary: We introduce a disclosure risk measure that directly quantifies inferential vulnerability under a realistic attack model. An adversary trains a predictive model solely on the released synthetic data and applies it to real individuals' quasi-identifiers. For categorical sensitive attributes, we propose a baseline-normalized confidence score that measures how much more confident the attacker is about the true class than would be expected from class prevalence alone.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Statistical data anonymization increasingly relies on fully synthetic microdata, for which classical identity disclosure measures are less informative than an adversary's ability to infer sensitive attributes from released data. We introduce RAPID (Risk of Attribute Prediction-Induced Disclosure), a disclosure risk measure that directly quantifies inferential vulnerability under a realistic attack model. An adversary trains a predictive model solely on the released synthetic data and applies it to real individuals' quasi-identifiers. For continuous sensitive attributes, RAPID reports the proportion of records whose predicted values fall within a specified relative error tolerance. For categorical attributes, we propose a baseline-normalized confidence score that measures how much more confident the attacker is about the true class than would be expected from class prevalence alone, and we summarize risk as the fraction of records exceeding a policy-defined threshold. This construction yields an interpretable, bounded risk metric that is robust to class imbalance, independent of any specific synthesizer, and applicable with arbitrary learning algorithms. We illustrate threshold calibration, uncertainty quantification, and comparative evaluation of synthetic data generators using simulations and real data. Our results show that RAPID provides a practical, attacker-realistic upper bound on attribute-inference disclosure risk that complements existing utility diagnostics and disclosure control frameworks.
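The two risk summaries described in the abstract can be sketched in a few lines. The code below is a minimal, hypothetical illustration, not the paper's reference implementation: the normalization (p_true − prevalence)/(1 − prevalence) for the categorical score and the default tolerances are assumptions, and the attacker's model (trained solely on the synthetic release) is left abstract.

```python
def rapid_continuous(y_true, y_pred, tol=0.10):
    """Share of real records whose attacker-predicted sensitive value
    falls within a relative error tolerance `tol` of the true value."""
    hits = sum(abs(p - t) <= tol * abs(t) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)

def rapid_categorical(y_true, proba, classes, threshold=0.2):
    """Baseline-normalized confidence per record, plus the fraction of
    records exceeding a policy-defined threshold (the reported risk).
    `proba[i][j]` is the attacker's probability for record i, class j."""
    n = len(y_true)
    prevalence = {c: sum(1 for y in y_true if y == c) / n for c in classes}
    col = {c: j for j, c in enumerate(classes)}
    scores = []
    for y, row in zip(y_true, proba):
        p, b = row[col[y]], prevalence[y]
        # 0 when the attacker is no better than guessing the base rate,
        # 1 when the attacker is fully confident in the true class
        scores.append((p - b) / (1.0 - b) if b < 1.0 else 0.0)
    risk = sum(s > threshold for s in scores) / n
    return risk, scores
```

In practice, `y_pred` and `proba` would come from scoring the real records' quasi-identifiers with a model fit on the synthetic release; since RAPID is synthesizer- and algorithm-agnostic, any learner can stand in for the attacker.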
Related papers
- On the Generalization and Robustness in Conditional Value-at-Risk [12.253712889424584]
We develop a learning-theoretic analysis of Conditional Value-at-Risk (CVaR)-based empirical risk minimization under heavy-tailed and contaminated data. We establish sharp, high-probability generalization and excess risk bounds under minimal moment assumptions. We show that CVaR decisions themselves can be intrinsically unstable under heavy tails.
arXiv Detail & Related papers (2026-02-20T08:10:11Z)
- Generalized Leverage Score for Scalable Assessment of Privacy Vulnerability [6.029433950934382]
We show that exposure to membership inference attacks (MIA) is governed by a data point's influence on the learned model. We formalize this in the linear setting by establishing a theoretical correspondence between individual MIA risk and the leverage score. This characterization explains how data-dependent sensitivity translates into exposure, without the computational burden of training shadow models.
arXiv Detail & Related papers (2026-02-17T07:07:31Z)
- Quality Degradation Attack in Synthetic Data [5.461072909384133]
This study investigates quality attacks initiated by adversaries who possess access to the real dataset or control over the generation process. We formalize a corresponding threat model and empirically evaluate the effectiveness of targeted manipulations of real data.
arXiv Detail & Related papers (2026-01-06T11:43:31Z)
- The Eminence in Shadow: Exploiting Feature Boundary Ambiguity for Robust Backdoor Attacks [51.468144272905135]
Deep neural networks (DNNs) underpin critical applications yet remain vulnerable to backdoor attacks. We provide a theoretical analysis of backdoor attacks, focusing on how sparse decision boundaries enable disproportionate model manipulation. We propose Eminence, an explainable and robust black-box backdoor framework with provable theoretical guarantees and inherent stealth properties.
arXiv Detail & Related papers (2025-12-11T08:09:07Z)
- Uncertainty-Aware Data-Efficient AI: An Information-Theoretic Perspective [48.073471560778984]
In context-specific applications such as robotics, telecommunications, and healthcare, artificial intelligence systems often face the challenge of limited training data. This review paper examines formal methodologies that address data-limited regimes through two complementary approaches.
arXiv Detail & Related papers (2025-12-04T21:44:22Z)
- Privacy Auditing Synthetic Data Release through Local Likelihood Attacks [7.780592134085148]
Gen-LRA (Gene Likelihood Ratio Attack) formulates its attack by evaluating the influence a test observation has on a surrogate model's estimation of a local likelihood ratio over the synthetic data. Results underscore Gen-LRA's effectiveness as a privacy auditing tool for the release of synthetic data.
arXiv Detail & Related papers (2025-08-28T18:27:40Z)
- Noise-Adaptive Conformal Classification with Marginal Coverage [53.74125453366155]
We introduce an adaptive conformal inference method capable of efficiently handling deviations from exchangeability caused by random label noise. We validate our method through extensive numerical experiments demonstrating its effectiveness on synthetic and real data sets.
arXiv Detail & Related papers (2025-01-29T23:55:23Z)
- Geometry-Aware Instrumental Variable Regression [56.16884466478886]
We propose a transport-based IV estimator that takes into account the geometry of the data manifold through data-derivative information.
We provide a simple plug-and-play implementation of our method that performs on par with related estimators in standard settings.
arXiv Detail & Related papers (2024-05-19T17:49:33Z)
- On the Impact of Uncertainty and Calibration on Likelihood-Ratio Membership Inference Attacks [42.18575921329484]
We analyze the performance of the likelihood ratio attack (LiRA) within an information-theoretic framework. We derive bounds on the advantage of an MIA adversary with the aim of offering insight into the impact of uncertainty and calibration on the effectiveness of MIAs.
arXiv Detail & Related papers (2024-02-16T13:41:18Z)
- Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning [57.88785630755165]
Empirical risk minimization (ERM) is the workhorse of machine learning, but its model-agnostic guarantees can fail when we use adaptively collected data.
We study a generic importance sampling weighted ERM algorithm for using adaptively collected data to minimize the average of a loss function over a hypothesis class.
For policy learning, we provide rate-optimal regret guarantees that close an open gap in the existing literature whenever exploration decays to zero.
arXiv Detail & Related papers (2021-06-03T09:50:13Z)
- Trust but Verify: Assigning Prediction Credibility by Counterfactual Constrained Learning [123.3472310767721]
Prediction credibility measures are fundamental in statistics and machine learning.
These measures should account for the wide variety of models used in practice.
The framework developed in this work expresses the credibility as a risk-fit trade-off.
arXiv Detail & Related papers (2020-11-24T19:52:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.