How Much Does Each Datapoint Leak Your Privacy? Quantifying the
Per-datum Membership Leakage
- URL: http://arxiv.org/abs/2402.10065v1
- Date: Thu, 15 Feb 2024 16:30:55 GMT
- Title: How Much Does Each Datapoint Leak Your Privacy? Quantifying the
Per-datum Membership Leakage
- Authors: Achraf Azize, Debabrota Basu
- Abstract summary: We study per-datum Membership Inference Attacks (MIAs), where an attacker aims to infer whether a fixed target datum has been included in the input dataset of an algorithm, thus violating privacy.
We quantify the per-datum membership leakage for the empirical mean, and show that it depends on the Mahalanobis distance between the target datum and the data-generating distribution.
Our experiments demonstrate the impacts of the leakage score, the sub-sampling ratio and the noise scale on the per-datum membership leakage as indicated by the theory.
- Score: 13.097161185372153
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study per-datum Membership Inference Attacks (MIAs), where an attacker
aims to infer whether a fixed target datum has been included in the input
dataset of an algorithm, thus violating privacy. First, we define the
membership leakage of a datum as the advantage of the optimal adversary
targeting it. Then, we quantify the per-datum membership leakage
for the empirical mean, and show that it depends on the Mahalanobis distance
between the target datum and the data-generating distribution. We further
assess the effect of two privacy defences, i.e. adding Gaussian noise and
sub-sampling. We quantify exactly how both of them decrease the per-datum
membership leakage. Our analysis builds on a novel proof technique that
combines an Edgeworth expansion of the likelihood ratio test and a
Lindeberg-Feller central limit theorem. Our analysis connects the existing
likelihood ratio and scalar product attacks, and also justifies different
canary selection strategies used in the privacy auditing literature. Finally,
our experiments demonstrate the impacts of the leakage score, the sub-sampling
ratio and the noise scale on the per-datum membership leakage as indicated by
the theory.
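For intuition, here is a minimal numerical sketch of the quantities discussed above, assuming synthetic Gaussian data: the target datum's Mahalanobis distance to the data-generating distribution plays the role of its leakage score, a scalar-product statistic on the released empirical mean plays the role of the attack, and the empirical advantage (true-positive rate minus false-positive rate) is measured with and without the two defences. The function names and parameter values (release_mean, scalar_product_stat, noise_scale, subsample) are hypothetical and illustrative; this is not the paper's exact test statistic or theoretical bound.
```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed synthetic data-generating distribution: Gaussian in d dimensions.
d, n = 5, 100
mu = np.zeros(d)
Sigma = np.diag(np.linspace(0.5, 2.0, d))
Sigma_inv = np.linalg.inv(Sigma)

# Target datum: an atypical point; its Mahalanobis distance to the
# distribution serves here as a proxy for the per-datum leakage score.
z = mu + 6.0 * np.sqrt(np.diag(Sigma))
leakage_score = np.sqrt((z - mu) @ Sigma_inv @ (z - mu))

def release_mean(include_z, noise_scale=0.0, subsample=1.0):
    """Release the empirical mean, optionally sub-sampled and Gaussian-noised."""
    X = rng.multivariate_normal(mu, Sigma, size=n)
    if include_z:
        X[0] = z                                   # plant the target datum
    keep = rng.random(n) < subsample               # Poisson-style sub-sampling
    m = X[keep].mean(axis=0) if keep.any() else mu
    return m + rng.normal(0.0, noise_scale, size=d)  # Gaussian-noise defence

def scalar_product_stat(released_mean):
    # Scalar-product attack: correlate the released mean with the
    # whitened, centred target datum.
    return (released_mean - mu) @ Sigma_inv @ (z - mu)

def advantage(noise_scale=0.0, subsample=1.0, trials=500):
    """Empirical advantage (TPR - FPR) of thresholding the statistic."""
    s_in = np.array([scalar_product_stat(release_mean(True, noise_scale, subsample))
                     for _ in range(trials)])
    s_out = np.array([scalar_product_stat(release_mean(False, noise_scale, subsample))
                      for _ in range(trials)])
    thr = np.median(np.concatenate([s_in, s_out]))
    return (s_in > thr).mean() - (s_out > thr).mean()

print(f"Mahalanobis leakage score : {leakage_score:.2f}")
print(f"advantage, no defence     : {advantage():.3f}")
print(f"advantage, Gaussian noise : {advantage(noise_scale=0.5):.3f}")
print(f"advantage, sub-sampling   : {advantage(subsample=0.2):.3f}")
```
In this toy setup the measured advantage shrinks as the noise scale grows or the sub-sampling ratio drops, mirroring the qualitative behaviour described in the abstract.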
Related papers
- Bayes-Nash Generative Privacy Protection Against Membership Inference Attacks [24.330984323956173]
We propose a game model for privacy-preserving publishing of data-sharing mechanism outputs.
We introduce the notions of Bayes-Nash generative privacy (BNGP) and Bayes generative privacy (BGP) risk.
We apply our method to sharing summary statistics, where MIAs can re-identify individuals even from aggregated data.
arXiv Detail & Related papers (2024-10-09T20:29:04Z)
- Membership inference attack with relative decision boundary distance [9.764492069791991]
Membership inference attack is one of the most popular privacy attacks in machine learning.
We propose a new attack method, called the multi-class adaptive membership inference attack, in the label-only setting.
arXiv Detail & Related papers (2023-06-07T02:29:58Z)
- Theoretically Principled Federated Learning for Balancing Privacy and Utility [61.03993520243198]
We propose a general learning framework for protection mechanisms that protect privacy by distorting model parameters.
It can achieve personalized utility-privacy trade-off for each model parameter, on each client, at each communication round in federated learning.
arXiv Detail & Related papers (2023-05-24T13:44:02Z)
- Enabling Trade-offs in Privacy and Utility in Genomic Data Beacons and Summary Statistics [26.99521354120141]
We introduce optimization-based approaches to explicitly trade off the utility of summary data or Beacon responses and privacy.
In the first, an attacker applies a likelihood-ratio test to make membership-inference claims.
In the second, an attacker uses a threshold that accounts for the effect of the data release on the separation in scores between individuals.
arXiv Detail & Related papers (2023-01-11T19:16:13Z)
- Purifier: Defending Data Inference Attacks via Transforming Confidence Scores [27.330482508047428]
We propose a method, namely PURIFIER, to defend against membership inference attacks.
Experiments show that PURIFIER helps defend membership inference attacks with high effectiveness and efficiency.
PURIFIER is also effective in defending adversarial model inversion attacks and attribute inference attacks.
arXiv Detail & Related papers (2022-12-01T16:09:50Z)
- Optimal Membership Inference Bounds for Adaptive Composition of Sampled Gaussian Mechanisms [93.44378960676897]
Given a trained model and a data sample, membership-inference (MI) attacks predict whether the sample was in the model's training set.
A common countermeasure against MI attacks is to utilize differential privacy (DP) during model training to mask the presence of individual examples.
In this paper, we derive bounds for the advantage of an adversary mounting an MI attack, and demonstrate tightness for the widely-used Gaussian mechanism.
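As background for the advantage bounds mentioned in this entry, the membership advantage is conventionally defined as the gap between the attack's true-positive and false-positive rates, and a classical (not necessarily tight) bound under pure epsilon-differential privacy is due to Yeom et al. (2018). A sketch of these standard relations, not the tighter composition bounds derived in that paper:
```latex
% Standard membership-advantage definition for an adversary \mathcal{A}
\mathrm{Adv}(\mathcal{A}) \;=\; \Pr[\mathcal{A}=1 \mid \text{member}] \;-\; \Pr[\mathcal{A}=1 \mid \text{non-member}]
% Classical bound under pure \varepsilon-DP (Yeom et al., 2018)
\mathrm{Adv}(\mathcal{A}) \;\le\; e^{\varepsilon} - 1
```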
arXiv Detail & Related papers (2022-04-12T22:36:56Z)
- Partial Identification with Noisy Covariates: A Robust Optimization Approach [94.10051154390237]
Causal inference from observational datasets often relies on measuring and adjusting for covariates.
We show that this robust optimization approach can extend a wide range of causal adjustment methods to perform partial identification.
Across synthetic and real datasets, we find that this approach provides ATE bounds with a higher coverage probability than existing methods.
arXiv Detail & Related papers (2022-02-22T04:24:26Z)
- On the Practicality of Differential Privacy in Federated Learning by Tuning Iteration Times [51.61278695776151]
Federated Learning (FL) is well known for its privacy protection when training machine learning models among distributed clients collaboratively.
Recent studies have pointed out that the naive FL is susceptible to gradient leakage attacks.
Differential Privacy (DP) emerges as a promising countermeasure to defend against gradient leakage attacks.
arXiv Detail & Related papers (2021-01-11T19:43:12Z)
- Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient [62.24615324523435]
This paper provides a statistical analysis of high-dimensional batch Reinforcement Learning (RL) using sparse linear function approximation.
When there is a large number of candidate features, our result sheds light on the fact that sparsity-aware methods can make batch RL more sample efficient.
arXiv Detail & Related papers (2020-11-08T16:48:02Z)
- On Primes, Log-Loss Scores and (No) Privacy [8.679020335206753]
In this paper, we prove that this additional information enables the adversary to infer the membership of any number of datapoints with full accuracy in a single query.
Our approach obviates the need for any attack-model training or for side knowledge on the adversary's part.
arXiv Detail & Related papers (2020-09-17T23:35:12Z)
- Sampling Attacks: Amplification of Membership Inference Attacks by Repeated Queries [74.59376038272661]
We introduce the sampling attack, a novel membership inference technique that, unlike other standard membership adversaries, works under the severe restriction of having no access to the victim model's scores.
We show that a victim model that publishes only labels is still susceptible to sampling attacks, and that the adversary can recover up to 100% of its performance.
For defense, we choose differential privacy in the form of gradient perturbation during the training of the victim model as well as output perturbation at prediction time.
arXiv Detail & Related papers (2020-09-01T12:54:54Z)
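To make the label-only restriction in this last entry concrete, below is a hypothetical sketch in the spirit of a sampling attack: the adversary repeatedly queries a victim classifier on randomly perturbed copies of a candidate point and uses the stability of the predicted label as a stand-in for the unavailable confidence score. The model, data, and helper names (label_only_score, sigma, n_queries) are assumptions for illustration, not the authors' exact procedure.
```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Hypothetical victim model trained on synthetic data; only labels are exposed.
d = 10
X_train = rng.normal(size=(500, d))
y_train = (X_train[:, 0] + 0.5 * rng.normal(size=500) > 0).astype(int)
victim = LogisticRegression().fit(X_train, y_train)

def label_only_score(x, n_queries=200, sigma=0.3):
    """Fraction of perturbed queries whose predicted label matches the
    unperturbed prediction; a proxy for confidence when only labels leak."""
    base = victim.predict(x.reshape(1, -1))[0]
    noisy = x + sigma * rng.normal(size=(n_queries, d))
    return (victim.predict(noisy) == base).mean()

# Threshold the stability score to infer membership: training points whose
# labels stay stable under perturbation are flagged as members.
member_scores = [label_only_score(x) for x in X_train[:50]]
non_member_scores = [label_only_score(x) for x in rng.normal(size=(50, d))]
threshold = np.median(member_scores + non_member_scores)
tpr = np.mean(np.array(member_scores) > threshold)
fpr = np.mean(np.array(non_member_scores) > threshold)
print(f"label-only attack advantage (TPR - FPR): {tpr - fpr:.3f}")
```
On such a weakly overfit toy model the measured advantage may be small; the sketch is meant to show the repeated-query structure of a label-only attack, not to reproduce the reported numbers.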
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.