Related papers: The Sample Complexity of Membership Inference and Privacy Auditing

The Sample Complexity of Membership Inference and Privacy Auditing

URL: http://arxiv.org/abs/2508.19458v1
Date: Tue, 26 Aug 2025 22:19:28 GMT
Title: The Sample Complexity of Membership Inference and Privacy Auditing
Authors: Mahdi Haghifam, Adam Smith, Jonathan Ullman,
Abstract summary: A membership-inference attack gets the output of a learning algorithm, and a target individual, and tries to determine whether this individual is a member of the training data or an independent sample from the same distribution.<n>A successful membership-inference attack typically requires the attacker to have some knowledge about the distribution that the training data was sampled from.<n>We show that for membership inference in this setting, $Omega(n + n2 rho2)$ samples can be necessary to carry out any attack that competes with a fully informed attacker.
Score: 7.865646654746587
License: http://creativecommons.org/licenses/by/4.0/
Abstract: A membership-inference attack gets the output of a learning algorithm, and a target individual, and tries to determine whether this individual is a member of the training data or an independent sample from the same distribution. A successful membership-inference attack typically requires the attacker to have some knowledge about the distribution that the training data was sampled from, and this knowledge is often captured through a set of independent reference samples from that distribution. In this work we study how much information the attacker needs for membership inference by investigating the sample complexity-the minimum number of reference samples required-for a successful attack. We study this question in the fundamental setting of Gaussian mean estimation where the learning algorithm is given $n$ samples from a Gaussian distribution $\mathcal{N}(\mu,\Sigma)$ in $d$ dimensions, and tries to estimate $\hat\mu$ up to some error $\mathbb{E}[\|\hat \mu - \mu\|^2_{\Sigma}]\leq \rho^2 d$. Our result shows that for membership inference in this setting, $\Omega(n + n^2 \rho^2)$ samples can be necessary to carry out any attack that competes with a fully informed attacker. Our result is the first to show that the attacker sometimes needs many more samples than the training algorithm uses to train the model. This result has significant implications for practice, as all attacks used in practice have a restricted form that uses $O(n)$ samples and cannot benefit from $\omega(n)$ samples. Thus, these attacks may be underestimating the possibility of membership inference, and better attacks may be possible when information about the distribution is easy to obtain.

Related papers

Differentially Private Multi-Sampling from Distributions [4.292685318253575]
We study the sample complexity of DP emphsingle-sampling i.e., the minimum number of samples needed to perform this task.<n>We define two variants of emphmulti-sampling, where the goal is to privately approximate $m>1$ samples.
arXiv Detail & Related papers (2024-12-13T19:14:05Z)
Testing with Non-identically Distributed Samples [20.74768558932617]
We examine the extent to which sublinear-sample property testing and estimation applies to settings where samples are independently but not identically distributed. Even with $Theta(k/varepsilon2)$ samples from each distribution, $textbfp_mathrmavg$ is sufficient to learn $textbfp_mathrmavg$ to within error $varepsilon$ in TV distance.
arXiv Detail & Related papers (2023-11-19T01:25:50Z)
Sample Complexity of Robust Learning against Evasion Attacks [3.8888996044605855]
We study the feasibility of adversarially robust learning from the perspective of learning theory, considering sample complexity. We show that, under the uniform distribution, the exponential dependence on the adversary's budget to robustly learn conjunctions remains inevitable. We show that if the query radius is equal to the adversary's budget, we can develop robust empirical risk algorithms in the distribution-free setting.
arXiv Detail & Related papers (2023-08-23T10:51:33Z)
Stochastic Approximation Approaches to Group Distributionally Robust Optimization and Beyond [89.72693227960274]
This paper investigates group distributionally robust optimization (GDRO) with the goal of learning a model that performs well over $m$ different distributions. To reduce the number of samples in each round from $m$ to 1, we cast GDRO as a two-player game, where one player conducts and the other executes an online algorithm for non-oblivious multi-armed bandits. In the second scenario, we propose to optimize the average top-$k$ risk instead of the maximum risk, thereby mitigating the impact of distributions.
arXiv Detail & Related papers (2023-02-18T09:24:15Z)
On-Demand Sampling: Learning Optimally from Multiple Distributions [63.20009081099896]
Social and real-world considerations have given rise to multi-distribution learning paradigms. We establish the optimal sample complexity of these learning paradigms and give algorithms that meet this sample complexity. Our algorithm design and analysis are enabled by our extensions of online learning techniques for solving zero-sum games.
arXiv Detail & Related papers (2022-10-22T19:07:26Z)
Optimal Membership Inference Bounds for Adaptive Composition of Sampled Gaussian Mechanisms [93.44378960676897]
Given a trained model and a data sample, membership-inference (MI) attacks predict whether the sample was in the model's training set. A common countermeasure against MI attacks is to utilize differential privacy (DP) during model training to mask the presence of individual examples. In this paper, we derive bounds for the textitadvantage of an adversary mounting a MI attack, and demonstrate tightness for the widely-used Gaussian mechanism.
arXiv Detail & Related papers (2022-04-12T22:36:56Z)
On the Statistical Complexity of Sample Amplification [26.78883878113411]
We show that it is possible to produce a larger set of $n+m$ i.i.d. samples which cannot be distinguished from $n+m$ i.i.d. samples drawn from $P$. Our techniques apply to a large class of distributions including the exponential family, and establish a rigorous connection between sample amplification and distribution learning.
arXiv Detail & Related papers (2022-01-12T05:45:27Z)
The Sample Complexity of Robust Covariance Testing [56.98280399449707]
We are given i.i.d. samples from a distribution of the form $Z = (1-epsilon) X + epsilon B$, where $X$ is a zero-mean and unknown covariance Gaussian $mathcalN(0, Sigma)$. In the absence of contamination, prior work gave a simple tester for this hypothesis testing task that uses $O(d)$ samples. We prove a sample complexity lower bound of $Omega(d2)$ for $epsilon$ an arbitrarily small constant and $gamma
arXiv Detail & Related papers (2020-12-31T18:24:41Z)
A Provably Efficient Sample Collection Strategy for Reinforcement Learning [123.69175280309226]
One of the challenges in online reinforcement learning (RL) is that the agent needs to trade off the exploration of the environment and the exploitation of the samples to optimize its behavior. We propose to tackle the exploration-exploitation problem following a decoupled approach composed of: 1) An "objective-specific" algorithm that prescribes how many samples to collect at which states, as if it has access to a generative model (i.e., sparse simulator of the environment); 2) An "objective-agnostic" sample collection responsible for generating the prescribed samples as fast as possible.
arXiv Detail & Related papers (2020-07-13T15:17:35Z)
Locally Private Hypothesis Selection [96.06118559817057]
We output a distribution from $mathcalQ$ whose total variation distance to $p$ is comparable to the best such distribution. We show that the constraint of local differential privacy incurs an exponential increase in cost. Our algorithms result in exponential improvements on the round complexity of previous methods.
arXiv Detail & Related papers (2020-02-21T18:30:48Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.