Blind Baselines Beat Membership Inference Attacks for Foundation Models
- URL: http://arxiv.org/abs/2406.16201v1
- Date: Sun, 23 Jun 2024 19:40:11 GMT
- Title: Blind Baselines Beat Membership Inference Attacks for Foundation Models
- Authors: Debeshee Das, Jie Zhang, Florian Tramèr
- Abstract summary: Membership inference (MI) attacks try to determine if a data sample was used to train a machine learning model.
For foundation models trained on unknown Web data, MI attacks can be used to detect copyrighted training materials, measure test set contamination, or audit machine unlearning.
We show that evaluations of MI attacks for foundation models are flawed, because they sample members and non-members from different distributions.
- Score: 24.010279957557252
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Membership inference (MI) attacks try to determine if a data sample was used to train a machine learning model. For foundation models trained on unknown Web data, MI attacks can be used to detect copyrighted training materials, measure test set contamination, or audit machine unlearning. Unfortunately, we find that evaluations of MI attacks for foundation models are flawed, because they sample members and non-members from different distributions. For 8 published MI evaluation datasets, we show that blind attacks -- that distinguish the member and non-member distributions without looking at any trained model -- outperform state-of-the-art MI attacks. Existing evaluations thus tell us nothing about membership leakage of a foundation model's training data.
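The failure mode is easy to demonstrate: a classifier that only sees the raw text of the candidate samples, and never queries the target model, can already separate the "member" and "non-member" splits when they are drawn from different distributions. Below is a minimal sketch of such a blind baseline; the bag-of-words features and the `member_texts` / `nonmember_texts` inputs are illustrative placeholders, not the paper's actual evaluation harness.

```python
# Blind baseline: classify member vs. non-member from text alone,
# without ever querying the target foundation model.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def blind_baseline_auc(member_texts, nonmember_texts):
    # Placeholder inputs: any MI evaluation dataset that provides
    # disjoint lists of "member" and "non-member" documents.
    texts = member_texts + nonmember_texts
    labels = [1] * len(member_texts) + [0] * len(nonmember_texts)
    features = TfidfVectorizer(max_features=20_000).fit_transform(texts)
    clf = LogisticRegression(max_iter=1000)
    # Cross-validated AUC; scores well above 0.5 mean the two splits are
    # distinguishable without any model access.
    return cross_val_score(clf, features, labels, cv=5, scoring="roc_auc").mean()
```

An AUC well above 0.5 from this model-free classifier indicates that the evaluation dataset itself leaks membership, so any MI attack score measured on it is confounded.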
Related papers
- Assessing Privacy Risks in Language Models: A Case Study on Summarization Tasks [65.21536453075275]
We focus on the summarization task and investigate the membership inference (MI) attack.
We exploit text similarity and the model's resistance to document modifications as potential MI signals.
We discuss several safeguards for training summarization models to protect against MI attacks and discuss the inherent trade-off between privacy and utility.
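As a rough illustration of a text-similarity MI signal for summarization (a sketch under assumptions, not the paper's exact attack), one can score a candidate document by how closely the target model's generated summary matches the document's reference summary; `generate_summary` below is a hypothetical stand-in for querying the model.

```python
# Sketch of a similarity-based MI signal for summarization models.
from collections import Counter

def rouge1_f(candidate: str, reference: str) -> float:
    # Unigram-overlap F1 (ROUGE-1 style) between two texts.
    cand, ref = Counter(candidate.lower().split()), Counter(reference.lower().split())
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

def membership_score(document: str, reference_summary: str, generate_summary) -> float:
    # Higher similarity between the model's output and the reference summary
    # is treated as weak evidence that the pair was seen in training.
    return rouge1_f(generate_summary(document), reference_summary)
```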
arXiv Detail & Related papers (2023-10-20T05:44:39Z)
- Beyond Labeling Oracles: What does it mean to steal ML models? [52.63413852460003]
Model extraction attacks are designed to steal trained models with only query access.
We investigate factors influencing the success of model extraction attacks.
Our findings urge the community to redefine the adversarial goals of ME attacks.
arXiv Detail & Related papers (2023-10-03T11:10:21Z)
- Can Membership Inferencing be Refuted? [31.31060116447964]
We study the reliability of membership inference attacks in practice.
We show that a model owner can plausibly refute the result of a membership inference test on a data point $x$ by constructing a proof of repudiation.
Our results call for a re-evaluation of the implications of membership inference attacks in practice.
arXiv Detail & Related papers (2023-03-07T04:36:35Z)
- Membership Inference Attacks against Synthetic Data through Overfitting Detection [84.02632160692995]
We argue for a realistic MIA setting that assumes the attacker has some knowledge of the underlying data distribution.
We propose DOMIAS, a density-based MIA model that aims to infer membership by targeting local overfitting of the generative model.
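The density-based idea can be sketched as a log-ratio test: compare the synthetic-data density around a candidate point with its density under a reference sample from the real distribution. The kernel density estimators below are only illustrative stand-ins for the estimators used in DOMIAS.

```python
# Sketch of a density-ratio membership score: a high synthetic-data density
# relative to the reference density suggests the generator locally overfit
# to the candidate point.
from sklearn.neighbors import KernelDensity

def density_ratio_scores(synthetic, reference, candidates, bandwidth=0.5):
    kde_syn = KernelDensity(bandwidth=bandwidth).fit(synthetic)
    kde_ref = KernelDensity(bandwidth=bandwidth).fit(reference)
    # score_samples returns log-densities, so the difference is a log-ratio.
    return kde_syn.score_samples(candidates) - kde_ref.score_samples(candidates)
```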
arXiv Detail & Related papers (2023-02-24T11:27:39Z)
- On the Discredibility of Membership Inference Attacks [11.172550334631921]
Membership inference attacks aim to determine whether a sample was part of the training set.
We show that MI models frequently misclassify neighboring nonmember samples of a member sample as members.
We argue that current membership inference attacks can identify memorized subpopulations, but they cannot reliably identify which exact sample in the subpopulation was used during training.
arXiv Detail & Related papers (2022-12-06T01:48:27Z)
- RelaxLoss: Defending Membership Inference Attacks without Losing Utility [68.48117818874155]
We propose a novel training framework based on a relaxed loss with a more achievable learning target.
RelaxLoss is applicable to any classification model, with the added benefits of easy implementation and negligible overhead.
Our approach consistently outperforms state-of-the-art defense mechanisms in terms of resilience against MIAs.
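A hedged sketch of the "more achievable learning target" idea (in the spirit of loss flooding, not the paper's exact update rule): keep the training loss anchored at a floor value alpha so that member losses do not collapse toward zero.

```python
# Illustrative relaxed cross-entropy: once the batch loss drops below a
# target floor `alpha`, gradients push it back toward alpha instead of zero.
import torch
import torch.nn.functional as F

def relaxed_ce_loss(logits, labels, alpha=1.0):
    loss = F.cross_entropy(logits, labels)
    # |loss - alpha| + alpha anchors the loss (and its gradients) at alpha.
    return (loss - alpha).abs() + alpha
```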
arXiv Detail & Related papers (2022-07-12T19:34:47Z)
- Investigating Membership Inference Attacks under Data Dependencies [26.70764798408236]
Training machine learning models on privacy-sensitive data has opened the door to new attacks that can have serious privacy implications.
One such attack, the Membership Inference Attack (MIA), exposes whether or not a particular data point was used to train a model.
We evaluate the defence under the restrictive assumption that all members of the training set, as well as non-members, are independent and identically distributed.
arXiv Detail & Related papers (2020-10-23T00:16:46Z)
- Knowledge-Enriched Distributional Model Inversion Attacks [49.43828150561947]
Model inversion (MI) attacks are aimed at reconstructing training data from model parameters.
We present a novel inversion-specific GAN that can better distill knowledge useful for performing attacks on private models from public data.
Our experiments show that the combination of these techniques can significantly boost the success rate of the state-of-the-art MI attacks by 150%.
arXiv Detail & Related papers (2020-10-08T16:20:48Z)
- Sampling Attacks: Amplification of Membership Inference Attacks by Repeated Queries [74.59376038272661]
We introduce the sampling attack, a novel membership inference technique that, unlike standard membership adversaries, works under the severe restriction of having no access to the victim model's scores.
We show that a victim model that publishes only labels is still susceptible to sampling attacks, and the adversary can recover up to 100% of its performance.
For defense, we choose differential privacy in the form of gradient perturbation during the training of the victim model as well as output perturbation at prediction time.
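The label-only setting can be sketched as follows: repeatedly perturb the candidate input, query the victim model for labels only, and use the stability of the predicted label as a membership score. The Gaussian perturbation scale and query budget below are illustrative choices, and `predict_label` is a hypothetical stand-in for the victim model's label-only API.

```python
# Sketch of a label-only membership signal via repeated perturbed queries.
import numpy as np

def label_stability_score(x, true_label, predict_label,
                          n_queries=100, sigma=0.05, rng=None):
    rng = rng or np.random.default_rng(0)
    hits = 0
    for _ in range(n_queries):
        x_perturbed = x + rng.normal(scale=sigma, size=x.shape)
        if predict_label(x_perturbed) == true_label:
            hits += 1
    # Members tend to sit further from the decision boundary, so their
    # predicted labels survive more perturbations.
    return hits / n_queries
```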
arXiv Detail & Related papers (2020-09-01T12:54:54Z)
- On the Difficulty of Membership Inference Attacks [11.172550334631921]
Recent studies propose membership inference (MI) attacks on deep models.
Despite their apparent success, these studies only report accuracy, precision, and recall of the positive (member) class.
We show that the way MI attack performance has been reported is often misleading, because these attacks suffer from a high false positive rate, or false alarm rate (FAR), that goes unreported.
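Reporting the false alarm rate alongside member-class precision and recall is straightforward; a minimal sketch using scikit-learn metrics (the function name is illustrative):

```python
# Compute the false positive (false alarm) rate in addition to the usual
# member-class precision and recall for an MI attack's binary predictions.
from sklearn.metrics import confusion_matrix, precision_score, recall_score

def mi_attack_report(y_true, y_pred):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "far": fp / (fp + tn),  # often omitted, yet decisive for real-world claims
    }
```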
arXiv Detail & Related papers (2020-05-27T23:09:17Z)
- Membership Inference Attacks and Defenses in Classification Models [19.498313593713043]
We study the membership inference (MI) attack against classifiers.
We find that a model's vulnerability to MI attacks is tightly related to the generalization gap.
We propose a defense against MI attacks that aims to close the gap by intentionally reducing the training accuracy.
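The link between the generalization gap and MI vulnerability is commonly illustrated with the simple "gap attack" baseline: predict "member" whenever the model classifies a point correctly. On a balanced evaluation its expected accuracy is about 1/2 + (train accuracy - test accuracy)/2, so intentionally lowering training accuracy directly caps this baseline. A sketch under these assumptions:

```python
# Baseline "gap attack": guess member iff the model predicts correctly.
# Expected accuracy is roughly 0.5 + (train_acc - test_acc) / 2.
import numpy as np

def gap_attack_accuracy(train_correct: np.ndarray, test_correct: np.ndarray) -> float:
    # Boolean arrays of per-example correctness on members (training set)
    # and non-members (held-out set), assumed equal in size.
    member_hits = train_correct.mean()           # predicted member, actually member
    nonmember_hits = 1.0 - test_correct.mean()   # predicted non-member, correctly
    return 0.5 * (member_hits + nonmember_hits)
```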
arXiv Detail & Related papers (2020-02-27T12:35:36Z)