Do Membership Inference Attacks Work on Large Language Models?
- URL: http://arxiv.org/abs/2402.07841v1
- Date: Mon, 12 Feb 2024 17:52:05 GMT
- Title: Do Membership Inference Attacks Work on Large Language Models?
- Authors: Michael Duan, Anshuman Suri, Niloofar Mireshghallah, Sewon Min, Weijia Shi, Luke Zettlemoyer, Yulia Tsvetkov, Yejin Choi, David Evans, Hannaneh Hajishirzi
- Abstract summary: Membership inference attacks (MIAs) attempt to predict whether a particular datapoint is a member of a target model's training data.
We perform a large-scale evaluation of MIAs over a suite of language models trained on the Pile, ranging from 160M to 12B parameters.
We find that MIAs barely outperform random guessing for most settings across varying LLM sizes and domains.
- Score: 145.90022632726883
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Membership inference attacks (MIAs) attempt to predict whether a particular
datapoint is a member of a target model's training data. Despite extensive
research on traditional machine learning models, there has been limited work
studying MIA on the pre-training data of large language models (LLMs). We
perform a large-scale evaluation of MIAs over a suite of language models (LMs)
trained on the Pile, ranging from 160M to 12B parameters. We find that MIAs
barely outperform random guessing for most settings across varying LLM sizes
and domains. Our further analyses reveal that this poor performance can be
attributed to (1) the combination of a large dataset and few training
iterations, and (2) an inherently fuzzy boundary between members and
non-members. We identify specific settings where LLMs have been shown to be
vulnerable to membership inference and show that the apparent success in such
settings can be attributed to a distribution shift, such as when members and
non-members are drawn from the seemingly identical domain but with different
temporal ranges. We release our code and data as a unified benchmark package
that includes all existing MIAs, supporting future work.
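To make "barely outperform random guessing" concrete, the following is a minimal sketch of how a score-based MIA might be evaluated. It assumes a loss-threshold attack (lower loss is taken as evidence of membership) and AUC-ROC as the metric; neither is named in the abstract, and the helper names and loss values are hypothetical toy data, not results from the paper. An AUC-ROC of 0.5 corresponds to random guessing.

```python
from typing import List, Sequence


def auc_roc(member_scores: Sequence[float], nonmember_scores: Sequence[float]) -> float:
    """Probability that a random member scores higher than a random non-member.

    Ties count as half a win, so identical score sets give 0.5 (random guessing).
    """
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


def loss_attack_scores(losses: Sequence[float]) -> List[float]:
    """Assumed loss-threshold attack: a lower loss yields a higher membership score."""
    return [-loss for loss in losses]


# Toy, made-up per-sample losses (hypothetical, not taken from the paper).
member_losses = [2.1, 2.4, 2.6, 3.0]
nonmember_losses = [2.2, 2.5, 2.7, 3.1]

auc = auc_roc(loss_attack_scores(member_losses), loss_attack_scores(nonmember_losses))
print(f"AUC-ROC: {auc:.3f}")
```

When member and non-member losses overlap heavily, as in this toy data, the AUC-ROC stays close to 0.5, which is the situation the abstract describes for most settings.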
Related papers
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts.
We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM.
We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z)
- LLM-Select: Feature Selection with Large Language Models [64.5099482021597]
Large language models (LLMs) are capable of selecting the most predictive features, with performance rivaling the standard tools of data science.
Our findings suggest that LLMs may be useful not only for selecting the best features for training but also for deciding which features to collect in the first place.
arXiv Detail & Related papers (2024-07-02T22:23:40Z)
- Inherent Challenges of Post-Hoc Membership Inference for Large Language Models [17.993892458845124]
Large Language Models (LLMs) are often trained on vast amounts of undisclosed data, motivating the development of post-hoc Membership Inference Attacks (MIAs).
We identify inherent challenges in post-hoc MIA evaluation due to potential distribution shifts between collected member and non-member datasets.
We propose a Regression Discontinuity Design (RDD) approach for post-hoc data collection, which substantially mitigates distribution shifts.
arXiv Detail & Related papers (2024-06-25T23:12:07Z)
- ReCaLL: Membership Inference via Relative Conditional Log-Likelihoods [56.073335779595475]
We propose ReCaLL (Relative Conditional Log-Likelihood), a novel membership inference attack (MIA).
ReCaLL examines the relative change in conditional log-likelihoods when prefixing target data points with non-member context.
We conduct comprehensive experiments and show that ReCaLL achieves state-of-the-art performance on the WikiMIA dataset.
arXiv Detail & Related papers (2024-06-23T00:23:13Z)
- LLM Dataset Inference: Did you train on my dataset? [42.97830562143777]
We propose a new dataset inference method to accurately identify the datasets used to train large language models.
Our approach successfully distinguishes the train and test sets of different subsets of the Pile with statistically significant p-values < 0.1, without any false positives.
arXiv Detail & Related papers (2024-06-10T16:34:43Z)
- Pandora's White-Box: Precise Training Data Detection and Extraction in Large Language Models [4.081098869497239]
We develop state-of-the-art privacy attacks against Large Language Models (LLMs).
New membership inference attacks (MIAs) against pretrained LLMs perform hundreds of times better than baseline attacks.
In fine-tuning, we find that a simple attack based on the ratio of the loss between the base and fine-tuned models is able to achieve near-perfect MIA performance.
arXiv Detail & Related papers (2024-02-26T20:41:50Z)
- MIA-BAD: An Approach for Enhancing Membership Inference Attack and its Mitigation with Federated Learning [6.510488168434277]
The membership inference attack (MIA) is a popular paradigm for compromising the privacy of a machine learning (ML) model.
We propose an enhanced Membership Inference Attack with the Batch-wise generated Attack dataset (MIA-BAD).
We show that training an ML model through FL has some distinct advantages, and we investigate how the threat introduced by the proposed MIA-BAD approach can be mitigated with FL approaches.
arXiv Detail & Related papers (2023-11-28T06:51:26Z)
- Practical Membership Inference Attacks against Fine-tuned Large Language Models via Self-prompt Calibration [32.15773300068426]
Membership Inference Attacks (MIAs) aim to infer whether a target data record has been utilized for model training or not.
We propose a Membership Inference Attack based on Self-calibrated Probabilistic Variation (SPV-MIA).
Specifically, since memorization in LLMs is inevitable during the training process and occurs before overfitting, we introduce a more reliable membership signal.
arXiv Detail & Related papers (2023-11-10T13:55:05Z)
- Assessing Privacy Risks in Language Models: A Case Study on Summarization Tasks [65.21536453075275]
We focus on the summarization task and investigate the membership inference (MI) attack.
We exploit text similarity and the model's resistance to document modifications as potential MI signals.
We discuss several safeguards for training summarization models to protect against MI attacks and discuss the inherent trade-off between privacy and utility.
arXiv Detail & Related papers (2023-10-20T05:44:39Z)
- Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning [104.58874584354787]
In recent years, pre-trained large language models (LLMs) have demonstrated remarkable efficiency in achieving an inference-time few-shot learning capability known as in-context learning.
This study aims to examine the in-context learning phenomenon through a Bayesian lens, viewing real-world LLMs as latent variable models.
arXiv Detail & Related papers (2023-01-27T18:59:01Z)