LLM Dataset Inference: Did you train on my dataset?
- URL: http://arxiv.org/abs/2406.06443v1
- Date: Mon, 10 Jun 2024 16:34:43 GMT
- Title: LLM Dataset Inference: Did you train on my dataset?
- Authors: Pratyush Maini, Hengrui Jia, Nicolas Papernot, Adam Dziedzic
- Abstract summary: We propose a new dataset inference method to accurately identify the datasets used to train large language models.
Our approach successfully distinguishes the train and test sets of different subsets of the Pile with statistically significant p-values < 0.1, without any false positives.
- Score: 42.97830562143777
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The proliferation of large language models (LLMs) in the real world has come with a rise in copyright cases against companies for training their models on unlicensed data from the internet. Recent works have presented methods to identify if individual text sequences were members of the model's training data, known as membership inference attacks (MIAs). We demonstrate that the apparent success of these MIAs is confounded by selecting non-members (text sequences not used for training) belonging to a different distribution from the members (e.g., temporally shifted recent Wikipedia articles compared with ones used to train the model). This distribution shift makes membership inference appear successful. However, most MIA methods perform no better than random guessing when discriminating between members and non-members from the same distribution (e.g., in this case, the same period of time). Even when MIAs work, we find that different MIAs succeed at inferring membership of samples from different distributions. Instead, we propose a new dataset inference method to accurately identify the datasets used to train large language models. This paradigm sits realistically in the modern-day copyright landscape, where authors claim that an LLM is trained over multiple documents (such as a book) written by them, rather than one particular paragraph. While dataset inference shares many of the challenges of membership inference, we solve it by selectively combining the MIAs that provide positive signal for a given distribution, and aggregating them to perform a statistical test on a given dataset. Our approach successfully distinguishes the train and test sets of different subsets of the Pile with statistically significant p-values < 0.1, without any false positives.
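As a rough illustration of the aggregation step described in the abstract, the sketch below scores a suspect split and a held-out validation split from the same distribution with a single per-sample MIA feature, then runs a one-sided statistical test at the dataset level. This is a minimal sketch, not the authors' implementation: the paper selectively combines and weights multiple MIA features, which this collapses into one placeholder score, and all names (`mia_score`, `dataset_inference_pvalue`) are illustrative.

```python
# Minimal sketch of dataset-level inference via an aggregated statistical
# test. Assumptions: one per-sample MIA feature stands in for the paper's
# selectively combined set of MIAs, and lower scores mean "more
# member-like" (e.g., negative log-likelihood under the suspect model).
import numpy as np
from scipy import stats


def mia_score(model, text: str) -> float:
    """Placeholder per-sample MIA feature (illustrative, not the paper's)."""
    raise NotImplementedError  # e.g., average per-token loss under `model`


def dataset_inference_pvalue(model, suspect_texts, heldout_texts) -> float:
    suspect = np.array([mia_score(model, t) for t in suspect_texts])
    heldout = np.array([mia_score(model, t) for t in heldout_texts])
    # One-sided Welch t-test: does the suspect split score systematically
    # more member-like (lower) than held-out data from the same distribution?
    _, p_value = stats.ttest_ind(suspect, heldout,
                                 equal_var=False, alternative="less")
    return p_value
```

A p-value below the chosen significance level (the abstract reports p < 0.1 on Pile subsets) is evidence that the suspect split was trained on; because the test aggregates over a whole dataset rather than judging individual sequences, it sidesteps the per-sample noise that makes single-sequence MIAs unreliable.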
Related papers
- Scaling Up Membership Inference: When and How Attacks Succeed on Large Language Models [37.420266437306374]
Membership inference attacks (MIAs) attempt to verify whether a given data sample is a member of a model's training set.
Recent research has largely concluded that current MIA methods do not work on large language models (LLMs).
arXiv Detail & Related papers (2024-10-31T18:59:46Z)
- Self-Comparison for Dataset-Level Membership Inference in Large (Vision-)Language Models [73.94175015918059]
We propose a dataset-level membership inference method based on Self-Comparison.
Our method does not require access to ground-truth member data or to non-member data from an identical distribution.
arXiv Detail & Related papers (2024-10-16T23:05:59Z)
- Training on the Benchmark Is Not All You Need [52.01920740114261]
We propose a simple and effective data leakage detection method based on the contents of multiple-choice options.
Our method is able to work under black-box conditions without access to model training data or weights.
We evaluate the degree of data leakage of 31 mainstream open-source LLMs on four benchmark datasets.
arXiv Detail & Related papers (2024-09-03T11:09:44Z)
- Nob-MIAs: Non-biased Membership Inference Attacks Assessment on Large Language Models with Ex-Post Dataset Construction [37.69303106863453]
Membership Inference Attacks (MIAs) aim to detect whether specific documents were used in a given Large Language Model's (LLM's) pretraining.
This paper addresses the evaluation of MIAs on LLMs with partially inferable training sets.
We propose and validate algorithms to create "non-biased" and "non-classifiable" datasets for fairer MIA assessment.
arXiv Detail & Related papers (2024-08-12T07:49:28Z)
- SoK: Membership Inference Attacks on LLMs are Rushing Nowhere (and How to Fix It) [16.673210422615348]
More than 10 new methods have been proposed to perform Membership Inference Attacks (MIAs) against LLMs.
Contrary to traditional MIAs, which rely on fixed (but randomized) records or models, these methods are mostly evaluated on datasets collected post hoc.
This lack of randomization raises concerns of a distribution shift between members and non-members.
arXiv Detail & Related papers (2024-06-25T23:12:07Z)
- Do Membership Inference Attacks Work on Large Language Models? [141.2019867466968]
Membership inference attacks (MIAs) attempt to predict whether a particular datapoint is a member of a target model's training data.
We perform a large-scale evaluation of MIAs over a suite of language models trained on the Pile, ranging from 160M to 12B parameters.
We find that MIAs barely outperform random guessing in most settings, across varying LLM sizes and domains; a minimal sketch of the loss-based baseline used in such evaluations appears after this list.
arXiv Detail & Related papers (2024-02-12T17:52:05Z)
- Membership Inference Attacks against Synthetic Data through Overfitting Detection [84.02632160692995]
We argue for a realistic MIA setting that assumes the attacker has some knowledge of the underlying data distribution.
We propose DOMIAS, a density-based MIA model that aims to infer membership by targeting local overfitting of the generative model.
arXiv Detail & Related papers (2023-02-24T11:27:39Z)
- Canary in a Coalmine: Better Membership Inference with Ensembled Adversarial Queries [53.222218035435006]
We use adversarial tools to optimize for queries that are discriminative and diverse.
Our improvements achieve significantly more accurate membership inference than existing methods.
arXiv Detail & Related papers (2022-10-19T17:46:50Z)
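For reference, the loss-based baseline mentioned above thresholds the model's average per-token loss: a sample with anomalously low loss is predicted to be a training member. The sketch below assumes a Hugging Face-style causal LM interface; in practice evaluations sweep the threshold to report AUC rather than fixing it. All names are illustrative.

```python
# Minimal sketch of the standard loss-based MIA baseline: predict "member"
# when the model's average next-token loss on a text falls below a
# threshold. Assumes a Hugging Face-style causal LM; names are illustrative.
import torch
import torch.nn.functional as F


@torch.no_grad()
def loss_mia_is_member(model, tokenizer, text: str, threshold: float) -> bool:
    ids = tokenizer(text, return_tensors="pt").input_ids
    logits = model(input_ids=ids).logits
    # Shift so position t predicts token t+1, then average cross-entropy.
    loss = F.cross_entropy(logits[0, :-1].float(), ids[0, 1:],
                           reduction="mean")
    return loss.item() < threshold  # lower loss => predicted member
```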