Detecting Pretraining Data from Large Language Models
- URL: http://arxiv.org/abs/2310.16789v3
- Date: Sat, 9 Mar 2024 22:26:06 GMT
- Title: Detecting Pretraining Data from Large Language Models
- Authors: Weijia Shi, Anirudh Ajith, Mengzhou Xia, Yangsibo Huang, Daogao Liu,
Terra Blevins, Danqi Chen, Luke Zettlemoyer
- Abstract summary: We study the pretraining data detection problem.
Given a piece of text and black-box access to an LLM without knowing the pretraining data, can we determine if the model was trained on the provided text?
We introduce a new detection method Min-K% Prob based on a simple hypothesis.
- Score: 90.12037980837738
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Although large language models (LLMs) are widely deployed, the data used to
train them is rarely disclosed. Given the incredible scale of this data, up to
trillions of tokens, it is all but certain that it includes potentially
problematic text such as copyrighted materials, personally identifiable
information, and test data for widely reported reference benchmarks. However,
we currently have no way to know which data of these types is included or in
what proportions. In this paper, we study the pretraining data detection
problem: given a piece of text and black-box access to an LLM without knowing
the pretraining data, can we determine if the model was trained on the provided
text? To facilitate this study, we introduce a dynamic benchmark WIKIMIA that
uses data created before and after model training to support gold truth
detection. We also introduce a new detection method Min-K% Prob based on a
simple hypothesis: an unseen example is likely to contain a few outlier words
with low probabilities under the LLM, while a seen example is less likely to
have words with such low probabilities. Min-K% Prob can be applied without any
knowledge about the pretraining corpus or any additional training, departing
from previous detection methods that require training a reference model on data
that is similar to the pretraining data. Moreover, our experiments demonstrate
that Min-K% Prob achieves a 7.4% improvement on WIKIMIA over these previous
methods. We apply Min-K% Prob to three real-world scenarios, copyrighted book
detection, contaminated downstream example detection and privacy auditing of
machine unlearning, and find it a consistently effective solution.
Related papers
- Pretraining Data Detection for Large Language Models: A Divergence-based Calibration Method [108.56493934296687]
We introduce a divergence-based calibration method, inspired by the divergence-from-randomness concept, to calibrate token probabilities for pretraining data detection.
We have developed a Chinese-language benchmark, PatentMIA, to assess the performance of detection approaches for LLMs on Chinese text.
arXiv Detail & Related papers (2024-09-23T07:55:35Z) - Training on the Benchmark Is Not All You Need [52.01920740114261]
We propose a simple and effective data leakage detection method based on the contents of multiple-choice options.
Our method is able to work under black-box conditions without access to model training data or weights.
We evaluate the degree of data leakage of 31 mainstream open-source LLMs on four benchmark datasets.
arXiv Detail & Related papers (2024-09-03T11:09:44Z) - Adaptive Pre-training Data Detection for Large Language Models via Surprising Tokens [1.2549198550400134]
Large language models (LLMs) are extensively used, but there are concerns regarding privacy, security, and copyright due to their opaque training data.
Current solutions to this problem leverage techniques explored in machine learning privacy such as Membership Inference Attacks (MIAs)
We propose an adaptive pre-training data detection method which alleviates this reliance and effectively amplify the identification.
arXiv Detail & Related papers (2024-07-30T23:43:59Z) - Probing Language Models for Pre-training Data Detection [11.37731401086372]
We propose to utilize the probing technique for pre-training data detection by examining the model's internal activations.
Our method is simple and effective and leads to more trustworthy pre-training data detection.
arXiv Detail & Related papers (2024-06-03T13:58:04Z) - Min-K%++: Improved Baseline for Detecting Pre-Training Data from Large Language Models [15.50128790503447]
We propose a novel and theoretically motivated methodology for pre-training data detection, named Min-K%++.
Specifically, we present a key insight that training samples tend to be local maxima of the modeled distribution along each input dimension through likelihood training.
arXiv Detail & Related papers (2024-04-03T04:25:01Z) - Membership Inference Attacks against Synthetic Data through Overfitting
Detection [84.02632160692995]
We argue for a realistic MIA setting that assumes the attacker has some knowledge of the underlying data distribution.
We propose DOMIAS, a density-based MIA model that aims to infer membership by targeting local overfitting of the generative model.
arXiv Detail & Related papers (2023-02-24T11:27:39Z) - Learning to Unlearn: Instance-wise Unlearning for Pre-trained
Classifiers [71.70205894168039]
We consider instance-wise unlearning, of which the goal is to delete information on a set of instances from a pre-trained model.
We propose two methods that reduce forgetting on the remaining data: 1) utilizing adversarial examples to overcome forgetting at the representation-level and 2) leveraging weight importance metrics to pinpoint network parameters guilty of propagating unwanted information.
arXiv Detail & Related papers (2023-01-27T07:53:50Z) - Continual Learning for Fake Audio Detection [62.54860236190694]
This paper proposes detecting fake without forgetting, a continual-learning-based method, to make the model learn new spoofing attacks incrementally.
Experiments are conducted on the ASVspoof 2019 dataset.
arXiv Detail & Related papers (2021-04-15T07:57:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.