Proving Test Set Contamination in Black Box Language Models
- URL: http://arxiv.org/abs/2310.17623v2
- Date: Fri, 24 Nov 2023 01:45:16 GMT
- Title: Proving Test Set Contamination in Black Box Language Models
- Authors: Yonatan Oren and Nicole Meister and Niladri Chatterji and Faisal
Ladhak and Tatsunori B. Hashimoto
- Abstract summary: We show that it is possible to provide provable guarantees of test set contamination in language models without access to pretraining data or model weights.
Our approach leverages the fact that when there is no data contamination, all orderings of an exchangeable benchmark should be equally likely.
- Score: 20.576866080360247
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models are trained on vast amounts of internet data, prompting
concerns and speculation that they have memorized public benchmarks. Going from
speculation to proof of contamination is challenging, as the pretraining data
used by proprietary models are often not publicly accessible. We show that it
is possible to provide provable guarantees of test set contamination in
language models without access to pretraining data or model weights. Our
approach leverages the fact that when there is no data contamination, all
orderings of an exchangeable benchmark should be equally likely. In contrast,
the tendency for language models to memorize example order means that a
contaminated language model will find certain canonical orderings to be much
more likely than others. Our test flags potential contamination whenever the
likelihood of a canonically ordered benchmark dataset is significantly higher
than the likelihood after shuffling the examples. We demonstrate that our
procedure is sensitive enough to reliably prove test set contamination in
challenging situations, including models as small as 1.4 billion parameters, on
small test sets of only 1000 examples, and datasets that appear only a few
times in the pretraining corpus. Using our test, we audit five popular publicly
accessible language models for test set contamination and find little evidence
for pervasive contamination.
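The shuffled-likelihood test described in the abstract can be illustrated with a small permutation test: score the benchmark once in its canonical order, score it under random shuffles of the examples, and check how often a shuffle is at least as likely. The sketch below is a minimal illustration under assumed choices (gpt2 as the audited model, newline concatenation of the examples, 99 permutations), not the authors' released procedure.

```python
# Minimal sketch of the canonical-vs-shuffled likelihood comparison described in
# the abstract: under exchangeability (no contamination), the canonical ordering
# of a benchmark should not be systematically more likely than a random shuffle.
# The model (gpt2), newline concatenation, and permutation count are assumptions
# for illustration, not the paper's exact procedure.
import random

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def ordering_log_likelihood(model, tokenizer, examples):
    """Log-likelihood of the examples concatenated in the given order."""
    # For brevity only the first context window is scored; a fuller version
    # would score the entire concatenation in chunks.
    ids = tokenizer("\n".join(examples), return_tensors="pt",
                    truncation=True, max_length=1024).input_ids
    with torch.no_grad():
        # .loss is the mean negative log-likelihood per predicted token
        loss = model(ids, labels=ids).loss
    return -loss.item() * (ids.shape[1] - 1)


def contamination_p_value(examples, model_name="gpt2", n_perm=99, seed=0):
    """One-sided permutation p-value: a small value means the canonical ordering
    is more likely than almost every shuffle, flagging potential contamination."""
    rng = random.Random(seed)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).eval()

    canonical = ordering_log_likelihood(model, tokenizer, examples)
    shuffled = []
    for _ in range(n_perm):
        perm = list(examples)
        rng.shuffle(perm)
        shuffled.append(ordering_log_likelihood(model, tokenizer, perm))

    at_least_as_likely = sum(s >= canonical for s in shuffled)
    return (1 + at_least_as_likely) / (1 + n_perm)
```

The paper reports a procedure sensitive enough to detect contamination even for 1.4-billion-parameter models and 1,000-example test sets; this naive sketch does not aim to reproduce that sensitivity, only to make the canonical-versus-shuffled comparison concrete.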
Related papers
- Training on the Benchmark Is Not All You Need [52.01920740114261]
We propose a simple and effective data leakage detection method based on the contents of multiple-choice options.
Our method is able to work under black-box conditions without access to model training data or weights.
We evaluate the degree of data leakage of 31 mainstream open-source LLMs on four benchmark datasets.
arXiv Detail & Related papers (2024-09-03T11:09:44Z)
- PaCoST: Paired Confidence Significance Testing for Benchmark Contamination Detection in Large Language Models [41.772263447213234]
Large language models (LLMs) are known to be trained on vast amounts of data, which may unintentionally or intentionally include data from commonly used benchmarks.
This inclusion can lead to misleadingly high scores on model leaderboards, yet result in disappointing performance in real-world applications.
We introduce PaCoST (Paired Confidence Significance Testing) to effectively detect benchmark contamination in LLMs.
arXiv Detail & Related papers (2024-06-26T13:12:40Z)
- GPT-HateCheck: Can LLMs Write Better Functional Tests for Hate Speech Detection? [50.53312866647302]
HateCheck is a suite for testing fine-grained model functionalities on synthesized data.
We propose GPT-HateCheck, a framework to generate more diverse and realistic functional tests from scratch.
Crowd-sourced annotation demonstrates that the generated test cases are of high quality.
arXiv Detail & Related papers (2024-02-23T10:02:01Z)
- Evading Data Contamination Detection for Language Models is (too) Easy [9.024665800235855]
The vast amounts of data that large language models are trained on can inadvertently lead to contamination with public benchmarks.
We propose a categorization of both model providers and contamination detection methods.
This reveals vulnerabilities in existing methods that we exploit with EAL.
arXiv Detail & Related papers (2024-02-05T09:10:32Z)
- Rethinking Benchmark and Contamination for Language Models with Rephrased Samples [49.18977581962162]
Large language models are increasingly trained on all the data ever produced by humans.
Many have raised concerns about the trustworthiness of public benchmarks due to potential contamination in pre-training or fine-tuning datasets.
arXiv Detail & Related papers (2023-11-08T17:35:20Z)
- Detecting Pretraining Data from Large Language Models [90.12037980837738]
We study the pretraining data detection problem.
Given a piece of text and black-box access to an LLM without knowing the pretraining data, can we determine if the model was trained on the provided text?
We introduce a new detection method, Min-K% Prob, based on a simple hypothesis: an unseen example is likely to contain a few outlier words with low probabilities under the LLM (a minimal sketch of this scoring idea appears at the end of this list).
arXiv Detail & Related papers (2023-10-25T17:21:23Z)
- Estimating Contamination via Perplexity: Quantifying Memorisation in Language Model Evaluation [2.4173424114751114]
We propose a novel method to quantify contamination without access to the full training set.
Our analysis provides evidence of significant memorisation by recent foundation models of popular reading comprehension and summarisation benchmarks, while multiple-choice benchmarks appear less contaminated.
arXiv Detail & Related papers (2023-09-19T15:02:58Z)
- Few-shot Instruction Prompts for Pretrained Language Models to Detect Social Biases [55.45617404586874]
We propose a few-shot instruction-based method for prompting pre-trained language models (LMs) to detect social biases in text.
We show that large LMs can detect different types of fine-grained biases with similar and sometimes superior accuracy to fine-tuned models.
arXiv Detail & Related papers (2021-12-15T04:19:52Z)
- Limits of Detecting Text Generated by Large-Scale Language Models [65.46403462928319]
Some consider large-scale language models that can generate long and coherent pieces of text to be dangerous, since they may be used in misinformation campaigns.
Here we formulate large-scale language model output detection as a hypothesis testing problem to classify text as genuine or generated.
arXiv Detail & Related papers (2020-02-09T19:53:23Z)
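The Min-K% Prob score mentioned in the "Detecting Pretraining Data from Large Language Models" entry above can be sketched as follows; this is an illustration under assumed choices (gpt2 as the scored model, k = 20%), not the authors' released implementation.

```python
# Hedged sketch of a Min-K% Prob style membership score: average the
# log-probabilities of the k% least-likely tokens in a text; a high average
# suggests the text contains few low-probability outliers and may have been
# seen during pretraining. Model (gpt2) and k = 20% are illustrative choices.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def min_k_percent_prob(text, model_name="gpt2", k=0.2):
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).eval()
    ids = tokenizer(text, return_tensors="pt",
                    truncation=True, max_length=1024).input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # log-probability assigned to each actual token given its prefix
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    token_log_probs = log_probs.gather(1, ids[0, 1:].unsqueeze(1)).squeeze(1)
    # average over the k% of tokens with the lowest probability
    n_lowest = max(1, int(k * token_log_probs.numel()))
    lowest = torch.topk(token_log_probs, n_lowest, largest=False).values
    return lowest.mean().item()
```

A threshold on this score, chosen on held-out data, would then separate text that was likely seen during pretraining from text that was not.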