STAMP Your Content: Proving Dataset Membership via Watermarked Rephrasings
- URL: http://arxiv.org/abs/2504.13416v1
- Date: Fri, 18 Apr 2025 02:25:08 GMT
- Title: STAMP Your Content: Proving Dataset Membership via Watermarked Rephrasings
- Authors: Saksham Rastogi, Pratyush Maini, Danish Pruthi
- Abstract summary: STAMP is a framework for detecting dataset membership. One version is to be released publicly, while others are to be kept private. We show that our framework can successfully detect contamination across four benchmarks which appear only once in the training data.
- Score: 17.175065729425825
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Given how large parts of publicly available text are crawled to pretrain large language models (LLMs), data creators increasingly worry about the inclusion of their proprietary data in model training without attribution or licensing. Their concerns are also shared by benchmark curators whose test sets might be compromised. In this paper, we present STAMP, a framework for detecting dataset membership, i.e., determining the inclusion of a dataset in the pretraining corpora of LLMs. Given an original piece of content, our proposal involves first generating multiple rephrasings, each embedding a watermark with a unique secret key. One version is to be released publicly, while the others are to be kept private. Subsequently, creators can compare model likelihoods between the public and private versions using paired statistical tests to prove membership. We show that our framework can successfully detect contamination across four benchmarks that appear only once in the training data and constitute less than 0.001% of the total tokens, outperforming several contamination detection and dataset inference baselines. We verify that STAMP preserves both the semantic meaning of the original data and its utility for comparing different models. We apply STAMP to two real-world scenarios to confirm the inclusion of paper abstracts and blog articles in the pretraining corpora.
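The membership test described in the abstract reduces to a paired comparison of model likelihoods on public versus private rephrasings of the same documents. Below is a minimal sketch of such a test, not the authors' implementation: the function names, the per-token average log-likelihood score, and the choice of a one-sided Wilcoxon signed-rank test are illustrative assumptions, standing in for whichever paired statistical test STAMP actually uses.

```python
# Minimal sketch (assumptions noted above) of a paired membership test over
# watermarked rephrasings, using Hugging Face transformers and SciPy.
import torch
from scipy.stats import wilcoxon
from transformers import AutoModelForCausalLM, AutoTokenizer


def avg_log_likelihood(model, tokenizer, text: str) -> float:
    """Mean per-token log-probability the model assigns to `text`."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    # `out.loss` is the mean negative log-likelihood per predicted token.
    return -out.loss.item()


def paired_membership_test(model_name, public_docs, private_docs, alpha=0.05):
    """One-sided paired test that the model prefers the released rephrasings.

    public_docs[i] and private_docs[i] are watermarked rephrasings of the same
    original document; only the public version was ever released online.
    """
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).eval()

    pub = [avg_log_likelihood(model, tokenizer, d) for d in public_docs]
    priv = [avg_log_likelihood(model, tokenizer, d) for d in private_docs]

    # If the public versions were in the pretraining corpus, their likelihoods
    # should be systematically higher than those of the unseen private versions.
    _, p_value = wilcoxon(pub, priv, alternative="greater")
    return p_value, p_value < alpha
```

Because each public document is paired with a private rephrasing of the same content, differences in topic or difficulty cancel out, and a small p-value points specifically at exposure of the released version during pretraining.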
Related papers
- CBW: Towards Dataset Ownership Verification for Speaker Verification via Clustering-based Backdoor Watermarking [85.68235482145091]
Large-scale speech datasets have become valuable intellectual property. We propose a novel dataset ownership verification method. Our approach introduces a clustering-based backdoor watermark (CBW). We conduct extensive experiments on benchmark datasets, verifying the effectiveness and robustness of our method against potential adaptive attacks.
arXiv Detail & Related papers (2025-03-02T02:02:57Z) - Measuring Bias of Web-filtered Text Datasets and Bias Propagation Through Training [22.53813258871828]
We investigate biases in pretraining datasets for large language models (LLMs) through dataset classification experiments.
We find that neural networks can classify surprisingly well which dataset a single text sequence belongs to, significantly better than a human can.
arXiv Detail & Related papers (2024-12-03T21:43:58Z) - Self-Comparison for Dataset-Level Membership Inference in Large (Vision-)Language Models [73.94175015918059]
We propose a dataset-level membership inference method based on Self-Comparison.
Our method does not require access to ground-truth member data or non-member data from the same distribution.
arXiv Detail & Related papers (2024-10-16T23:05:59Z) - Training on the Benchmark Is Not All You Need [52.01920740114261]
We propose a simple and effective data leakage detection method based on the contents of multiple-choice options. Our method is able to work under gray-box conditions without access to model training data or weights. We evaluate the degree of data leakage of 35 mainstream open-source LLMs on four benchmark datasets.
arXiv Detail & Related papers (2024-09-03T11:09:44Z) - Range Membership Inference Attacks [17.28638946021444]
We introduce the class of range membership inference attacks (RaMIAs), testing if the model was trained on any data in a specified range.
We show that RaMIAs can capture privacy loss more accurately and comprehensively than MIAs on various types of data.
arXiv Detail & Related papers (2024-08-09T15:39:06Z) - Fact Checking Beyond Training Set [64.88575826304024]
We show that the retriever-reader suffers from performance deterioration when it is trained on labeled data from one domain and used in another domain.
We propose an adversarial algorithm to make the retriever component robust against distribution shift.
We then construct eight fact checking scenarios from these datasets, and compare our model to a set of strong baseline models.
arXiv Detail & Related papers (2024-03-27T15:15:14Z) - Detecting Pretraining Data from Large Language Models [90.12037980837738]
We study the pretraining data detection problem.
Given a piece of text and black-box access to an LLM without knowing the pretraining data, can we determine if the model was trained on the provided text?
We introduce a new detection method Min-K% Prob based on a simple hypothesis.
arXiv Detail & Related papers (2023-10-25T17:21:23Z) - Client-specific Property Inference against Secure Aggregation in Federated Learning [52.8564467292226]
Federated learning has become a widely used paradigm for collaboratively training a common model among different participants.
Many attacks have shown that it is still possible to infer sensitive information, such as membership or properties of participant data, or even to reconstruct that data outright.
We show that simple linear models can effectively capture client-specific properties only from the aggregated model updates.
arXiv Detail & Related papers (2023-03-07T14:11:01Z) - Differentially Private Label Protection in Split Learning [20.691549091238965]
Split learning is a distributed training framework that allows multiple parties to jointly train a machine learning model over partitioned data.
Recent works showed that implementations of split learning suffer from severe privacy risks: a semi-honest adversary can easily reconstruct the labels.
We propose TPSL (Transcript Private Split Learning), a generic gradient-based split learning framework that provides a provable differential privacy guarantee.
arXiv Detail & Related papers (2022-03-04T00:35:03Z)