Hey, That's My Data! Label-Only Dataset Inference in Large Language Models
- URL: http://arxiv.org/abs/2506.06057v1
- Date: Fri, 06 Jun 2025 13:02:59 GMT
- Title: Hey, That's My Data! Label-Only Dataset Inference in Large Language Models
- Authors: Chen Xiong, Zihao Wang, Rui Zhu, Tsung-Yi Ho, Pin-Yu Chen, Jingwei Xiong, Haixu Tang, Lucila Ohno-Machado
- Abstract summary: CatShift is a label-only dataset-inference framework. It capitalizes on catastrophic forgetting: the tendency of an LLM to overwrite previously learned knowledge when exposed to new data.
- Score: 63.35066172530291
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) have revolutionized Natural Language Processing by excelling at interpreting, reasoning about, and generating human language. However, their reliance on large-scale, often proprietary datasets poses a critical challenge: unauthorized usage of such data can lead to copyright infringement and significant financial harm. Existing dataset-inference methods typically depend on log probabilities to detect suspicious training material, yet many leading LLMs have begun withholding or obfuscating these signals. This reality underscores the pressing need for label-only approaches capable of identifying dataset membership without relying on internal model logits. We address this gap by introducing CatShift, a label-only dataset-inference framework that capitalizes on catastrophic forgetting: the tendency of an LLM to overwrite previously learned knowledge when exposed to new data. If a suspicious dataset was previously seen by the model, fine-tuning on a portion of it triggers a pronounced post-tuning shift in the model's outputs; conversely, truly novel data elicits more modest changes. By comparing the model's output shifts for a suspicious dataset against those for a known non-member validation set, we statistically determine whether the suspicious set is likely to have been part of the model's original training corpus. Extensive experiments on both open-source and API-based LLMs validate CatShift's effectiveness in logit-inaccessible settings, offering a robust and practical solution for safeguarding proprietary data.
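The abstract outlines a fine-tune-and-compare protocol. Below is a minimal, hypothetical sketch of that idea in Python; the `query` and `fine_tune` callables, the half-and-half split, the string-similarity shift measure, and the one-sided rank test are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a label-only, fine-tune-and-compare dataset-inference
# check in the spirit of CatShift. All helpers are placeholders.
from typing import Callable, List, Tuple
from difflib import SequenceMatcher
from scipy.stats import mannwhitneyu


def shift_after_tuning(query: Callable[[object, str], str],
                       fine_tune: Callable[[object, List[str]], object],
                       base_model: object,
                       texts: List[str]) -> List[float]:
    """Fine-tune on one half of `texts`, then measure how much the model's
    text-only outputs move on the other half (no logits needed)."""
    tune_split, probe_split = texts[::2], texts[1::2]   # assumed split
    tuned_model = fine_tune(base_model, tune_split)
    shifts = []
    for prompt in probe_split:
        before = query(base_model, prompt)
        after = query(tuned_model, prompt)
        # 1 - string similarity as a crude, label-only output-shift measure.
        shifts.append(1.0 - SequenceMatcher(None, before, after).ratio())
    return shifts


def likely_member(query, fine_tune, base_model,
                  suspect_set: List[str], nonmember_set: List[str],
                  alpha: float = 0.05) -> Tuple[bool, float]:
    """Flag the suspect set if its post-tuning output shift is significantly
    larger than that of a known non-member validation set."""
    suspect_shifts = shift_after_tuning(query, fine_tune, base_model, suspect_set)
    control_shifts = shift_after_tuning(query, fine_tune, base_model, nonmember_set)
    _, p_value = mannwhitneyu(suspect_shifts, control_shifts, alternative="greater")
    return p_value < alpha, p_value
```

Per the abstract, member data should produce the larger shift after re-exposure, so a one-sided test in that direction is used here; the paper's actual shift metric and statistic may differ.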
Related papers
- Unlocking Post-hoc Dataset Inference with Synthetic Data [11.886166976507711]
Training datasets are often scraped from the internet without respecting data owners' intellectual property rights. Dataset Inference (DI) offers a potential remedy by identifying whether a suspect dataset was used in training. Existing DI methods require a private set, known to be absent from training, that closely matches the compromised dataset's distribution. In this work, we address this challenge by synthetically generating the required held-out set.
arXiv Detail & Related papers (2025-06-18T08:46:59Z)
- Memorization or Interpolation? Detecting LLM Memorization through Input Perturbation Analysis [8.725781605542675]
Large Language Models (LLMs) achieve remarkable performance through training on massive datasets. LLMs can exhibit concerning behaviors such as verbatim reproduction of training data rather than true generalization. This paper introduces PEARL, a novel approach for detecting memorization in LLMs.
arXiv Detail & Related papers (2025-05-05T20:42:34Z)
- Self-Comparison for Dataset-Level Membership Inference in Large (Vision-)Language Models [73.94175015918059]
We propose a dataset-level membership inference method based on Self-Comparison.
Our method does not require access to ground-truth member data or non-member data drawn from an identical distribution.
arXiv Detail & Related papers (2024-10-16T23:05:59Z)
- Data Taggants: Dataset Ownership Verification via Harmless Targeted Data Poisoning [12.80649024603656]
This paper introduces data taggants, a novel non-backdoor dataset ownership verification technique.
We validate our approach through comprehensive and realistic experiments on ImageNet1k using ViT and ResNet models with state-of-the-art training recipes.
arXiv Detail & Related papers (2024-10-09T12:49:23Z)
- Formality is Favored: Unraveling the Learning Preferences of Large Language Models on Data with Conflicting Knowledge [55.65162959527848]
Large language models have shown excellent performance on many knowledge-intensive tasks.
However, pretraining data tends to contain misleading and even conflicting information.
This study systematically analyzes LLMs' learning preferences for data with conflicting knowledge.
arXiv Detail & Related papers (2024-10-07T06:49:41Z)
- Training on the Benchmark Is Not All You Need [52.01920740114261]
We propose a simple and effective data leakage detection method based on the contents of multiple-choice options. Our method is able to work under gray-box conditions without access to model training data or weights. We evaluate the degree of data leakage of 35 mainstream open-source LLMs on four benchmark datasets.
arXiv Detail & Related papers (2024-09-03T11:09:44Z)
- Can LLMs Separate Instructions From Data? And What Do We Even Mean By That? [60.50127555651554]
Large Language Models (LLMs) show impressive results in numerous practical applications, but they lack essential safety features. This makes them vulnerable to manipulations such as indirect prompt injections and generally unsuitable for safety-critical tasks. We introduce a formal measure for instruction-data separation and an empirical variant that is calculable from a model's outputs.
arXiv Detail & Related papers (2024-03-11T15:48:56Z)
- Recovering from Privacy-Preserving Masking with Large Language Models [14.828717714653779]
We use large language models (LLMs) to suggest substitutes of masked tokens.
We show that models trained on the obfuscated corpora achieve performance comparable to models trained on the original data.
arXiv Detail & Related papers (2023-09-12T16:39:41Z)
- Bring Your Own Data! Self-Supervised Evaluation for Large Language Models [52.15056231665816]
We propose a framework for self-supervised evaluation of Large Language Models (LLMs).
We demonstrate self-supervised evaluation strategies for measuring closed-book knowledge, toxicity, and long-range context dependence.
We find strong correlations between self-supervised and human-supervised evaluations.
arXiv Detail & Related papers (2023-06-23T17:59:09Z)
- NLI Data Sanity Check: Assessing the Effect of Data Corruption on Model Performance [3.7024660695776066]
We propose a new diagnostic test suite that makes it possible to assess whether a dataset constitutes a good testbed for evaluating models' meaning-understanding capabilities.
We specifically apply controlled corruption transformations to widely used benchmarks (MNLI and ANLI). A large decrease in model accuracy indicates that the original dataset provides a proper challenge to the models' reasoning capabilities; a minimal sketch of this corruption-and-compare check follows this entry.
arXiv Detail & Related papers (2021-04-10T12:28:07Z)
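A minimal illustration of the corruption-and-compare check described in the entry above, assuming a hypothetical `predict(premise, hypothesis) -> label` callable and a word-shuffling corruption; the paper's own transformations may differ.

```python
# Illustrative corruption-and-compare diagnostic for an NLI benchmark.
# The shuffling corruption and the `predict` interface are assumptions.
import random
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (premise, hypothesis)


def shuffle_hypothesis(pair: Pair, seed: int = 0) -> Pair:
    """Corrupt an NLI pair by randomly reordering the hypothesis tokens."""
    premise, hypothesis = pair
    tokens = hypothesis.split()
    random.Random(seed).shuffle(tokens)
    return premise, " ".join(tokens)


def accuracy(predict: Callable[[str, str], str],
             pairs: List[Pair], labels: List[str]) -> float:
    hits = sum(predict(p, h) == y for (p, h), y in zip(pairs, labels))
    return hits / len(labels)


def corruption_gap(predict: Callable[[str, str], str],
                   pairs: List[Pair], labels: List[str]) -> float:
    """Accuracy drop after corruption; a large drop suggests the benchmark
    genuinely requires meaning understanding rather than surface cues."""
    corrupted = [shuffle_hypothesis(p) for p in pairs]
    return accuracy(predict, pairs, labels) - accuracy(predict, corrupted, labels)
```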