Blackbox Dataset Inference for LLM
- URL: http://arxiv.org/abs/2507.03619v2
- Date: Fri, 18 Jul 2025 19:19:10 GMT
- Title: Blackbox Dataset Inference for LLM
- Authors: Ruikai Zhou, Kang Yang, Xun Chen, Wendy Hui Wang, Guanhong Tao, Jun Xu
- Abstract summary: Training large language models can involve personally identifiable information and copyrighted material. This paper explores \textit{dataset inference}, which aims to detect if a suspect model used a victim dataset $\mathcal{D}$ in training.
- Score: 27.02176845242058
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Today, the training of large language models (LLMs) can involve personally identifiable information and copyrighted material, incurring dataset misuse. To mitigate the problem of dataset misuse, this paper explores \textit{dataset inference}, which aims to detect if a suspect model $\mathcal{M}$ used a victim dataset $\mathcal{D}$ in training. Previous research tackles dataset inference by aggregating results of membership inference attacks (MIAs) -- methods to determine whether individual samples are a part of the training dataset. However, restricted by the low accuracy of MIAs, previous research mandates grey-box access to $\mathcal{M}$ to get intermediate outputs (probabilities, loss, perplexity, etc.) for obtaining satisfactory results. This leads to reduced practicality, as LLMs, especially those deployed for profits, have limited incentives to return the intermediate outputs. In this paper, we propose a new method of dataset inference with only black-box access to the target model (i.e., assuming only the text-based responses of the target model are available). Our method is enabled by two sets of locally built reference models, one set involving $\mathcal{D}$ in training and the other not. By measuring which set of reference model $\mathcal{M}$ is closer to, we determine if $\mathcal{M}$ used $\mathcal{D}$ for training. Evaluations of real-world LLMs in the wild show that our method offers high accuracy in all settings and presents robustness against bypassing attempts.
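To make the comparison concrete, here is a minimal sketch of the decision rule the abstract describes: query the suspect model for text only, do the same for two locally built sets of reference models (one trained with $\mathcal{D}$, one without), and check which set the suspect's responses resemble more. The helper functions `query_model` and `similarity`, and the zero decision margin, are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of the black-box dataset-inference decision rule described above.
# Only text responses are compared, so black-box access to the suspect suffices.
from statistics import mean

def infer_dataset_usage(suspect, refs_with_D, refs_without_D, prompts,
                        query_model, similarity, margin=0.0):
    """Return True if `suspect` looks closer to reference models trained on D."""
    suspect_out = [query_model(suspect, p) for p in prompts]

    def closeness(ref_set):
        # Average text-level similarity between the suspect's responses and
        # each reference model's responses on the same prompts.
        scores = []
        for ref in ref_set:
            ref_out = [query_model(ref, p) for p in prompts]
            scores.append(mean(similarity(a, b) for a, b in zip(suspect_out, ref_out)))
        return mean(scores)

    # If the suspect is measurably closer to the references that trained on D,
    # flag D as likely having been used in its training.
    return closeness(refs_with_D) - closeness(refs_without_D) > margin
```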
Related papers
- Zero-Shot Attribution for Large Language Models: A Distribution Testing Approach [19.455425068600665]
We investigate the problem of attributing code generated by language models using hypothesis testing to leverage established techniques and guarantees. We introduce $\mathsf{Anubis}$, a zero-shot attribution tool that frames attribution as a distribution testing problem.
arXiv Detail & Related papers (2025-06-25T07:37:16Z)
- Inverse Entropic Optimal Transport Solves Semi-supervised Learning via Data Likelihood Maximization [65.8915778873691]
Learning conditional distributions is a central problem in machine learning. We propose a new paradigm that integrates both paired and unpaired data. We show that our approach can theoretically recover true conditional distributions with arbitrarily small error.
arXiv Detail & Related papers (2024-10-03T16:12:59Z)
- Training on the Benchmark Is Not All You Need [52.01920740114261]
We propose a simple and effective data leakage detection method based on the contents of multiple-choice options. Our method is able to work under gray-box conditions without access to model training data or weights. We evaluate the degree of data leakage of 35 mainstream open-source LLMs on four benchmark datasets.
arXiv Detail & Related papers (2024-09-03T11:09:44Z)
- LLM Dataset Inference: Did you train on my dataset? [42.97830562143777]
We propose a new dataset inference method to accurately identify the datasets used to train large language models.
Our approach successfully distinguishes the train and test sets of different subsets of the Pile with statistically significant p-values < 0.1, without any false positives.
arXiv Detail & Related papers (2024-06-10T16:34:43Z)
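For context, the dataset-inference approach summarized in this entry aggregates per-sample membership-inference scores into a single dataset-level hypothesis test under grey-box access. Below is a simplified, hypothetical sketch of that aggregation step; the `mia_score` helper, the Welch t-test, and the 0.1 significance level are illustrative assumptions rather than the cited paper's exact pipeline.

```python
# Simplified sketch: aggregate per-sample membership scores into one p-value.
# Grey-box setting: `mia_score(x)` is an assumed helper returning a membership
# feature for sample x (e.g. derived from the model's loss or perplexity).
from scipy import stats

def dataset_inference_greybox(mia_score, suspect_split, heldout_split, alpha=0.1):
    suspect_scores = [mia_score(x) for x in suspect_split]   # candidate training data
    heldout_scores = [mia_score(x) for x in heldout_split]   # known non-members

    # One-sided Welch t-test: if members systematically score higher than
    # non-members, the p-value is small and the dataset is flagged as used.
    _, p_value = stats.ttest_ind(suspect_scores, heldout_scores,
                                 alternative="greater", equal_var=False)
    return p_value, p_value < alpha
```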
- Alpaca against Vicuna: Using LLMs to Uncover Memorization of LLMs [61.04246774006429]
We introduce a black-box prompt optimization method that uses an attacker LLM agent to uncover higher levels of memorization in a victim agent. We observe that our instruction-based prompts generate outputs with 23.7% higher overlap with training data compared to the baseline prefix-suffix measurements. Our findings show that instruction-tuned models can expose pre-training data as much as their base models, if not more so, and using instructions proposed by other LLMs can open a new avenue of automated attacks.
arXiv Detail & Related papers (2024-03-05T19:32:01Z)
- Data-Efficient Learning via Clustering-Based Sensitivity Sampling: Foundation Models and Beyond [28.651041302245538]
We present a new data selection approach based on $k$-means clustering and sensitivity sampling.
We show how it can be applied to linear regression, leading to a new sampling strategy that surprisingly matches the performance of leverage score sampling.
arXiv Detail & Related papers (2024-02-27T09:03:43Z)
- Detecting Pretraining Data from Large Language Models [90.12037980837738]
We study the pretraining data detection problem.
Given a piece of text and black-box access to an LLM without knowing the pretraining data, can we determine if the model was trained on the provided text?
We introduce a new detection method Min-K% Prob based on a simple hypothesis.
arXiv Detail & Related papers (2023-10-25T17:21:23Z)
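The Min-K% Prob detector referenced in this entry can be stated compactly: score a text by the average log-probability of its least likely tokens, on the hypothesis that training members rarely contain very low-probability tokens. Below is a minimal sketch of that idea; `token_logprobs` is an assumed helper returning per-token log-probabilities (this needs the grey-box access that the main paper above avoids), and threshold calibration is left abstract.

```python
# Minimal sketch of Min-K% Prob: average the log-probabilities of the k% least
# likely tokens in a text; unusually high averages suggest the text was seen
# during training. `token_logprobs(model, text)` is an assumed helper.

def min_k_percent_prob(model, text, token_logprobs, k=0.2):
    logps = sorted(token_logprobs(model, text))    # ascending: least likely first
    n = max(1, int(len(logps) * k))                # keep the bottom k% of tokens
    return sum(logps[:n]) / n                      # higher -> more likely a member

def looks_like_pretraining_data(model, text, token_logprobs, threshold, k=0.2):
    # `threshold` would be calibrated on texts known to be outside the training set.
    return min_k_percent_prob(model, text, token_logprobs, k) > threshold
```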
- Towards a methodology for addressing missingness in datasets, with an application to demographic health datasets [0.0]
We present a methodology for tackling missing data problems using a combination of synthetic dataset generation, missing data imputation and deep learning methods.
Our results show that models trained on synthetic and imputed datasets could make predictions with an accuracy of 83% and 80% on (a) an unseen real dataset and (b) an unseen reserved synthetic test dataset.
arXiv Detail & Related papers (2022-11-05T09:02:30Z)
- Bias Mimicking: A Simple Sampling Approach for Bias Mitigation [57.17709477668213]
We introduce a new class-conditioned sampling method: Bias Mimicking.
Bias Mimicking improves the accuracy of sampling methods on underrepresented groups by 3% over four benchmarks.
arXiv Detail & Related papers (2022-09-30T17:33:00Z)
- On the Generalization for Transfer Learning: An Information-Theoretic Analysis [8.102199960821165]
We give an information-theoretic analysis of the generalization error and excess risk of transfer learning algorithms.
Our results suggest, perhaps as expected, that the Kullback-Leibler divergence $D(\mu\|\mu')$ plays an important role in the characterizations.
We then generalize the mutual information bound with other divergences such as $\phi$-divergence and Wasserstein distance.
arXiv Detail & Related papers (2022-07-12T08:20:41Z)
- Datamodels: Predicting Predictions from Training Data [86.66720175866415]
We present a conceptual framework, datamodeling, for analyzing the behavior of a model class in terms of the training data.
We show that even simple linear datamodels can successfully predict model outputs.
arXiv Detail & Related papers (2022-02-01T18:15:24Z)
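As a toy illustration of the linear-datamodel idea in this entry, one can regress a model's output on a fixed target example against the 0/1 indicator of which training points were included in each run. Everything below is illustrative: `train_and_eval` is an assumed helper, and a plain least-squares fit stands in for the regularized estimators typically used in practice.

```python
# Toy sketch of a linear datamodel: predict a model's output on one fixed target
# example from the (0/1) mask of which training points were used.
# `train_and_eval(mask)` is an assumed helper that trains on the masked subset
# and returns the resulting model's output (e.g. loss or margin) on the target.
import numpy as np

def fit_linear_datamodel(train_and_eval, n_train, n_subsets=500, frac=0.5, seed=0):
    rng = np.random.default_rng(seed)
    masks = (rng.random((n_subsets, n_train)) < frac).astype(float)  # random subsets
    outputs = np.array([train_and_eval(m) for m in masks])

    # Least-squares fit: outputs ~= masks @ theta + bias.
    X = np.hstack([masks, np.ones((n_subsets, 1))])
    coef, *_ = np.linalg.lstsq(X, outputs, rcond=None)
    theta, bias = coef[:-1], coef[-1]
    return theta, bias   # theta[i] estimates training point i's influence
```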
- Learning to extrapolate using continued fractions: Predicting the critical temperature of superconductor materials [5.905364646955811]
In the field of Artificial Intelligence (AI) and Machine Learning (ML), the approximation of unknown target functions $y=f(\mathbf{x})$ is a common objective.
We refer to $S$ as the training set and aim to identify a low-complexity mathematical model that can effectively approximate this target function for new instances $\mathbf{x}$.
arXiv Detail & Related papers (2020-11-27T04:57:40Z)