Identifying Pre-training Data in LLMs: A Neuron Activation-Based Detection Framework
- URL: http://arxiv.org/abs/2507.16414v1
- Date: Tue, 22 Jul 2025 10:05:30 GMT
- Title: Identifying Pre-training Data in LLMs: A Neuron Activation-Based Detection Framework
- Authors: Hongyi Tang, Zhihao Zhu, Yi Yang
- Abstract summary: Performance of large language models (LLMs) is closely tied to their training data, which can include copyrighted material or private information. We introduce NA-PDD, a novel algorithm analyzing differential neuron activation patterns between training and non-training data in LLMs. We also introduce CCNewsPDD, a temporally unbiased benchmark employing rigorous data transformations to ensure consistent time distributions between training and non-training data.
- Score: 17.364424086991207
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The performance of large language models (LLMs) is closely tied to their training data, which can include copyrighted material or private information, raising legal and ethical concerns. Additionally, LLMs face criticism for dataset contamination and internalizing biases. To address these issues, the Pre-Training Data Detection (PDD) task was proposed to identify if specific data was included in an LLM's pre-training corpus. However, existing PDD methods often rely on superficial features like prediction confidence and loss, resulting in mediocre performance. To improve this, we introduce NA-PDD, a novel algorithm analyzing differential neuron activation patterns between training and non-training data in LLMs. This is based on the observation that these data types activate different neurons during LLM inference. We also introduce CCNewsPDD, a temporally unbiased benchmark employing rigorous data transformations to ensure consistent time distributions between training and non-training data. Our experiments demonstrate that NA-PDD significantly outperforms existing methods across three benchmarks and multiple LLMs.
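The abstract describes NA-PDD only at the level of its core observation (training and non-training texts activate different neurons during inference), so the snippet below is a minimal illustrative sketch of an activation-profile comparison in that spirit, not the authors' algorithm. The model name, the hooked submodules, the reference member/non-member sets, and the cosine-similarity scoring are all assumptions made for the example.

```python
# Illustrative sketch only (NOT the NA-PDD algorithm from the paper):
# score a candidate text by whether its neuron-activation profile is closer
# to known training (member) texts than to known non-training texts.
# Assumes a GPT-2-style HuggingFace model whose blocks expose `mlp.act`,
# the post-GELU "neuron" activations; other architectures name this differently.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder; the paper targets larger LLMs
tok = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).eval()


@torch.no_grad()
def activation_profile(text: str) -> torch.Tensor:
    """Mean activation of each MLP neuron, concatenated across layers."""
    captured = []
    hooks = [
        block.mlp.act.register_forward_hook(
            lambda _mod, _inp, out, store=captured: store.append(out.detach())
        )
        for block in model.transformer.h
    ]
    model(**tok(text, return_tensors="pt", truncation=True))
    for h in hooks:
        h.remove()
    # Average over the token dimension, then concatenate layer-wise.
    return torch.cat([a.mean(dim=1).squeeze(0) for a in captured])


def membership_score(candidate: str, members: list[str], non_members: list[str]) -> float:
    """Higher = the candidate's activations look more like training data."""
    sim = torch.nn.functional.cosine_similarity
    c = activation_profile(candidate)
    m = torch.stack([sim(c, activation_profile(t), dim=0) for t in members]).mean()
    n = torch.stack([sim(c, activation_profile(t), dim=0) for t in non_members]).mean()
    return (m - n).item()
```

A detection decision would then threshold `membership_score` using a cutoff calibrated on held-out labeled examples, analogous to how detection methods are evaluated on benchmarks such as CCNewsPDD.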
Related papers
- Capturing the Temporal Dependence of Training Data Influence [100.91355498124527]
We formalize the concept of trajectory-specific leave-one-out influence, which quantifies the impact of removing a data point during training. We propose data value embedding, a novel technique enabling efficient approximation of trajectory-specific LOO. As data value embedding captures training data ordering, it offers valuable insights into model training dynamics.
arXiv Detail & Related papers (2024-12-12T18:28:55Z) - Does Data Contamination Detection Work (Well) for LLMs? A Survey and Evaluation on Detection Assumptions [20.51842378080194]
Large language models (LLMs) have demonstrated great performance across various benchmarks, showing potential as general-purpose task solvers. As LLMs are typically trained on vast amounts of data, a significant concern in their evaluation is data contamination. We systematically review 50 papers on data contamination detection, categorize the underlying assumptions, and assess whether they have been rigorously validated.
arXiv Detail & Related papers (2024-10-24T17:58:22Z) - Detecting Training Data of Large Language Models via Expectation Maximization [62.28028046993391]
We introduce EM-MIA, a novel membership inference method that iteratively refines membership scores and prefix scores via an expectation-maximization algorithm. EM-MIA achieves state-of-the-art results on WikiMIA.
arXiv Detail & Related papers (2024-10-10T03:31:16Z) - Fine-tuning can Help Detect Pretraining Data from Large Language Models [7.7209640786782385]
Current methods differentiate members and non-members by designing scoring functions, such as Perplexity and Min-K% (a minimal sketch of a Min-K%-style score appears after this list). We introduce a novel and effective method termed Fine-tuned Score Deviation (FSD), which improves the performance of current scoring functions for pretraining data detection. In particular, we propose to measure the deviation distance of current scores after fine-tuning on a small amount of unseen data within the same domain.
arXiv Detail & Related papers (2024-10-09T15:36:42Z) - Pretraining Data Detection for Large Language Models: A Divergence-based Calibration Method [108.56493934296687]
We introduce a divergence-based calibration method, inspired by the divergence-from-randomness concept, to calibrate token probabilities for pretraining data detection. We have developed a Chinese-language benchmark, PatentMIA, to assess the performance of detection approaches for LLMs on Chinese text.
arXiv Detail & Related papers (2024-09-23T07:55:35Z) - Entropy Law: The Story Behind Data Compression and LLM Performance [115.70395740286422]
We find that model performance is negatively correlated with the compression ratio of the training data, which usually yields a lower training loss.
Based on the entropy law, we propose an efficient and universal data selection method.
We also present an application of the entropy law that detects potential performance risks at the beginning of model training.
arXiv Detail & Related papers (2024-07-09T08:14:29Z) - Probing Language Models for Pre-training Data Detection [11.37731401086372]
We propose to utilize probing for pre-training data detection by examining the model's internal activations.
Our method is simple and effective, and leads to more trustworthy pre-training data detection.
arXiv Detail & Related papers (2024-06-03T13:58:04Z) - On Inter-dataset Code Duplication and Data Leakage in Large Language Models [4.148857672591562]
This paper explores the phenomenon of inter-dataset code duplication and its impact on evaluating large language models (LLMs).
Our findings reveal a potential threat to the evaluation of LLMs across multiple SE tasks, stemming from the inter-dataset code duplication phenomenon.
We provide evidence that open-source models could be affected by inter-dataset duplication.
arXiv Detail & Related papers (2024-01-15T19:46:40Z) - Data Contamination Through the Lens of Time [21.933771085956426]
Claims about the capabilities of large language models (LLMs) are often supported by evaluations on publicly available benchmarks.
This practice raises concerns of data contamination, i.e., evaluating on examples that are explicitly or implicitly included in the training data.
We conduct the first thorough longitudinal analysis of data contamination in LLMs by using the natural experiment of training cutoffs in GPT models.
arXiv Detail & Related papers (2023-10-16T17:51:29Z) - CAFA: Class-Aware Feature Alignment for Test-Time Adaptation [50.26963784271912]
Test-time adaptation (TTA) aims to address distribution shift between training and test data by adapting a model to unlabeled data at test time.
We propose a simple yet effective feature alignment loss, termed Class-Aware Feature Alignment (CAFA), which encourages a model to learn target representations in a class-discriminative manner.
arXiv Detail & Related papers (2022-06-01T03:02:07Z) - On the Transferability of Pre-trained Language Models: A Study from Artificial Datasets [74.11825654535895]
Pre-training language models (LMs) on large-scale unlabeled text data makes it much easier for the model to achieve exceptional downstream performance.
We study which specific traits of the pre-training data, other than the semantics, make a pre-trained LM superior to its counterparts trained from scratch on downstream tasks.
arXiv Detail & Related papers (2021-09-08T10:39:57Z)
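Several of the entries above score membership with token-probability statistics such as Perplexity and Min-K% (see the Fine-tuned Score Deviation paper). As a reference point for those summaries, here is a minimal sketch of a Min-K%-style score; the placeholder model, the value of k, and any decision threshold are assumptions for illustration rather than the exact formulation of any one paper.

```python
# Minimal sketch of a Min-K%-style membership score (not tied to any single
# paper's exact formulation): average the k% lowest token log-probabilities;
# unusually high values suggest the text may have been seen during training.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder model for illustration
tok = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).eval()


@torch.no_grad()
def min_k_percent_score(text: str, k: float = 0.2) -> float:
    ids = tok(text, return_tensors="pt", truncation=True).input_ids
    logits = model(ids).logits
    # Log-probability the model assigns to each next token in the text.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    token_lp = log_probs.gather(1, ids[0, 1:].unsqueeze(-1)).squeeze(-1)
    # Average over the k% least-likely tokens (the Min-K% statistic).
    num = max(1, int(k * token_lp.numel()))
    return token_lp.topk(num, largest=False).values.mean().item()
```

Following the FSD summary above, one could also compute this score before and after briefly fine-tuning the model on a small amount of unseen same-domain data and use the deviation between the two scores as the membership signal.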