Related papers: MemHunter: Automated and Verifiable Memorization Detection at Dataset-scale in LLMs

URL: http://arxiv.org/abs/2412.07261v2
Date: Sun, 16 Feb 2025 06:20:55 GMT
Title: MemHunter: Automated and Verifiable Memorization Detection at Dataset-scale in LLMs
Authors: Zhenpeng Wu, Jian Lou, Zibin Zheng, Chuan Chen
Abstract summary: We introduce MemHunter, which trains a memory-inducing LLM and employs hypothesis testing to efficiently detect memorization at the dataset level. MemHunter is the first method capable of dataset-level memorization detection, providing a critical tool for assessing privacy risks in large-scale datasets.
Score: 28.593941036010417
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) have been shown to memorize and reproduce content from their training data, raising significant privacy concerns, especially with web-scale datasets. Existing methods for detecting memorization are primarily sample-specific, relying on manually crafted or discretely optimized memory-inducing prompts generated on a per-sample basis, which become impractical for dataset-level detection due to the prohibitive computational cost of iterating through all samples. In real-world scenarios, data owners may need to verify whether a susceptible LLM has memorized their dataset, particularly if the LLM may have collected the data from the web without authorization. To address this, we introduce MemHunter, which trains a memory-inducing LLM and employs hypothesis testing to efficiently detect memorization at the dataset level, without requiring sample-specific memory inducing. Experiments on models like Pythia and Llama demonstrate that MemHunter can extract up to 40% more training data than existing methods under constrained time resources and reduce search time by up to 80% when integrated as a plug-in. Crucially, MemHunter is the first method capable of dataset-level memorization detection, providing a critical tool for assessing privacy risks in LLMs powered by large-scale datasets.

Related papers

Memorization or Interpolation ? Detecting LLM Memorization through Input Perturbation Analysis [8.725781605542675]
Large Language Models (LLMs) achieve remarkable performance through training on massive datasets. LLMs can exhibit concerning behaviors such as verbatim reproduction of training data rather than true generalization. This paper introduces PEARL, a novel approach for detecting memorization in LLMs.
arXiv Detail & Related papers (2025-05-05T20:42:34Z)
Information-Guided Identification of Training Data Imprint in (Proprietary) Large Language Models [52.439289085318634]
We show how to identify training data known to proprietary large language models (LLMs) by using information-guided probes. Our work builds on a key observation: text passages with high surprisal are good search material for memorization probes.
arXiv Detail & Related papers (2025-03-15T10:19:15Z)
Does Data Contamination Detection Work (Well) for LLMs? A Survey and Evaluation on Detection Assumptions [20.51842378080194]
Large language models (LLMs) have demonstrated great performance across various benchmarks, showing potential as general-purpose task solvers. As LLMs are typically trained on vast amounts of data, a significant concern in their evaluation is data contamination. We systematically review 50 papers on data contamination detection, categorize the underlying assumptions, and assess whether they have been rigorously validated.
arXiv Detail & Related papers (2024-10-24T17:58:22Z)
Training on the Benchmark Is Not All You Need [52.01920740114261]
We propose a simple and effective data leakage detection method based on the contents of multiple-choice options. Our method is able to work under gray-box conditions without access to model training data or weights. We evaluate the degree of data leakage of 35 mainstream open-source LLMs on four benchmark datasets.
arXiv Detail & Related papers (2024-09-03T11:09:44Z)
Evaluating Large Language Model based Personal Information Extraction and Countermeasures [63.91918057570824]
Large language models (LLMs) can be misused by attackers to accurately extract various personal information from personal profiles. LLMs outperform conventional methods at such extraction. Prompt injection can mitigate such risk to a large extent and outperforms conventional countermeasures.
arXiv Detail & Related papers (2024-08-14T04:49:30Z)
Anomaly Detection of Tabular Data Using LLMs [54.470648484612866]
We show that pre-trained large language models (LLMs) are zero-shot batch-level anomaly detectors. We propose an end-to-end fine-tuning strategy to bring out the potential of LLMs in detecting real anomalies.
arXiv Detail & Related papers (2024-06-24T04:17:03Z)
Large Language Models Memorize Sensor Datasets! Implications on Human Activity Recognition Research [0.23982628363233693]
We investigate whether Large Language Models (LLMs) have had access to standard Human Activity Recognition (HAR) datasets during training. Most contemporary LLMs are trained on virtually the entire (accessible) internet -- potentially including standard HAR datasets. For the Daphnet dataset in particular, GPT-4 is able to reproduce blocks of sensor readings.
arXiv Detail & Related papers (2024-06-09T19:38:27Z)
Elephants Never Forget: Memorization and Learning of Tabular Data in Large Language Models [21.10890310571397]
Large Language Models (LLMs) can be applied to a diverse set of tasks, but the critical issues of data contamination and memorization are often glossed over. This work introduces a variety of different techniques to assess whether a language model has seen a dataset during training. We then compare the few-shot learning performance of LLMs on datasets that were seen during training to the performance on datasets released after training.
arXiv Detail & Related papers (2024-04-09T10:58:21Z)
LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement [79.31084387589968]
Pretrained large language models (LLMs) are currently state-of-the-art for solving the vast majority of natural language processing tasks. We propose LLM2LLM, a data augmentation strategy that uses a teacher LLM to enhance a small seed dataset. We achieve improvements up to 24.2% on the GSM8K dataset, 32.6% on CaseHOLD, 32.0% on SNIPS, 52.6% on TREC and 39.8% on SST-2 over regular fine-tuning in the low-data regime.
arXiv Detail & Related papers (2024-03-22T08:57:07Z)
Elephants Never Forget: Testing Language Models for Memorization of Tabular Data [21.912611415307644]
Large Language Models (LLMs) can be applied to a diverse set of tasks, but the critical issues of data contamination and memorization are often glossed over. We introduce a variety of different techniques to assess the degrees of contamination, including statistical tests for conditional distribution modeling and four tests that identify memorization.
arXiv Detail & Related papers (2024-03-11T12:07:13Z)
Alpaca against Vicuna: Using LLMs to Uncover Memorization of LLMs [61.04246774006429]
We introduce a black-box prompt optimization method that uses an attacker LLM agent to uncover higher levels of memorization in a victim agent. We observe that our instruction-based prompts generate outputs with 23.7% higher overlap with training data compared to the baseline prefix-suffix measurements. Our findings show that instruction-tuned models can expose pre-training data as much as their base-models, if not more so, and using instructions proposed by other LLMs can open a new avenue of automated attacks.
arXiv Detail & Related papers (2024-03-05T19:32:01Z)
How to Train Data-Efficient LLMs [56.41105687693619]
We study data-efficient approaches for pre-training language models (LLMs). In our comparison of 19 samplers, involving hundreds of evaluation tasks and pre-training runs, we find that Ask-LLM and Density sampling are the best methods in their respective categories.
arXiv Detail & Related papers (2024-02-15T02:27:57Z)

This list is automatically generated from the titles and abstracts of the papers on this site.