Related papers: Detecting Training Data of Large Language Models via Expectation Maximization

Detecting Training Data of Large Language Models via Expectation Maximization

URL: http://arxiv.org/abs/2410.07582v1
Date: Thu, 10 Oct 2024 03:31:16 GMT
Title: Detecting Training Data of Large Language Models via Expectation Maximization
Authors: Gyuwan Kim, Yang Li, Evangelia Spiliopoulou, Jie Ma, Miguel Ballesteros, William Yang Wang,
Abstract summary: Membership inference attacks (MIAs) aim to determine whether a specific instance was part of a target model's training data. Applying MIAs to large language models (LLMs) presents unique challenges due to the massive scale of pre-training data and the ambiguous nature of membership. We introduce EM-MIA, a novel MIA method for LLMs that iteratively refines membership scores and prefix scores via an expectation-maximization algorithm.
Score: 62.28028046993391
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The widespread deployment of large language models (LLMs) has led to impressive advancements, yet information about their training data, a critical factor in their performance, remains undisclosed. Membership inference attacks (MIAs) aim to determine whether a specific instance was part of a target model's training data. MIAs can offer insights into LLM outputs and help detect and address concerns such as data contamination and compliance with privacy and copyright standards. However, applying MIAs to LLMs presents unique challenges due to the massive scale of pre-training data and the ambiguous nature of membership. Additionally, creating appropriate benchmarks to evaluate MIA methods is not straightforward, as training and test data distributions are often unknown. In this paper, we introduce EM-MIA, a novel MIA method for LLMs that iteratively refines membership scores and prefix scores via an expectation-maximization algorithm, leveraging the duality that the estimates of these scores can be improved by each other. Membership scores and prefix scores assess how each instance is likely to be a member and discriminative as a prefix, respectively. Our method achieves state-of-the-art results on the WikiMIA dataset. To further evaluate EM-MIA, we present OLMoMIA, a benchmark built from OLMo resources, which allows us to control the difficulty of MIA tasks with varying degrees of overlap between training and test data distributions. We believe that EM-MIA serves as a robust MIA method for LLMs and that OLMoMIA provides a valuable resource for comprehensively evaluating MIA approaches, thereby driving future research in this critical area.

Related papers

Membership Inference Attacks against Large Vision-Language Models [40.996912464828696]
Large vision-language models (VLLMs) exhibit promising capabilities for processing multi-modal tasks across various application scenarios. Their emergence also raises significant data security concerns, given the potential inclusion of sensitive information, such as private photos and medical records. Detecting inappropriately used data in VLLMs remains a critical and unresolved issue.
arXiv Detail & Related papers (2024-11-05T08:35:08Z)
Beyond Binary: Towards Fine-Grained LLM-Generated Text Detection via Role Recognition and Involvement Measurement [51.601916604301685]
Large language models (LLMs) generate content that can undermine trust in online discourse. Current methods often focus on binary classification, failing to address the complexities of real-world scenarios like human-AI collaboration. To move beyond binary classification and address these challenges, we propose a new paradigm for detecting LLM-generated content.
arXiv Detail & Related papers (2024-10-18T08:14:10Z)
Self-Comparison for Dataset-Level Membership Inference in Large (Vision-)Language Models [73.94175015918059]
We propose a dataset-level membership inference method based on Self-Comparison. Our method does not require access to ground-truth member data or non-member data in identical distribution.
arXiv Detail & Related papers (2024-10-16T23:05:59Z)
Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate [118.37653302885607]
We present the Modality Integration Rate (MIR), an effective, robust, and generalized metric to indicate the multi-modal pre-training quality of Large Vision Language Models (LVLMs) MIR is indicative about training data selection, training strategy schedule, and model architecture design to get better pre-training results.
arXiv Detail & Related papers (2024-10-09T17:59:04Z)
Pretraining Data Detection for Large Language Models: A Divergence-based Calibration Method [108.56493934296687]
We introduce a divergence-based calibration method, inspired by the divergence-from-randomness concept, to calibrate token probabilities for pretraining data detection. We have developed a Chinese-language benchmark, PatentMIA, to assess the performance of detection approaches for LLMs on Chinese text.
arXiv Detail & Related papers (2024-09-23T07:55:35Z)
MIA-Tuner: Adapting Large Language Models as Pre-training Text Detector [32.15773300068426]
Existing methods have designed various sophisticated MIA score functions to achieve considerable detection performance. We propose MIA-Tuner, a novel instruction-based MIA method, which instructs LLMs themselves to serve as a more precise pre-training data detector. We design two instruction-based safeguards to respectively mitigate the privacy risks brought by the existing methods and MIA-Tuner.
arXiv Detail & Related papers (2024-08-16T11:09:56Z)
Adaptive Pre-training Data Detection for Large Language Models via Surprising Tokens [1.2549198550400134]
Large language models (LLMs) are extensively used, but there are concerns regarding privacy, security, and copyright due to their opaque training data. Current solutions to this problem leverage techniques explored in machine learning privacy such as Membership Inference Attacks (MIAs) We propose an adaptive pre-training data detection method which alleviates this reliance and effectively amplify the identification.
arXiv Detail & Related papers (2024-07-30T23:43:59Z)
SoK: Membership Inference Attacks on LLMs are Rushing Nowhere (and How to Fix It) [16.673210422615348]
More than 10 new methods have been proposed to perform Membership Inference Attacks (MIAs) against LLMs. Contrary to traditional MIAs which rely on fixed -- but randomized -- records or models, these methods are mostly evaluated on datasets collected post-hoc. This lack of randomization raises concerns of a distribution shift between members and non-members.
arXiv Detail & Related papers (2024-06-25T23:12:07Z)
ReCaLL: Membership Inference via Relative Conditional Log-Likelihoods [56.073335779595475]
We propose ReCaLL (Relative Conditional Log-Likelihood), a novel membership inference attack (MIA) ReCaLL examines the relative change in conditional log-likelihoods when prefixing target data points with non-member context. We conduct comprehensive experiments and show that ReCaLL achieves state-of-the-art performance on the WikiMIA dataset.
arXiv Detail & Related papers (2024-06-23T00:23:13Z)
How to Train Data-Efficient LLMs [56.41105687693619]
We study data-efficient approaches for pre-training language models (LLMs) We find that Ask-LLM and Density sampling are the best methods in their respective categories. In our comparison of 19 samplers, involving hundreds of evaluation tasks and pre-training runs, we find that Ask-LLM and Density are the best methods in their respective categories.
arXiv Detail & Related papers (2024-02-15T02:27:57Z)
Do Membership Inference Attacks Work on Large Language Models? [141.2019867466968]
Membership inference attacks (MIAs) attempt to predict whether a particular datapoint is a member of a target model's training data. We perform a large-scale evaluation of MIAs over a suite of language models trained on the Pile, ranging from 160M to 12B parameters. We find that MIAs barely outperform random guessing for most settings across varying LLM sizes and domains.
arXiv Detail & Related papers (2024-02-12T17:52:05Z)
Learning-Based Difficulty Calibration for Enhanced Membership Inference Attacks [3.470379197911889]
Membership Inference Attacks (MIA) allows adversaries to determine whether a specific data point was part of a model's training dataset. We present a novel approach to MIA that is aimed at significantly improving TPR at low False Positive Rate (FPR) Experiment results show that LDC-MIA can improve TPR at low FPR by up to 4x compared to the other difficulty calibration based MIAs.
arXiv Detail & Related papers (2024-01-10T04:58:17Z)
MIA-BAD: An Approach for Enhancing Membership Inference Attack and its Mitigation with Federated Learning [6.510488168434277]
The membership inference attack (MIA) is a popular paradigm for compromising the privacy of a machine learning (ML) model. We propose an enhanced Membership Inference Attack with the Batch-wise generated Attack dataset (MIA-BAD) We show how training an ML model through FL, has some distinct advantages and investigate how the threat introduced with the proposed MIA-BAD approach can be mitigated with FL approaches.
arXiv Detail & Related papers (2023-11-28T06:51:26Z)
Assessing Privacy Risks in Language Models: A Case Study on Summarization Tasks [65.21536453075275]
We focus on the summarization task and investigate the membership inference (MI) attack. We exploit text similarity and the model's resistance to document modifications as potential MI signals. We discuss several safeguards for training summarization models to protect against MI attacks and discuss the inherent trade-off between privacy and utility.
arXiv Detail & Related papers (2023-10-20T05:44:39Z)

This list is automatically generated from the titles and abstracts of the papers in this site.