EL-MIA: Quantifying Membership Inference Risks of Sensitive Entities in LLMs
- URL: http://arxiv.org/abs/2511.00192v1
- Date: Fri, 31 Oct 2025 18:50:47 GMT
- Title: EL-MIA: Quantifying Membership Inference Risks of Sensitive Entities in LLMs
- Authors: Ali Satvaty, Suzan Verberne, Fatih Turkmen
- Abstract summary: We propose a new task in the context of LLM privacy: entity-level discovery of membership risk focused on sensitive information. Existing methods for MIA can detect the presence of entire prompts or documents in the LLM training data, but they fail to capture risks at a finer granularity. We construct a benchmark dataset for the evaluation of MIA methods on this task.
- Score: 10.566053894405902
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Membership inference attacks (MIA) aim to infer whether a particular data point is part of the training dataset of a model. In this paper, we propose a new task in the context of LLM privacy: entity-level discovery of membership risk focused on sensitive information (PII, credit card numbers, etc.). Existing methods for MIA can detect the presence of entire prompts or documents in the LLM training data, but they fail to capture risks at a finer granularity. We propose the "EL-MIA" framework for auditing entity-level membership risks in LLMs. We construct a benchmark dataset for the evaluation of MIA methods on this task. Using this benchmark, we conduct a systematic comparison of existing MIA techniques as well as two newly proposed methods. We provide a comprehensive analysis of the results, relating entity-level MIA susceptibility to model scale, number of training epochs, and other surface-level factors. Our findings reveal that existing MIA methods are limited when it comes to entity-level membership inference of sensitive attributes, while this susceptibility can be exposed with relatively straightforward methods, highlighting the need for stronger adversaries to stress-test the proposed threat model.
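To make the entity-level setting concrete, here is a minimal sketch (not the paper's EL-MIA implementation) of restricting a log-likelihood membership score to the tokens of a sensitive entity span. It assumes per-token log-probabilities from the target model are already available; the function names, span representation, and threshold are all illustrative.

```python
def entity_mia_score(token_logprobs, entity_span):
    """Average log-probability over the tokens of a sensitive entity.

    token_logprobs: per-token log-probabilities of the full sequence,
                    e.g. from one forward pass of the target model.
    entity_span:    (start, end) token indices of the entity, end exclusive.

    A higher score means the model assigns unusually high likelihood to
    the entity in its context, a possible sign of memorization.
    """
    start, end = entity_span
    span = token_logprobs[start:end]
    return sum(span) / len(span)

def flag_entity(token_logprobs, entity_span, threshold=-2.0):
    # The threshold is a placeholder; in practice it would be calibrated
    # on held-out member and non-member entities.
    return entity_mia_score(token_logprobs, entity_span) > threshold
```

By contrast, a document-level MIA averages over all tokens, which dilutes the signal carried by a short entity span.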
Related papers
- What Hard Tokens Reveal: Exploiting Low-confidence Tokens for Membership Inference Attacks against Large Language Models [2.621142288968429]
Membership Inference Attacks (MIAs) attempt to determine whether a specific data sample was included in a model's training or fine-tuning dataset. We propose a novel membership inference approach that captures the token-level probabilities for low-confidence (hard) tokens. Experiments on both domain-specific medical datasets and general-purpose benchmarks demonstrate that HT-MIA consistently outperforms seven state-of-the-art MIA baselines.
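A rough illustration of the hard-token idea (not the authors' HT-MIA code): aggregate the model's log-probabilities over only its least confident tokens. The cutoff `k` is an arbitrary choice for the sketch.

```python
def hard_token_score(token_logprobs, k=10):
    """Mean log-probability over the k lowest-confidence (hard) tokens.

    Intuition from the abstract: members tend to receive relatively high
    probability even on tokens the model would normally find hard, so a
    higher score over the hard tokens hints at membership.
    """
    hardest = sorted(token_logprobs)[:k]  # k smallest log-probabilities
    return sum(hardest) / len(hardest)
```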
arXiv Detail & Related papers (2026-01-27T22:31:10Z) - PerProb: Indirectly Evaluating Memorization in Large Language Models [13.905375956316632]
We propose PerProb, a label-free framework for indirectly assessing LLM vulnerabilities. PerProb evaluates changes in perplexity and average log probability between data generated by victim and adversary models. We evaluate PerProb's effectiveness across five datasets, revealing varying memorization behaviors and privacy risks.
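The quantities PerProb compares are standard; a minimal sketch of computing them from per-token log-probabilities follows. The pairing of victim- and adversary-generated text and the decision rule are simplifying assumptions, not the paper's procedure.

```python
import math

def perplexity(token_logprobs):
    """Perplexity from natural-log per-token log-probabilities."""
    avg_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_nll)

def perprob_style_gap(victim_logprobs, adversary_logprobs):
    """Gap in perplexity and average log-probability between text
    generated by the victim model and by an adversary model."""
    ppl_gap = perplexity(victim_logprobs) - perplexity(adversary_logprobs)
    avg_gap = (sum(victim_logprobs) / len(victim_logprobs)
               - sum(adversary_logprobs) / len(adversary_logprobs))
    return ppl_gap, avg_gap
```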
arXiv Detail & Related papers (2025-12-16T17:10:01Z) - OpenLVLM-MIA: A Controlled Benchmark Revealing the Limits of Membership Inference Attacks on Large Vision-Language Models [8.88331104584743]
OpenLVLM-MIA is a new benchmark that highlights fundamental challenges in evaluating membership inference attacks (MIA) against large vision-language models (LVLMs). We introduce a controlled benchmark of 6,000 images where the distributions of member and non-member samples are carefully balanced, and ground-truth membership labels are provided across three distinct training stages. Experiments using OpenLVLM-MIA demonstrated that the performance of state-of-the-art MIA methods converged to random chance under unbiased conditions.
arXiv Detail & Related papers (2025-10-18T01:39:28Z) - How to Choose a Threshold for an Evaluation Metric for Large Language Models [0.9423257767158634]
We propose a step-by-step recipe for picking a threshold for a given large language model (LLM) evaluation metric. We then propose concrete and statistically rigorous procedures to determine a threshold for the given LLM evaluation metric using available ground-truth data.
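As a baseline for what such a recipe produces, the sketch below picks the threshold as a quantile of the negative-class scores so that a target false-positive rate is not exceeded on the ground-truth data; the paper's procedures add statistical rigor beyond this plain quantile rule.

```python
import numpy as np

def threshold_at_fpr(scores, labels, target_fpr=0.01):
    """Threshold such that at most ~target_fpr of negatives (label 0)
    score above it on the labeled calibration data."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    negatives = scores[labels == 0]
    return float(np.quantile(negatives, 1.0 - target_fpr))
```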
arXiv Detail & Related papers (2024-12-10T21:57:25Z) - Beyond Binary: Towards Fine-Grained LLM-Generated Text Detection via Role Recognition and Involvement Measurement [51.601916604301685]
Large language models (LLMs) generate content that can undermine trust in online discourse. Current methods often focus on binary classification, failing to address the complexities of real-world scenarios like human-LLM collaboration. To move beyond binary classification and address these challenges, we propose a new paradigm for detecting LLM-generated content.
arXiv Detail & Related papers (2024-10-18T08:14:10Z) - Detecting Training Data of Large Language Models via Expectation Maximization [62.28028046993391]
We introduce EM-MIA, a novel membership inference method that iteratively refines membership scores and prefix scores via an expectation-maximization algorithm. EM-MIA achieves state-of-the-art results on WikiMIA.
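The abstract names only the alternating structure; the toy loop below is loosely inspired by that description and is not the actual EM-MIA update rule. It assumes a precomputed matrix of conditional log-likelihoods of each document given each candidate prefix.

```python
import numpy as np

def _softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def em_style_refinement(raw_scores, cond_ll, n_iters=10):
    """Toy EM-style alternation between membership and prefix scores.

    raw_scores: initial membership scores, shape (n,)
    cond_ll:    cond_ll[i, j] = log-likelihood of document i conditioned
                on prefix j, shape (n, n)
    """
    member = np.asarray(raw_scores, dtype=float)
    cond_ll = np.asarray(cond_ll, dtype=float)
    for _ in range(n_iters):
        prefix = cond_ll.T @ _softmax(member)  # weight prefixes by beliefs
        member = cond_ll @ _softmax(prefix)    # re-score docs against prefixes
    return member
```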
arXiv Detail & Related papers (2024-10-10T03:31:16Z) - ReCaLL: Membership Inference via Relative Conditional Log-Likelihoods [56.073335779595475]
We propose ReCaLL (Relative Conditional Log-Likelihood) to detect pretraining data by leveraging conditional language modeling capabilities. Our empirical findings show that conditioning member data on non-member prefixes induces a larger decrease in log-likelihood compared to non-member data. We conduct comprehensive experiments and show that ReCaLL achieves state-of-the-art performance on the WikiMIA dataset.
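The ReCaLL score itself is easy to state; here is a sketch under the assumption that per-token log-probabilities are available both with and without a known non-member prefix.

```python
def recall_score(cond_logprobs, uncond_logprobs):
    """Relative conditional log-likelihood of a target text.

    cond_logprobs:   per-token log-probs of the text when conditioned
                     on a known non-member prefix.
    uncond_logprobs: per-token log-probs of the text on its own.

    Per the abstract, conditioning on non-member prefixes lowers the
    log-likelihood of member texts more than that of non-member texts,
    so the ratio separates the two groups.
    """
    ll_cond = sum(cond_logprobs) / len(cond_logprobs)
    ll_uncond = sum(uncond_logprobs) / len(uncond_logprobs)
    return ll_cond / ll_uncond
```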
arXiv Detail & Related papers (2024-06-23T00:23:13Z) - Unveiling the Misuse Potential of Base Large Language Models via In-Context Learning [61.2224355547598]
Open-sourcing of large language models (LLMs) accelerates application development, innovation, and scientific progress.
Our investigation exposes a critical oversight in the belief that base LLMs, lacking alignment, cannot effectively follow malicious instructions and are therefore safe to release.
By deploying carefully designed demonstrations, our research demonstrates that base LLMs could effectively interpret and execute malicious instructions.
arXiv Detail & Related papers (2024-04-16T13:22:54Z) - Do Membership Inference Attacks Work on Large Language Models? [141.2019867466968]
Membership inference attacks (MIAs) attempt to predict whether a particular datapoint is a member of a target model's training data.
We perform a large-scale evaluation of MIAs over a suite of language models trained on the Pile, ranging from 160M to 12B parameters.
We find that MIAs barely outperform random guessing for most settings across varying LLM sizes and domains.
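The "barely outperform random guessing" finding is typically read off an AUC near 0.5. A self-contained way to compute that AUC from attack scores (the standard rank statistic, not code from the paper):

```python
import numpy as np

def mia_auc(member_scores, nonmember_scores):
    """AUC = probability that a random member outscores a random
    non-member (Mann-Whitney U); ties count half. 0.5 is chance level."""
    m = np.asarray(member_scores, dtype=float)
    n = np.asarray(nonmember_scores, dtype=float)
    wins = (m[:, None] > n[None, :]).sum()
    ties = (m[:, None] == n[None, :]).sum()
    return (wins + 0.5 * ties) / (len(m) * len(n))
```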
arXiv Detail & Related papers (2024-02-12T17:52:05Z) - Practical Membership Inference Attacks against Fine-tuned Large Language Models via Self-prompt Calibration [32.15773300068426]
Membership Inference Attacks aim to infer whether a target data record has been utilized for model training.
We propose a Membership Inference Attack based on Self-calibrated Probabilistic Variation (SPV-MIA).
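The general shape of reference-based calibration, of which self-prompt calibration is a variant, can be sketched as a score difference between the target model and a reference model fine-tuned on text sampled from the target itself. This is schematic; SPV-MIA's probabilistic-variation estimator is more involved.

```python
def calibrated_score(target_logprob, reference_logprob):
    """Membership score of a record: the target model's log-likelihood
    minus that of a 'self-prompted' reference model. A large positive
    gap suggests the record was seen in fine-tuning rather than simply
    being generically probable text."""
    return target_logprob - reference_logprob
```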
arXiv Detail & Related papers (2023-11-10T13:55:05Z) - Assessing Privacy Risks in Language Models: A Case Study on Summarization Tasks [65.21536453075275]
We focus on the summarization task and investigate the membership inference (MI) attack.
We exploit text similarity and the model's resistance to document modifications as potential MI signals.
We discuss several safeguards for training summarization models to protect against MI attacks and discuss the inherent trade-off between privacy and utility.
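Both signals can be sketched with a generic string-similarity stand-in (difflib here; the paper may use ROUGE or embedding similarity), over hypothetical inputs: the model's summary of the original document and its summaries of lightly perturbed copies.

```python
import difflib

def similarity_signal(generated_summary, reference_summary):
    """Text-similarity MI signal: a model tends to reproduce summaries
    it was trained on more closely."""
    return difflib.SequenceMatcher(None, generated_summary,
                                   reference_summary).ratio()

def robustness_signal(perturbed_summaries, original_summary):
    """Resistance-to-modification signal: if small edits to the input
    document barely change the output summary, the pair may have been
    memorized. Returns the mean similarity to the original summary."""
    sims = [similarity_signal(s, original_summary)
            for s in perturbed_summaries]
    return sum(sims) / len(sims)
```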
arXiv Detail & Related papers (2023-10-20T05:44:39Z)