R.R.: Unveiling LLM Training Privacy through Recollection and Ranking
- URL: http://arxiv.org/abs/2502.12658v1
- Date: Tue, 18 Feb 2025 09:05:59 GMT
- Title: R.R.: Unveiling LLM Training Privacy through Recollection and Ranking
- Authors: Wenlong Meng, Zhenyuan Guo, Lenan Wu, Chen Gong, Wenyan Liu, Weixian Li, Chengkun Wei, Wenzhi Chen
- Abstract summary: Large Language Models (LLMs) pose significant privacy risks, potentially leaking training data due to implicit memorization. We propose R.R. (Recollect and Rank), a novel two-step privacy stealing attack that enables attackers to reconstruct PII entities from scrubbed training data. Experiments across three popular PII datasets demonstrate that R.R. achieves better PII identification performance than baselines.
- Score: 17.12953978321457
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) pose significant privacy risks, potentially leaking training data due to implicit memorization. Existing privacy attacks primarily focus on membership inference attacks (MIAs) or data extraction attacks, but reconstructing specific personally identifiable information (PII) in an LLM's training data remains challenging. In this paper, we propose R.R. (Recollect and Rank), a novel two-step privacy stealing attack that enables attackers to reconstruct PII entities from scrubbed training data where the PII entities have been masked. In the first step, we introduce a prompt paradigm named recollection, which instructs the LLM to repeat a masked text while filling in the masks; we can then use PII identifiers to extract the recollected PII candidates. In the second step, we design a new criterion to score and rank each PII candidate. Motivated by membership inference, we leverage a reference model to calibrate our criterion. Experiments across three popular PII datasets demonstrate that R.R. achieves better PII identification performance than baselines. These results highlight the vulnerability of LLMs to PII leakage even when training data has been scrubbed. We release the replication package of R.R. at a link.
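The two-step pipeline described in the abstract lends itself to a short illustration. Below is a minimal sketch, assuming a HuggingFace causal LM: the recollection prompt wording, the regex-based PII identifier, the gpt2/distilgpt2 stand-in models, the generation parameters, and the calibrated scoring formula are all illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the R.R. idea: (1) "recollection" prompting makes the target
# model fill in masked PII, (2) extracted candidates are ranked with a
# reference-model-calibrated score. All prompts, models, and formulas are
# illustrative stand-ins.
import re
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def load(name):
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)
    model.eval()
    return tok, model

target_tok, target_model = load("gpt2")   # stand-in for the attacked model
ref_tok, ref_model = load("distilgpt2")   # stand-in reference model

def recollect(masked_text, n_samples=8):
    """Step 1: ask the target model to repeat the text while filling in the masks."""
    prompt = f"Repeat the following text, filling in every [MASK]:\n{masked_text}\n"
    inputs = target_tok(prompt, return_tensors="pt")
    outputs = target_model.generate(
        **inputs, do_sample=True, num_return_sequences=n_samples,
        max_new_tokens=64, pad_token_id=target_tok.eos_token_id,
    )
    return [target_tok.decode(o[inputs.input_ids.shape[1]:], skip_special_tokens=True)
            for o in outputs]

def extract_candidates(generations):
    """Toy PII identifier: anything that looks like a capitalized full name."""
    pattern = re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b")
    return {m for g in generations for m in pattern.findall(g)}

def nll(tok, model, text):
    """Mean per-token negative log-likelihood of a text under a model."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        return model(ids, labels=ids).loss.item()

def rank_candidates(masked_text, candidates):
    """Step 2: score each candidate by the target model's likelihood of the
    filled-in text, calibrated against the reference model."""
    scores = {}
    for cand in candidates:
        filled = masked_text.replace("[MASK]", cand, 1)  # demo text has one mask
        scores[cand] = nll(ref_tok, ref_model, filled) - nll(target_tok, target_model, filled)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

masked = "The complaint was filed by [MASK], a resident of Springfield."
print(rank_candidates(masked, extract_candidates(recollect(masked))))
```

In this sketch a candidate scores high when the target model assigns the filled-in text markedly lower loss than the reference model does, the same calibration intuition used in reference-based membership inference.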
Related papers
- Information-Guided Identification of Training Data Imprint in (Proprietary) Large Language Models [52.439289085318634]
We show how to identify training data known to proprietary large language models (LLMs) by using information-guided probes.
Our work builds on a key observation: text passages with high surprisal are good search material for memorization probes (a minimal surprisal sketch appears after this list).
arXiv Detail & Related papers (2025-03-15T10:19:15Z) - Rank-R1: Enhancing Reasoning in LLM-based Document Rerankers via Reinforcement Learning [76.50690734636477]
We introduce Rank-R1, a novel LLM-based reranker that performs reasoning over both the user query and candidate documents before performing the ranking task.
Our experiments on the TREC DL and BRIGHT datasets show that Rank-R1 is highly effective, especially for complex queries.
arXiv Detail & Related papers (2025-03-08T03:14:26Z) - Privacy Ripple Effects from Adding or Removing Personal Information in Language Model Training [19.119349775283556]
We find that the amount and ease of PII memorization is a dynamic property of a model that evolves throughout training pipelines.
We characterize three such novel phenomena: (1) similar-appearing PII seen later in training can elicit memorization of earlier-seen sequences in what we call assisted memorization.
arXiv Detail & Related papers (2025-02-21T18:59:14Z) - Evaluating LLM-based Personal Information Extraction and Countermeasures [63.91918057570824]
Large language model (LLM) based personal information extraction can be benchmarked. LLMs can be misused by attackers to accurately extract various personal information from personal profiles. Prompt injection can defend against strong LLM-based attacks, reducing them to less effective traditional ones.
arXiv Detail & Related papers (2024-08-14T04:49:30Z) - FIRST: Faster Improved Listwise Reranking with Single Token Decoding [56.727761901751194]
We introduce FIRST, a novel listwise LLM reranking approach leveraging the output logits of the first generated identifier to directly obtain a ranked ordering of the candidates.
Empirical results demonstrate that FIRST accelerates inference by 50% while maintaining a robust ranking performance with gains across the BEIR benchmark.
Our results show that LLM rerankers can provide a stronger distillation signal compared to cross-encoders, yielding substantial improvements in retriever recall after relevance feedback.
arXiv Detail & Related papers (2024-06-21T21:27:50Z) - Alpaca against Vicuna: Using LLMs to Uncover Memorization of LLMs [61.04246774006429]
We introduce a black-box prompt optimization method that uses an attacker LLM agent to uncover higher levels of memorization in a victim agent. We observe that our instruction-based prompts generate outputs with 23.7% higher overlap with training data compared to the baseline prefix-suffix measurements. Our findings show that instruction-tuned models can expose pre-training data as much as their base models, if not more, and that using instructions proposed by other LLMs can open a new avenue for automated attacks.
arXiv Detail & Related papers (2024-03-05T19:32:01Z) - The Good and The Bad: Exploring Privacy Issues in Retrieval-Augmented Generation (RAG) [56.67603627046346]
Retrieval-augmented generation (RAG) is a powerful technique for augmenting language models with proprietary and private data.
In this work, we conduct empirical studies with novel attack methods, which demonstrate the vulnerability of RAG systems to leaking the private retrieval database.
arXiv Detail & Related papers (2024-02-23T18:35:15Z) - Practical Membership Inference Attacks against Fine-tuned Large Language Models via Self-prompt Calibration [32.15773300068426]
Membership Inference Attacks aim to infer whether a target data record has been utilized for model training.
We propose a Membership Inference Attack based on Self-calibrated Probabilistic Variation (SPV-MIA).
arXiv Detail & Related papers (2023-11-10T13:55:05Z) - The Janus Interface: How Fine-Tuning in Large Language Models Amplifies the Privacy Risks [19.364127374679253]
We propose a novel attack, Janus, which exploits the fine-tuning interface to recover forgotten PIIs from the pre-training data in language models.
Our experiment results show that Janus amplifies the privacy risks by over 10 times in comparison with the baseline.
Our analysis validates that existing fine-tuning APIs provided by OpenAI and Azure AI Studio are susceptible to our Janus attack.
arXiv Detail & Related papers (2023-10-24T02:48:19Z) - Analyzing Leakage of Personally Identifiable Information in Language Models [13.467340359030855]
Language Models (LMs) have been shown to leak information about training data through sentence-level membership inference and reconstruction attacks.
Scrubbing techniques reduce but do not prevent the risk of PII leakage.
It is unclear to what extent algorithmic defenses such as differential privacy, designed to guarantee user-level privacy, prevent PII disclosure.
arXiv Detail & Related papers (2023-02-01T16:04:48Z) - FIRE: A Failure-Adaptive Reinforcement Learning Framework for Edge Computing Migrations [52.85536740465277]
FIRE is a framework that adapts to rare events by training an RL policy in an edge computing digital twin environment.
We propose ImRE, an importance sampling-based Q-learning algorithm, which samples rare events proportionally to their impact on the value function.
We show that FIRE reduces costs compared to vanilla RL and the greedy baseline in the event of failures.
arXiv Detail & Related papers (2022-09-28T19:49:39Z)
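Relating to the information-guided probing entry above, which treats high-surprisal passages as good search material for memorization probes: the following is a minimal sketch, assuming a HuggingFace causal LM; the model name, the example corpus, and the number of probes kept are illustrative assumptions.

```python
# Minimal sketch: rank candidate passages by average token surprisal under a
# causal LM and keep the highest-surprisal ones as memorization probes.
# The model name and the number of probes kept are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def surprisal(passage: str) -> float:
    """Average negative log-likelihood (nats per token) of a passage."""
    ids = tok(passage, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean NLL over predicted tokens
    return loss.item()

def select_probes(passages, k=10):
    """Keep the k passages the model finds most surprising as probe material."""
    return sorted(passages, key=surprisal, reverse=True)[:k]

corpus = [
    "The quick brown fox jumps over the lazy dog.",
    "Plaintiff Jane Doe resides at 42 Elm Street, Springfield.",
]
print(select_probes(corpus, k=1))
```

Average per-token negative log-likelihood serves as the surprisal score here; how the selected probes are then used to query the model for memorized continuations is specific to the cited work.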