R.R.: Unveiling LLM Training Privacy through Recollection and Ranking
- URL: http://arxiv.org/abs/2502.12658v1
- Date: Tue, 18 Feb 2025 09:05:59 GMT
- Title: R.R.: Unveiling LLM Training Privacy through Recollection and Ranking
- Authors: Wenlong Meng, Zhenyuan Guo, Lenan Wu, Chen Gong, Wenyan Liu, Weixian Li, Chengkun Wei, Wenzhi Chen
- Abstract summary: Large Language Models (LLMs) pose significant privacy risks, potentially leaking training data due to implicit memorization. We propose R.R. (Recollect and Rank), a novel two-step privacy stealing attack that enables attackers to reconstruct PII entities from scrubbed training data. Experiments across three popular PII datasets demonstrate that R.R. achieves better PII identification performance than baselines.
- Score: 17.12953978321457
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) pose significant privacy risks, potentially leaking training data due to implicit memorization. Existing privacy attacks primarily focus on membership inference attacks (MIAs) or data extraction attacks, but reconstructing specific personally identifiable information (PII) in an LLM's training data remains challenging. In this paper, we propose R.R. (Recollect and Rank), a novel two-step privacy stealing attack that enables attackers to reconstruct PII entities from scrubbed training data where the PII entities have been masked. In the first step, we introduce a prompt paradigm named recollection, which instructs the LLM to repeat a masked text while filling in the masks; we can then use PII identifiers to extract the recollected PII candidates. In the second step, we design a new criterion to score and rank each PII candidate. Motivated by membership inference, we leverage a reference model to calibrate our criterion. Experiments across three popular PII datasets demonstrate that R.R. achieves better PII identification performance than baselines. These results highlight the vulnerability of LLMs to PII leakage even when training data has been scrubbed. We release the replication package of R.R. at a link.
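The two-step pipeline described in the abstract lends itself to a short illustration. Below is a minimal sketch, assuming a HuggingFace causal LM: the recollection prompt wording, the regex-based PII identifier, the gpt2/distilgpt2 stand-in models, the generation parameters, and the calibrated scoring formula are all illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the R.R. idea: (1) "recollection" prompting makes the target
# model fill in masked PII, (2) extracted candidates are ranked with a
# reference-model-calibrated score. All prompts, models, and formulas are
# illustrative stand-ins.
import re
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def load(name):
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)
    model.eval()
    return tok, model

target_tok, target_model = load("gpt2")   # stand-in for the attacked model
ref_tok, ref_model = load("distilgpt2")   # stand-in reference model

def recollect(masked_text, n_samples=8):
    """Step 1: ask the target model to repeat the text while filling in the masks."""
    prompt = f"Repeat the following text, filling in every [MASK]:\n{masked_text}\n"
    inputs = target_tok(prompt, return_tensors="pt")
    outputs = target_model.generate(
        **inputs, do_sample=True, num_return_sequences=n_samples,
        max_new_tokens=64, pad_token_id=target_tok.eos_token_id,
    )
    return [target_tok.decode(o[inputs.input_ids.shape[1]:], skip_special_tokens=True)
            for o in outputs]

def extract_candidates(generations):
    """Toy PII identifier: anything that looks like a capitalized full name."""
    pattern = re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b")
    return {m for g in generations for m in pattern.findall(g)}

def nll(tok, model, text):
    """Mean per-token negative log-likelihood of a text under a model."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        return model(ids, labels=ids).loss.item()

def rank_candidates(masked_text, candidates):
    """Step 2: score each candidate by the target model's likelihood of the
    filled-in text, calibrated against the reference model."""
    scores = {}
    for cand in candidates:
        filled = masked_text.replace("[MASK]", cand, 1)  # demo text has one mask
        scores[cand] = nll(ref_tok, ref_model, filled) - nll(target_tok, target_model, filled)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

masked = "The complaint was filed by [MASK], a resident of Springfield."
print(rank_candidates(masked, extract_candidates(recollect(masked))))
```

In this sketch a candidate scores high when the target model assigns the filled-in text markedly lower loss than the reference model does, the same calibration intuition used in reference-based membership inference.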
Related papers
- Information-Guided Identification of Training Data Imprint in (Proprietary) Large Language Models [52.439289085318634]
We show how to identify training data known to proprietary large language models (LLMs) by using information-guided probes.
Our work builds on a key observation: text passages with high surprisal are good search material for memorization probes (a minimal surprisal sketch appears after this list).
arXiv Detail & Related papers (2025-03-15T10:19:15Z) - Rank-R1: Enhancing Reasoning in LLM-based Document Rerankers via Reinforcement Learning [76.50690734636477]
We introduce Rank-R1, a novel LLM-based reranker that performs reasoning over both the user query and candidate documents before performing the ranking task.
Our experiments on the TREC DL and BRIGHT datasets show that Rank-R1 is highly effective, especially for complex queries.
arXiv Detail & Related papers (2025-03-08T03:14:26Z) - Privacy Ripple Effects from Adding or Removing Personal Information in Language Model Training [19.119349775283556]
We find that the amount and ease of PII memorization is a dynamic property of a model that evolves throughout training pipelines.
We characterize three such novel phenomena: (1) similar-appearing PII seen later in training can elicit memorization of earlier-seen sequences in what we call assisted memorization.
arXiv Detail & Related papers (2025-02-21T18:59:14Z) - Evaluating LLM-based Personal Information Extraction and Countermeasures [63.91918057570824]
Large language model (LLM) based personal information extraction can be benchmarked. LLMs can be misused by attackers to accurately extract various personal information from personal profiles. Prompt injection can defend against strong LLM-based attacks, reducing them to less effective traditional ones.
arXiv Detail & Related papers (2024-08-14T04:49:30Z) - FIRST: Faster Improved Listwise Reranking with Single Token Decoding [56.727761901751194]
We introduce FIRST, a novel listwise LLM reranking approach leveraging the output logits of the first generated identifier to directly obtain a ranked ordering of the candidates.
Empirical results demonstrate that FIRST accelerates inference by 50% while maintaining a robust ranking performance with gains across the BEIR benchmark.
Our results show that LLM rerankers can provide a stronger distillation signal compared to cross-encoders, yielding substantial improvements in retriever recall after relevance feedback.
arXiv Detail & Related papers (2024-06-21T21:27:50Z) - Alpaca against Vicuna: Using LLMs to Uncover Memorization of LLMs [61.04246774006429]
We introduce a black-box prompt optimization method that uses an attacker LLM agent to uncover higher levels of memorization in a victim agent. We observe that our instruction-based prompts generate outputs with 23.7% higher overlap with training data compared to the baseline prefix-suffix measurements. Our findings show that instruction-tuned models can expose pre-training data as much as their base models, if not more, and that using instructions proposed by other LLMs can open a new avenue for automated attacks.
arXiv Detail & Related papers (2024-03-05T19:32:01Z) - The Good and The Bad: Exploring Privacy Issues in Retrieval-Augmented Generation (RAG) [56.67603627046346]
Retrieval-augmented generation (RAG) is a powerful technique for augmenting language models with proprietary and private data.
In this work, we conduct empirical studies with novel attack methods, which demonstrate the vulnerability of RAG systems to leaking the private retrieval database.
arXiv Detail & Related papers (2024-02-23T18:35:15Z) - Practical Membership Inference Attacks against Fine-tuned Large Language Models via Self-prompt Calibration [32.15773300068426]
Membership Inference Attacks aim to infer whether a target data record has been utilized for model training.
We propose a Membership Inference Attack based on Self-calibrated Probabilistic Variation (SPV-MIA).
arXiv Detail & Related papers (2023-11-10T13:55:05Z) - The Janus Interface: How Fine-Tuning in Large Language Models Amplifies the Privacy Risks [19.364127374679253]
We propose a novel attack, Janus, which exploits the fine-tuning interface to recover forgotten PIIs from the pre-training data in language models.
Our experiment results show that Janus amplifies the privacy risks by over 10 times in comparison with the baseline.
Our analysis validates that existing fine-tuning APIs provided by OpenAI and Azure AI Studio are susceptible to our Janus attack.
arXiv Detail & Related papers (2023-10-24T02:48:19Z) - Analyzing Leakage of Personally Identifiable Information in Language Models [13.467340359030855]
Language Models (LMs) have been shown to leak information about training data through sentence-level membership inference and reconstruction attacks.
Scrubbing techniques reduce but do not prevent the risk of PII leakage.
It is unclear to what extent algorithmic defenses such as differential privacy, designed to guarantee user-level privacy, prevent PII disclosure.
arXiv Detail & Related papers (2023-02-01T16:04:48Z) - FIRE: A Failure-Adaptive Reinforcement Learning Framework for Edge Computing Migrations [52.85536740465277]
FIRE is a framework that adapts to rare events by training an RL policy in an edge computing digital twin environment.
We propose ImRE, an importance sampling-based Q-learning algorithm, which samples rare events proportionally to their impact on the value function.
We show that FIRE reduces costs compared to vanilla RL and the greedy baseline in the event of failures.
arXiv Detail & Related papers (2022-09-28T19:49:39Z)
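Relating to the information-guided probing entry above, which treats high-surprisal passages as good search material for memorization probes: the following is a minimal sketch, assuming a HuggingFace causal LM; the model name, the example corpus, and the number of probes kept are illustrative assumptions.

```python
# Minimal sketch: rank candidate passages by average token surprisal under a
# causal LM and keep the highest-surprisal ones as memorization probes.
# The model name and the number of probes kept are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def surprisal(passage: str) -> float:
    """Average negative log-likelihood (nats per token) of a passage."""
    ids = tok(passage, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean NLL over predicted tokens
    return loss.item()

def select_probes(passages, k=10):
    """Keep the k passages the model finds most surprising as probe material."""
    return sorted(passages, key=surprisal, reverse=True)[:k]

corpus = [
    "The quick brown fox jumps over the lazy dog.",
    "Plaintiff Jane Doe resides at 42 Elm Street, Springfield.",
]
print(select_probes(corpus, k=1))
```

Average per-token negative log-likelihood serves as the surprisal score here; how the selected probes are then used to query the model for memorized continuations is specific to the cited work.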