Analyzing Leakage of Personally Identifiable Information in Language
Models
- URL: http://arxiv.org/abs/2302.00539v4
- Date: Sun, 23 Apr 2023 22:20:47 GMT
- Title: Analyzing Leakage of Personally Identifiable Information in Language
Models
- Authors: Nils Lukas, Ahmed Salem, Robert Sim, Shruti Tople, Lukas Wutschitz and
Santiago Zanella-Béguelin
- Abstract summary: Language Models (LMs) have been shown to leak information about training data through sentence-level membership inference and reconstruction attacks.
Scrubbing techniques reduce but do not prevent the risk of PII leakage.
It is unclear to which extent algorithmic defenses such as differential privacy, designed to guarantee user-level privacy, prevent PII disclosure.
- Score: 13.467340359030855
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Language Models (LMs) have been shown to leak information about training data
through sentence-level membership inference and reconstruction attacks.
Understanding the risk of LMs leaking Personally Identifiable Information (PII)
has received less attention, which can be attributed to the false assumption
that dataset curation techniques such as scrubbing are sufficient to prevent
PII leakage. Scrubbing techniques reduce but do not prevent the risk of PII
leakage: in practice scrubbing is imperfect and must balance the trade-off
between minimizing disclosure and preserving the utility of the dataset. On the
other hand, it is unclear to which extent algorithmic defenses such as
differential privacy, designed to guarantee sentence- or user-level privacy,
prevent PII disclosure. In this work, we introduce rigorous game-based
definitions for three types of PII leakage via black-box extraction, inference,
and reconstruction attacks with only API access to an LM. We empirically
evaluate the attacks against GPT-2 models fine-tuned with and without defenses
in three domains: case law, health care, and e-mails. Our main contributions
are (i) novel attacks that can extract up to 10× more PII sequences than
existing attacks, (ii) showing that sentence-level differential privacy reduces
the risk of PII disclosure but still leaks about 3% of PII sequences, and (iii)
a subtle connection between record-level membership inference and PII
reconstruction. Code to reproduce all experiments in the paper is available at
https://github.com/microsoft/analysing_pii_leakage.
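As a rough illustration of the black-box setting described above (a minimal sketch, not the authors' pipeline, which is available at the repository linked above), the snippet below samples text from an LM through its generation API only, tags PII-like spans in the samples, and ranks the candidates by the model's own likelihood, the same signal that underlies membership-inference-style scoring. The base `gpt2` checkpoint standing in for a fine-tuned model, the e-mail regex used as a toy PII tagger, and the "Contact:" probe template are all illustrative assumptions.
```python
# Minimal sketch of a black-box PII extraction attack: sample from the LM,
# tag PII-like strings, and rank candidates by model likelihood.
# Illustrative assumptions: base "gpt2" checkpoint, regex-based "PII" tagger.
import re
from collections import Counter

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")  # stand-in for a fine-tuned LM
model = GPT2LMHeadModel.from_pretrained("gpt2").to(DEVICE).eval()

# Toy PII tagger: an e-mail regex. The paper instead uses a proper tagger over
# domains such as case law, health care, and e-mails.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def sample_texts(n_samples: int = 32, max_new_tokens: int = 64) -> list[str]:
    """Query the LM through its generation API only (black-box access)."""
    texts = []
    for _ in range(n_samples):
        inputs = tokenizer(tokenizer.bos_token, return_tensors="pt").to(DEVICE)
        out = model.generate(
            **inputs,
            do_sample=True,
            top_k=50,
            max_new_tokens=max_new_tokens,
            pad_token_id=tokenizer.eos_token_id,
        )
        texts.append(tokenizer.decode(out[0], skip_special_tokens=True))
    return texts

def sequence_log_likelihood(text: str) -> float:
    """Average token log-likelihood under the LM; the basic signal behind
    membership-inference-style ranking of candidates."""
    ids = tokenizer(text, return_tensors="pt").input_ids.to(DEVICE)
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean cross-entropy over tokens
    return -loss.item()

if __name__ == "__main__":
    samples = sample_texts()
    # Extraction: collect PII-like strings that appear in generated samples.
    candidates = Counter(m for t in samples for m in EMAIL_RE.findall(t))
    # Rank candidates by the likelihood of a short probe sentence containing
    # them; higher likelihood is weak evidence the string was memorized.
    ranked = sorted(
        candidates,
        key=lambda pii: sequence_log_likelihood(f"Contact: {pii}"),
        reverse=True,
    )
    for pii in ranked[:10]:
        print(pii, candidates[pii])
```
The counting step corresponds to extraction (recovering PII that appears verbatim in generated text), while ranking by likelihood is one intuition for the connection the paper draws between record-level membership inference and PII reconstruction: both rely on the model assigning unusually high probability to memorized training content.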
Related papers
- A False Sense of Privacy: Evaluating Textual Data Sanitization Beyond Surface-level Privacy Leakage [77.83757117924995]
We propose a new framework that evaluates re-identification attacks to quantify individual privacy risks upon data release.
Our approach shows that seemingly innocuous auxiliary information can be used to infer sensitive attributes like age or substance use history from sanitized data.
arXiv Detail & Related papers (2025-04-28T01:16:27Z)
- PrivacyScalpel: Enhancing LLM Privacy via Interpretable Feature Intervention with Sparse Autoencoders [8.483679748399037]
Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language processing but pose privacy risks by memorizing and leaking Personally Identifiable Information (PII).
Existing mitigation strategies, such as differential privacy and neuron-level interventions, often degrade model utility or fail to effectively prevent leakage.
We introduce PrivacyScalpel, a novel privacy-preserving framework that leverages interpretability techniques to identify and mitigate PII leakage while maintaining performance.
arXiv Detail & Related papers (2025-03-14T09:31:01Z)
- R.R.: Unveiling LLM Training Privacy through Recollection and Ranking [17.12953978321457]
Large Language Models (LLMs) pose significant privacy risks, potentially leaking training data due to implicit memorization.
We propose R.R. (Recollect and Rank), a novel two-step privacy stealing attack that enables attackers to reconstruct PII entities from scrubbed training data.
Experiments across three popular PII datasets demonstrate that R.R. achieves better PII identification performance than baselines.
arXiv Detail & Related papers (2025-02-18T09:05:59Z)
- Towards Copyright Protection for Knowledge Bases of Retrieval-augmented Language Models via Ownership Verification with Reasoning [58.57194301645823]
Large language models (LLMs) are increasingly integrated into real-world applications through retrieval-augmented generation (RAG) mechanisms.
Existing methods that can be generalized as watermarking techniques to protect these knowledge bases typically involve poisoning attacks.
We propose a method for harmless copyright protection of knowledge bases.
arXiv Detail & Related papers (2025-02-10T09:15:56Z)
- How Private are Language Models in Abstractive Summarization? [36.801842863853715]
Language models (LMs) have shown outstanding performance in text summarization including sensitive domains such as medicine and law.
However, to what extent LMs can provide privacy-preserving summaries given a non-private source document remains under-explored.
arXiv Detail & Related papers (2024-12-16T18:08:22Z)
- Pseudo-Probability Unlearning: Towards Efficient and Privacy-Preserving Machine Unlearning [59.29849532966454]
We propose Pseudo-Probability Unlearning (PPU), a novel method that enables models to forget data in a privacy-preserving manner.
Our method achieves over 20% improvements in forgetting error compared to the state-of-the-art.
arXiv Detail & Related papers (2024-11-04T21:27:06Z)
- Evaluating Large Language Model based Personal Information Extraction and Countermeasures [63.91918057570824]
Large language models (LLMs) can be misused by attackers to accurately extract various personal information from personal profiles.
LLMs outperform conventional methods at such extraction.
Prompt injection can mitigate such risk to a large extent and outperforms conventional countermeasures.
arXiv Detail & Related papers (2024-08-14T04:49:30Z)
- Information Leakage from Embedding in Large Language Models [5.475800773759642]
This study aims to investigate the potential for privacy invasion through input reconstruction attacks.
We first propose two base methods to reconstruct original texts from a model's hidden states.
We then present Embed Parrot, a Transformer-based method, to reconstruct input from embeddings in deep layers.
arXiv Detail & Related papers (2024-05-20T09:52:31Z)
- InferDPT: Privacy-Preserving Inference for Black-box Large Language Model [66.07752875835506]
InferDPT is the first practical framework for privacy-preserving inference of black-box LLMs.
RANTEXT is a novel differential privacy mechanism integrated into the perturbation module of InferDPT.
arXiv Detail & Related papers (2023-10-18T18:00:11Z)
- Is Vertical Logistic Regression Privacy-Preserving? A Comprehensive Privacy Analysis and Beyond [57.10914865054868]
We consider vertical logistic regression (VLR) trained with mini-batch gradient descent.
We provide a comprehensive and rigorous privacy analysis of VLR in a class of open-source Federated Learning frameworks.
arXiv Detail & Related papers (2022-07-19T05:47:30Z)
- Do Gradient Inversion Attacks Make Federated Learning Unsafe? [70.0231254112197]
Federated learning (FL) allows the collaborative training of AI models without needing to share raw data.
Recent works on the inversion of deep neural networks from model gradients raised concerns about the security of FL in preventing the leakage of training data.
In this work, we show that these attacks presented in the literature are impractical in real FL use-cases and provide a new baseline attack.
arXiv Detail & Related papers (2022-02-14T18:33:12Z)
- BEAS: Blockchain Enabled Asynchronous & Secure Federated Machine Learning [0.0]
We present BEAS, the first blockchain-based framework for N-party Federated Learning.
It provides strict privacy guarantees of training data using gradient pruning.
Anomaly detection protocols are used to minimize the risk of data-poisoning attacks.
We also define a novel protocol to prevent premature convergence in heterogeneous learning environments.
arXiv Detail & Related papers (2022-02-06T17:11:14Z)
- Federated Deep Learning with Bayesian Privacy [28.99404058773532]
Federated learning (FL) aims to protect data privacy by cooperatively learning a model without sharing private data among users.
Homomorphic encryption (HE) based methods provide secure privacy protections but suffer from extremely high computational and communication overheads.
Deep learning with Differential Privacy (DP) was implemented as a practical learning algorithm at a manageable cost in complexity.
arXiv Detail & Related papers (2021-09-27T12:48:40Z)
- Continuous Doubly Constrained Batch Reinforcement Learning [93.23842221189658]
We propose an algorithm for batch RL, where effective policies are learned using only a fixed offline dataset instead of online interactions with the environment.
The limited data in batch RL produces inherent uncertainty in value estimates of states/actions that were insufficiently represented in the training data.
We propose to mitigate this issue via two straightforward penalties: a policy constraint that limits divergence from the behavior policy that collected the data, and a value constraint that discourages overly optimistic estimates.
arXiv Detail & Related papers (2021-02-18T08:54:14Z)
- Rethinking Privacy Preserving Deep Learning: How to Evaluate and Thwart Privacy Attacks [31.34410250008759]
This paper measures the trade-off between model accuracy and privacy losses incurred by reconstruction, tracing and membership attacks.
Experiments show that model accuracies are improved on average by 5-20% compared with baseline mechanisms.
arXiv Detail & Related papers (2020-06-20T15:48:57Z)
- Stratified cross-validation for unbiased and privacy-preserving federated learning [0.0]
We focus on the recurrent problem of duplicated records that, if not handled properly, may cause over-optimistic estimations of a model's performances.
We introduce and discuss stratified cross-validation, a validation methodology that leverages stratification techniques to prevent data leakage in federated learning settings.
arXiv Detail & Related papers (2020-01-22T15:49:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.