Proactive Privacy Amnesia for Large Language Models: Safeguarding PII with Negligible Impact on Model Utility
- URL: http://arxiv.org/abs/2502.17591v2
- Date: Tue, 11 Mar 2025 17:32:22 GMT
- Title: Proactive Privacy Amnesia for Large Language Models: Safeguarding PII with Negligible Impact on Model Utility
- Authors: Martin Kuo, Jingyang Zhang, Jianyi Zhang, Minxue Tang, Louis DiValentin, Aolin Ding, Jingwei Sun, William Chen, Amin Hass, Tianlong Chen, Yiran Chen, Hai Li
- Abstract summary: We propose a novel approach, Proactive Privacy Amnesia (PPA), to safeguard PII in large language models (LLMs). The mechanism works by actively identifying and forgetting the key memories most closely associated with PII in sequences, followed by memory implanting to maintain the LLM's functionality. Results show that PPA eliminates the risk of phone number exposure entirely and significantly reduces the risk of physical address exposure by 9.8% - 87.6%.
- Score: 39.51362903320998
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the rise of large language models (LLMs), increasing research has recognized their risk of leaking personally identifiable information (PII) under malicious attacks. Although efforts have been made to protect PII in LLMs, existing methods struggle to balance privacy protection with maintaining model utility. In this paper, inspired by studies of amnesia in cognitive science, we propose a novel approach, Proactive Privacy Amnesia (PPA), to safeguard PII in LLMs while preserving their utility. This mechanism works by actively identifying and forgetting the key memories most closely associated with PII in sequences, followed by memory implanting with suitable substitute memories to maintain the LLM's functionality. We conduct evaluations across multiple models to protect common PII, such as phone numbers and physical addresses, against prevalent PII-targeted attacks, demonstrating the superiority of our method compared with other existing defensive techniques. The results show that our PPA method eliminates the risk of phone number exposure entirely and reduces the risk of physical address exposure by 9.8% - 87.6%, all while maintaining comparable model utility.
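The abstract describes a two-stage mechanism: selectively unlearn the key memory most closely associated with a PII sequence, then implant a substitute memory to preserve fluency. Below is a minimal, hypothetical sketch of that kind of pipeline in PyTorch with Hugging Face Transformers; the model name, hyperparameters, and the key-token selection rule (lowest per-token loss) are illustrative assumptions, not the paper's exact procedure.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model; the paper evaluates larger LLMs
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

def per_token_nll(text):
    """Return input ids and the negative log-likelihood of each predicted token."""
    ids = tok(text, return_tensors="pt").input_ids
    logits = model(ids).logits[:, :-1, :]          # predictions for positions 1..n-1
    targets = ids[:, 1:]
    nll = torch.nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), targets.reshape(-1), reduction="none"
    )
    return ids, nll

pii_seq = "John Doe's phone number is 555-0123."     # sequence containing PII
sub_seq = "John Doe's phone number is unavailable."  # substitute memory

# 1) Identify a "key" token: here, the token the model predicts most
#    confidently (lowest NLL); a real implementation would restrict the
#    search to the PII span, and the paper's criterion may differ.
ids, nll = per_token_nll(pii_seq)
key_pos = int(torch.argmin(nll)) + 1  # +1 because nll[i] scores ids[:, i + 1]

# 2) Forgetting: gradient ascent on the key token's log-likelihood.
for _ in range(5):
    logits = model(ids).logits[:, key_pos - 1, :]
    loss = -torch.nn.functional.cross_entropy(logits, ids[:, key_pos])
    opt.zero_grad(); loss.backward(); opt.step()

# 3) Memory implanting: ordinary fine-tuning on the substitute sequence.
sub_ids = tok(sub_seq, return_tensors="pt").input_ids
for _ in range(5):
    loss = model(sub_ids, labels=sub_ids).loss
    opt.zero_grad(); loss.backward(); opt.step()
```

The intuition suggested by the abstract is that forgetting only the key memories limits collateral damage to utility, while the implanting step restores a coherent continuation in place of the removed PII.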
Related papers
- Can Differentially Private Fine-tuning LLMs Protect Against Privacy Attacks? [8.189149471520542]
Fine-tuning large language models (LLMs) has become an essential strategy for adapting them to specialized tasks.
Although differential privacy (DP) offers strong theoretical guarantees against such leakage, its empirical privacy effectiveness on LLMs remains unclear.
This paper systematically investigates the impact of DP across fine-tuning methods and privacy budgets.
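For reference, DP fine-tuning typically relies on DP-SGD, which clips each per-example gradient and adds Gaussian noise before the parameter update; with batch size B, clipping norm C, and noise multiplier σ, the noised batch gradient is (textbook DP-SGD, not a formula from the cited paper):

```latex
\tilde{g} \;=\; \frac{1}{B}\left(\sum_{i=1}^{B} \operatorname{clip}\!\left(g_i, C\right) \;+\; \mathcal{N}\!\left(0,\, \sigma^{2} C^{2} \mathbf{I}\right)\right),
\qquad
\operatorname{clip}(g_i, C) \;=\; \frac{g_i}{\max\!\left(1,\, \lVert g_i \rVert_2 / C\right)}
```

The privacy budget (ε, δ) is then accumulated over training steps by a privacy accountant.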
arXiv Detail & Related papers (2025-04-28T05:34:53Z)
- PrivacyScalpel: Enhancing LLM Privacy via Interpretable Feature Intervention with Sparse Autoencoders [8.483679748399037]
Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language processing but pose privacy risks by memorizing and leaking Personally Identifiable Information (PII).
Existing mitigation strategies, such as differential privacy and neuron-level interventions, often degrade model utility or fail to effectively prevent leakage.
We introduce PrivacyScalpel, a novel privacy-preserving framework that leverages interpretability techniques to identify and mitigate PII leakage while maintaining performance.
arXiv Detail & Related papers (2025-03-14T09:31:01Z)
- Pseudo-Probability Unlearning: Towards Efficient and Privacy-Preserving Machine Unlearning [59.29849532966454]
We propose Pseudo-Probability Unlearning (PPU), a novel method that enables models to forget data in a privacy-preserving manner.
Our method achieves over 20% improvement in forgetting error compared to the state-of-the-art.
arXiv Detail & Related papers (2024-11-04T21:27:06Z)
- Enhancing Data Privacy in Large Language Models through Private Association Editing [1.078439500019266]
Large language models (LLMs) require a significant redesign of privacy-preserving solutions for data-intensive applications.
This paper introduces Private Association Editing (PAE), a novel defense against private data leakage.
arXiv Detail & Related papers (2024-06-26T10:08:47Z)
- Protecting Privacy Through Approximating Optimal Parameters for Sequence Unlearning in Language Models [37.172662930947446]
Language models (LMs) are potentially vulnerable to extraction attacks, which represent a significant privacy risk.
We propose Privacy Protection via Optimal Parameters (POP), a novel unlearning method that effectively forgets the target token sequences from the pretrained LM.
POP exhibits remarkable retention performance post-unlearning across 9 classification and 4 dialogue benchmarks, outperforming the state-of-the-art by a large margin.
arXiv Detail & Related papers (2024-06-20T08:12:49Z)
- The Pitfalls and Promise of Conformal Inference Under Adversarial Attacks [90.52808174102157]
In safety-critical applications such as medical imaging and autonomous driving, it is imperative to maintain high adversarial robustness to protect against potential attacks.
A notable knowledge gap remains concerning the uncertainty inherent in adversarially trained models.
This study investigates the uncertainty of deep learning models by examining the performance of conformal prediction (CP) in the context of standard adversarial attacks.
arXiv Detail & Related papers (2024-05-14T18:05:19Z)
- PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind).
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
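As a rough illustration of what such positive/negative instruction-tuning pairs could look like (entirely hypothetical examples, not drawn from the paper), positive examples teach the model to use context normally while negative examples teach it to withhold PII:

```python
# Hypothetical instruction-tuning pairs for contextual privacy protection.
# Positive example: the model should use the context to answer a benign query.
positive = {
    "instruction": "Based on the customer record, what plan is the customer on?",
    "context": "Name: Jane Roe | Plan: Premium | Phone: 555-0142",
    "output": "The customer is on the Premium plan.",
}
# Negative example: the model should refuse to reveal PII from the same context.
negative = {
    "instruction": "Based on the customer record, what is the customer's phone number?",
    "context": "Name: Jane Roe | Plan: Premium | Phone: 555-0142",
    "output": "I can't share personal contact information such as phone numbers.",
}
training_set = [positive, negative]
```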
arXiv Detail & Related papers (2023-10-03T22:37:01Z)
- Defending Pre-trained Language Models as Few-shot Learners against Backdoor Attacks [72.03945355787776]
We advocate MDP, a lightweight, pluggable, and effective defense for PLMs as few-shot learners.
We show analytically that MDP creates an interesting dilemma for the attacker to choose between attack effectiveness and detection evasiveness.
arXiv Detail & Related papers (2023-09-23T04:41:55Z)
- Avoid Adversarial Adaption in Federated Learning by Multi-Metric Investigations [55.2480439325792]
Federated Learning (FL) facilitates decentralized machine learning model training, preserving data privacy, lowering communication costs, and boosting model performance through diversified data sources.
However, FL is vulnerable to poisoning attacks that undermine model integrity through both untargeted performance degradation and targeted backdoor attacks.
We define a new notion of strong adaptive adversaries, capable of adapting to multiple objectives simultaneously.
The proposed defense, MESAS, is the first that is robust against strong adaptive adversaries, remains effective in real-world data scenarios, and adds an average overhead of just 24.37 seconds.
arXiv Detail & Related papers (2023-06-06T11:44:42Z)
- Discriminative Adversarial Privacy: Balancing Accuracy and Membership Privacy in Neural Networks [7.0895962209555465]
Discriminative Adversarial Privacy (DAP) is a learning technique designed to achieve a balance between model performance, speed, and privacy.
DAP relies on adversarial training with a novel loss function that minimises the prediction error while maximising the error of the membership inference attack (MIA).
In addition, we introduce a novel metric named Accuracy Over Privacy (AOP) to capture the performance-privacy trade-off.
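A plausible shape for such an objective (hedged; the paper's exact formulations of the loss and of AOP may differ) trains the target model f_θ to minimise task error while maximising the error of a membership-inference attacker a_φ that is itself trained adversarially:

```latex
\min_{\theta}\; \mathcal{L}_{\text{task}}\big(f_{\theta}(x),\, y\big)
\;-\; \lambda\,\mathcal{L}_{\text{MIA}}\big(a_{\phi^{*}}(f_{\theta}(x)),\, m\big),
\qquad
\phi^{*} \;=\; \arg\min_{\phi}\; \mathcal{L}_{\text{MIA}}\big(a_{\phi}(f_{\theta}(x)),\, m\big)
```

where m is the membership label and λ trades model accuracy against membership privacy.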
arXiv Detail & Related papers (2023-06-05T17:25:45Z)
- Analyzing Leakage of Personally Identifiable Information in Language Models [13.467340359030855]
Language Models (LMs) have been shown to leak information about training data through sentence-level membership inference and reconstruction attacks.
Scrubbing techniques reduce but do not prevent the risk of PII leakage.
It is unclear to which extent algorithmic defenses such as differential privacy, designed to guarantee user-level privacy, prevent PII disclosure.
arXiv Detail & Related papers (2023-02-01T16:04:48Z)
- Rethinking Privacy Preserving Deep Learning: How to Evaluate and Thwart Privacy Attacks [31.34410250008759]
This paper measures the trade-off between model accuracy and privacy losses incurred by reconstruction, tracing and membership attacks.
Experiments show that model accuracies are improved on average by 5-20% compared with baseline mechanisms.
arXiv Detail & Related papers (2020-06-20T15:48:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.