Related papers: Learning to Refuse: Towards Mitigating Privacy Risks in LLMs

Learning to Refuse: Towards Mitigating Privacy Risks in LLMs

URL: http://arxiv.org/abs/2407.10058v2
Date: Mon, 16 Sep 2024 07:20:13 GMT
Title: Learning to Refuse: Towards Mitigating Privacy Risks in LLMs
Authors: Zhenhua Liu, Tong Zhu, Chuanyuan Tan, Wenliang Chen,
Abstract summary: Large language models (LLMs) exhibit remarkable capabilities in understanding and generating natural language. This study addresses the challenge of enabling LLMs to protect specific individuals' private data without the need for complete retraining. We introduce the Name-Aware Unlearning Framework (NAUF) for Privacy Protection, which enables the model to learn which individuals' information should be protected.
Score: 6.685921135304385
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) exhibit remarkable capabilities in understanding and generating natural language. However, these models can inadvertently memorize private information, posing significant privacy risks. This study addresses the challenge of enabling LLMs to protect specific individuals' private data without the need for complete retraining. We propose \return, a Real-world pErsonal daTa UnleaRNing dataset, comprising 2,492 individuals from Wikipedia with associated QA pairs, to evaluate machine unlearning (MU) methods for protecting personal data in a realistic scenario. Additionally, we introduce the Name-Aware Unlearning Framework (NAUF) for Privacy Protection, which enables the model to learn which individuals' information should be protected without affecting its ability to answer questions related to other unrelated individuals. Our extensive experiments demonstrate that NAUF achieves a state-of-the-art average unlearning score, surpassing the best baseline method by 5.65 points, effectively protecting target individuals' personal data while maintaining the model's general capabilities.

Related papers

Data-Free Privacy-Preserving for LLMs via Model Inversion and Selective Unlearning [27.452191507918148]
Large language models (LLMs) exhibit powerful capabilities but risk memorizing sensitive personally identifiable information (PII) from their training data.<n>We propose Data-Free Selective Unlearning (DFSU), a novel privacy-preserving framework that removes sensitive PII from an LLM without requiring its training data.<n>Our approach first synthesizes pseudo-PII through language model inversion, then constructs token-level privacy masks for these synthetic samples, and finally performs token-level selective unlearning.
arXiv Detail & Related papers (2026-01-22T02:43:12Z)
Differential Privacy in Machine Learning: From Symbolic AI to LLMs [49.1574468325115]
Differential privacy provides a formal framework to mitigate privacy risks.<n>It ensures that the inclusion or exclusion of any single data point does not significantly alter the output of an algorithm.
arXiv Detail & Related papers (2025-06-13T11:30:35Z)
Self-Refining Language Model Anonymizers via Adversarial Distillation [49.17383264812234]
Large language models (LLMs) are increasingly used in sensitive domains, where their ability to infer personal data poses emerging privacy risks.<n>We introduce SElf-refining Anonymization with Language model (SEAL), a novel distillation framework for training small language models (SLMs) to perform effective anonymization.
arXiv Detail & Related papers (2025-06-02T08:21:27Z)
Membership Inference Attacks fueled by Few-Short Learning to detect privacy leakage tackling data integrity [7.8973037023478785]
Deep learning models memorize parts of their training data, creating a privacy leakage. We propose a Few-Shot learning based MIA, coined as the FeS-MIA model, which eases the evaluation of the privacy breach of a deep learning model. We also propose an interpretable quantitative and qualitative measure of privacy, referred to as Log-MIA measure.
arXiv Detail & Related papers (2025-03-12T13:09:43Z)
Evaluating LLM-based Personal Information Extraction and Countermeasures [63.91918057570824]
Large language model (LLM) based personal information extraction can be benchmarked. LLM can be misused by attackers to accurately extract various personal information from personal profiles. prompt injection can defend against strong LLM-based attacks, reducing the attack to less effective traditional ones.
arXiv Detail & Related papers (2024-08-14T04:49:30Z)
Preserving Privacy in Large Language Models: A Survey on Current Threats and Solutions [12.451936012379319]
Large Language Models (LLMs) represent a significant advancement in artificial intelligence, finding applications across various domains. Their reliance on massive internet-sourced datasets for training brings notable privacy issues. Certain application-specific scenarios may require fine-tuning these models on private data.
arXiv Detail & Related papers (2024-08-10T05:41:19Z)
Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning [62.224804688233]
differential privacy (DP) offers a promising solution by ensuring models are 'almost indistinguishable' with or without any particular privacy unit. We study user-level DP motivated by applications where it necessary to ensure uniform privacy protection across users.
arXiv Detail & Related papers (2024-06-20T13:54:32Z)
Ungeneralizable Examples [70.76487163068109]
Current approaches to creating unlearnable data involve incorporating small, specially designed noises. We extend the concept of unlearnable data to conditional data learnability and introduce textbfUntextbfGeneralizable textbfExamples (UGEs) UGEs exhibit learnability for authorized users while maintaining unlearnability for potential hackers.
arXiv Detail & Related papers (2024-04-22T09:29:14Z)
Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory [82.7042006247124]
We show that even the most capable AI models reveal private information in contexts that humans would not, 39% and 57% of the time, respectively. Our work underscores the immediate need to explore novel inference-time privacy-preserving approaches, based on reasoning and theory of mind.
arXiv Detail & Related papers (2023-10-27T04:15:30Z)
Privacy Preserving Large Language Models: ChatGPT Case Study Based Vision and Framework [6.828884629694705]
This article proposes the conceptual model called PrivChatGPT, a privacy-generative model for LLMs. PrivChatGPT consists of two main components i.e., preserving user privacy during the data curation/pre-processing together with preserving private context and the private training process for large-scale data.
arXiv Detail & Related papers (2023-10-19T06:55:13Z)
Privacy in Large Language Models: Attacks, Defenses and Future Directions [84.73301039987128]
We analyze the current privacy attacks targeting large language models (LLMs) and categorize them according to the adversary's assumed capabilities. We present a detailed overview of prominent defense strategies that have been developed to counter these privacy attacks.
arXiv Detail & Related papers (2023-10-16T13:23:54Z)
PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind) Our work offers a theoretical analysis for model design and benchmarks various techniques. In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z)
Can Language Models be Instructed to Protect Personal Information? [30.187731765653428]
We introduce PrivQA -- a benchmark to assess the privacy/utility trade-off when a model is instructed to protect specific categories of personal information in a simulated scenario. We find that adversaries can easily circumvent these protections with simple jailbreaking methods through textual and/or image inputs. We believe PrivQA has the potential to support the development of new models with improved privacy protections, as well as the adversarial robustness of these protections.
arXiv Detail & Related papers (2023-10-03T17:30:33Z)
Knowledge Unlearning for Mitigating Privacy Risks in Language Models [31.322818016245087]
We propose knowledge unlearning as an alternative method to reduce privacy risks for language models. We show that simply applying the unlikelihood training objective to target token sequences is effective at forgetting them. We show that unlearning can give a stronger empirical privacy guarantee in scenarios where the data vulnerable to extraction attacks are known a priori.
arXiv Detail & Related papers (2022-10-04T10:18:11Z)
PCAL: A Privacy-preserving Intelligent Credit Risk Modeling Framework Based on Adversarial Learning [111.19576084222345]
This paper proposes a framework of Privacy-preserving Credit risk modeling based on Adversarial Learning (PCAL) PCAL aims to mask the private information inside the original dataset, while maintaining the important utility information for the target prediction task performance. Results indicate that PCAL can learn an effective, privacy-free representation from user data, providing a solid foundation towards privacy-preserving machine learning for credit risk analysis.
arXiv Detail & Related papers (2020-10-06T07:04:59Z)

This list is automatically generated from the titles and abstracts of the papers in this site.