Textual Unlearning Gives a False Sense of Unlearning
- URL: http://arxiv.org/abs/2406.13348v1
- Date: Wed, 19 Jun 2024 08:51:54 GMT
- Title: Textual Unlearning Gives a False Sense of Unlearning
- Authors: Jiacheng Du, Zhibo Wang, Kui Ren
- Abstract summary: Language models (LMs) are susceptible to "memorizing" training data, including a large amount of private or copyright-protected content.
We propose the Textual Unlearning Leakage Attack (TULA), where an adversary can infer information about unlearned data only by accessing the models before and after unlearning.
Our work is the first to reveal that machine unlearning in LMs can, contrary to its purpose, create greater knowledge risks, and it motivates the development of more secure unlearning mechanisms.
- Score: 12.792770622915906
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Language models (LMs) are susceptible to "memorizing" training data, including a large amount of private or copyright-protected content. To safeguard the right to be forgotten (RTBF), machine unlearning has emerged as a promising method for LMs to efficiently "forget" sensitive training content and mitigate knowledge leakage risks. However, despite its good intentions, could the unlearning mechanism be counterproductive? In this paper, we propose the Textual Unlearning Leakage Attack (TULA), where an adversary can infer information about the unlearned data only by accessing the models before and after unlearning. Furthermore, we present variants of TULA in both black-box and white-box scenarios. Through various experiments, we demonstrate that machine unlearning amplifies the risk of knowledge leakage from LMs. Specifically, TULA can increase an adversary's ability to infer membership information about the unlearned data by more than 20% in the black-box scenario. Moreover, TULA can even reconstruct the unlearned data directly with more than 60% accuracy given white-box access. Our work is the first to reveal that machine unlearning in LMs can, contrary to its purpose, create greater knowledge risks, and it motivates the development of more secure unlearning mechanisms.
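The black-box TULA described above relies only on the two artifacts an adversary can observe: the model released before unlearning and the model released after it. As a rough illustration of the underlying signal (not the paper's actual attack), the sketch below scores a candidate record by how much its loss increases between the two releases; records that were genuinely unlearned tend to show a larger increase than records the model never trained on. The toy classifier, random tensors, and simple loss-difference score are assumptions for illustration only.

```python
# Hypothetical sketch of a TULA-style black-box membership signal: compare a
# candidate example's loss under the pre-unlearning model with its loss under
# the post-unlearning model. Examples that were actually unlearned tend to show
# a larger loss increase. The toy models and data below are placeholders.
import torch
import torch.nn.functional as F


def example_loss(model, x, y):
    """Per-example cross-entropy loss; the attack needs no gradients."""
    with torch.no_grad():
        return F.cross_entropy(model(x), y, reduction="none")


def tula_score(model_before, model_after, x, y):
    """Membership signal: how much each example's loss grew after unlearning."""
    return example_loss(model_after, x, y) - example_loss(model_before, x, y)


if __name__ == "__main__":
    torch.manual_seed(0)
    # Toy stand-ins for the models released before and after unlearning.
    model_before = torch.nn.Linear(16, 4)
    model_after = torch.nn.Linear(16, 4)
    candidates_x = torch.randn(8, 16)        # candidate records the adversary tests
    candidates_y = torch.randint(0, 4, (8,))
    scores = tula_score(model_before, model_after, candidates_x, candidates_y)
    # A simple attack flags the highest-scoring candidates as unlearned members.
    print(scores)
```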
Related papers
- Game-Theoretic Machine Unlearning: Mitigating Extra Privacy Leakage [12.737028324709609]
Recent legislation obligates organizations to remove requested data and its influence from a trained model.
We propose a game-theoretic machine unlearning algorithm that simulates the competitive relationship between unlearning performance and privacy protection.
arXiv Detail & Related papers (2024-11-06T13:47:04Z)
- Verification of Machine Unlearning is Fragile [48.71651033308842]
We introduce two novel adversarial unlearning processes capable of circumventing both types of verification strategies.
This study highlights the vulnerabilities and limitations in machine unlearning verification, paving the way for further research into the safety of machine unlearning.
arXiv Detail & Related papers (2024-08-01T21:37:10Z)
- Learn while Unlearn: An Iterative Unlearning Framework for Generative Language Models [49.043599241803825]
The Iterative Contrastive Unlearning (ICU) framework consists of three core components.
A Knowledge Unlearning Induction module removes specific knowledge through an unlearning loss.
A Contrastive Learning Enhancement module preserves the model's expressive capabilities against the pure unlearning objective.
An Iterative Unlearning Refinement module dynamically assesses the extent of unlearning on specific data and makes iterative updates.
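The summary above names ICU's three modules but not their concrete losses. Purely as a sketch of how such a three-part objective could be wired together, the code below uses gradient ascent on a forget batch for the unlearning induction, a KL penalty toward a frozen copy of the original model on retained data as a stand-in for the contrastive enhancement, and a per-example loss check as a stand-in for iterative refinement. The loss forms, weights, and threshold are assumptions, not the paper's formulation.

```python
# Hypothetical three-part unlearning step loosely modelled on the ICU outline:
# (1) push the model away from forget data, (2) keep retained predictions close
# to a frozen reference model, (3) track which forget examples still look
# "remembered". All specifics here are assumed for illustration.
import torch
import torch.nn.functional as F


def icu_style_step(model, ref_model, forget_batch, retain_batch, optimizer,
                   alpha=1.0, beta=0.5, forget_threshold=5.0):
    fx, fy = forget_batch
    rx, _ = retain_batch  # retained labels unused; we match the reference model instead

    # (1) Knowledge unlearning induction: gradient ascent on the forget examples.
    forget_loss = F.cross_entropy(model(fx), fy, reduction="none")
    unlearn_term = -forget_loss.mean()

    # (2) Preservation stand-in: keep retained outputs close to the frozen reference.
    with torch.no_grad():
        ref_logits = ref_model(rx)
    preserve_term = F.kl_div(F.log_softmax(model(rx), dim=-1),
                             F.softmax(ref_logits, dim=-1),
                             reduction="batchmean")

    loss = alpha * unlearn_term + beta * preserve_term
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # (3) Refinement stand-in: flag forget examples whose loss is still low.
    still_remembered = forget_loss.detach() < forget_threshold
    return loss.item(), still_remembered


if __name__ == "__main__":
    torch.manual_seed(0)
    model = torch.nn.Linear(16, 4)
    ref_model = torch.nn.Linear(16, 4)
    ref_model.load_state_dict(model.state_dict())  # frozen copy of the original model
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    forget = (torch.randn(4, 16), torch.randint(0, 4, (4,)))
    retain = (torch.randn(4, 16), torch.randint(0, 4, (4,)))
    for _ in range(3):
        loss_value, remembered = icu_style_step(model, ref_model, forget, retain, opt)
    print(loss_value, remembered)
```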
arXiv Detail & Related papers (2024-07-25T07:09:35Z)
- UnUnlearning: Unlearning is not sufficient for content regulation in advanced generative AI [50.61495097098296]
We revisit the paradigm in which unlearning is used for Large Language Models (LLMs).
We introduce a concept of ununlearning, where unlearned knowledge gets reintroduced in-context.
We argue that content filtering for impermissible knowledge will be required and even exact unlearning schemes are not enough for effective content regulation.
arXiv Detail & Related papers (2024-06-27T10:24:35Z)
- Jogging the Memory of Unlearned LLMs Through Targeted Relearning Attacks [37.061187080745654]
We show that existing approaches for unlearning in LLMs are surprisingly susceptible to a simple set of targeted relearning attacks.
With access to only a small and potentially loosely related set of data, we find that we can "jog" the memory of unlearned models to reverse the effects of unlearning.
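As summarized, a targeted relearning attack amounts to briefly fine-tuning the released, unlearned model on a small amount of loosely related data and then probing it again for the supposedly forgotten content. The sketch below illustrates that step with a toy model and random tensors standing in for an LLM and the attacker's data; the optimizer, step count, and learning rate are placeholders, not the paper's configuration.

```python
# Hypothetical relearning step: fine-tune an already-unlearned model on a small,
# loosely related dataset and then probe whether the "forgotten" behaviour has
# returned. The toy classifier and random tensors are placeholders.
import torch
import torch.nn.functional as F


def relearn(unlearned_model, related_x, related_y, steps=50, lr=1e-2):
    """Briefly fine-tune the unlearned model on attacker-collected related data."""
    opt = torch.optim.SGD(unlearned_model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        F.cross_entropy(unlearned_model(related_x), related_y).backward()
        opt.step()
    return unlearned_model


if __name__ == "__main__":
    torch.manual_seed(0)
    unlearned_model = torch.nn.Linear(16, 4)   # stand-in for the released unlearned model
    related_x = torch.randn(32, 16)            # small, loosely related attacker dataset
    related_y = torch.randint(0, 4, (32,))
    jogged = relearn(unlearned_model, related_x, related_y)
    # The adversary would now query `jogged` for the supposedly unlearned content.
    print(F.cross_entropy(jogged(related_x), related_y).item())
```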
arXiv Detail & Related papers (2024-06-19T09:03:21Z)
- Offset Unlearning for Large Language Models [49.851093293780615]
Unlearning has emerged as a potential remedy for Large Language Models affected by problematic training data.
We propose δ-unlearning, an offset unlearning framework for black-box LLMs.
Experiments demonstrate that δ-unlearning can effectively unlearn target data while maintaining similar or even stronger performance on general out-of-forget-scope tasks.
arXiv Detail & Related papers (2024-04-17T03:39:51Z)
- Learn What You Want to Unlearn: Unlearning Inversion Attacks against Machine Unlearning [16.809644622465086]
We conduct the first investigation to understand the extent to which machine unlearning can leak the confidential content of unlearned data.
Under the Machine Learning as a Service setting, we propose unlearning inversion attacks that can reveal the feature and label information of an unlearned sample.
The experimental results indicate that the proposed attack can reveal the sensitive information of the unlearned data.
arXiv Detail & Related papers (2024-04-04T06:37:46Z)
- The Frontier of Data Erasure: Machine Unlearning for Large Language Models [56.26002631481726]
Large Language Models (LLMs) are foundational to AI advancements.
LLMs pose risks by potentially memorizing and disseminating sensitive, biased, or copyrighted information.
Machine unlearning emerges as a cutting-edge solution to mitigate these concerns.
arXiv Detail & Related papers (2024-03-23T09:26:15Z)
- Rethinking Machine Unlearning for Large Language Models [85.92660644100582]
We explore machine unlearning in the domain of large language models (LLMs).
This initiative aims to eliminate undesirable data influence (e.g., sensitive or illegal information) and the associated model capabilities.
arXiv Detail & Related papers (2024-02-13T20:51:58Z)
- A Duty to Forget, a Right to be Assured? Exposing Vulnerabilities in Machine Unlearning Services [31.347825826778276]
We explore the potential threats posed by unlearning services in Machine Learning (ML).
We propose two strategies that leverage over-unlearning to measure its impact on the trade-off between unlearning effectiveness and model utility.
Results indicate significant potential for both strategies to undermine model efficacy in unlearning scenarios.
arXiv Detail & Related papers (2023-09-15T08:00:45Z)