Related papers: Recalling The Forgotten Class Memberships: Unlearned Models Can Be Noisy Labelers to Leak Privacy

Recalling The Forgotten Class Memberships: Unlearned Models Can Be Noisy Labelers to Leak Privacy

URL: http://arxiv.org/abs/2506.19486v1
Date: Tue, 24 Jun 2025 10:21:10 GMT
Title: Recalling The Forgotten Class Memberships: Unlearned Models Can Be Noisy Labelers to Leak Privacy
Authors: Zhihao Sui, Liang Hu, Jian Cao, Dora D. Liu, Usman Naseem, Zhongyuan Lai, Qi Zhang,
Abstract summary: Current limited research on Machine Unlearning (MU) attacks requires access to original models containing privacy data.<n>We propose an innovative study on recalling the forgotten class memberships from unlearned models without requiring access to the original one.<n>Our study and evaluation have established a benchmark for future research on MU vulnerabilities.
Score: 13.702759117522447
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Machine Unlearning (MU) technology facilitates the removal of the influence of specific data instances from trained models on request. Despite rapid advancements in MU technology, its vulnerabilities are still underexplored, posing potential risks of privacy breaches through leaks of ostensibly unlearned information. Current limited research on MU attacks requires access to original models containing privacy data, which violates the critical privacy-preserving objective of MU. To address this gap, we initiate an innovative study on recalling the forgotten class memberships from unlearned models (ULMs) without requiring access to the original one. Specifically, we implement a Membership Recall Attack (MRA) framework with a teacher-student knowledge distillation architecture, where ULMs serve as noisy labelers to transfer knowledge to student models. Then, it is translated into a Learning with Noisy Labels (LNL) problem for inferring the correct labels of the forgetting instances. Extensive experiments on state-of-the-art MU methods with multiple real datasets demonstrate that the proposed MRA strategy exhibits high efficacy in recovering class memberships of unlearned instances. As a result, our study and evaluation have established a benchmark for future research on MU vulnerabilities.

Related papers

Does Multimodal Large Language Model Truly Unlearn? Stealthy MLLM Unlearning Attack [39.31635005360959]
Multimodal Large Language Models (MLLMs) trained on massive data may memorize sensitive personal information and photos, posing serious privacy risks.<n> MLLM unlearning methods are proposed, which fine-tune MLLMs to reduce the forget'' sensitive information.<n>We study a novel problem of LLM unlearning attack, which aims to recover the unlearned knowledge of an unlearned LLM.
arXiv Detail & Related papers (2025-06-10T04:52:03Z)
Verifying Machine Unlearning with Explainable AI [46.7583989202789]
We investigate the effectiveness of Explainable AI (XAI) in verifying Machine Unlearning (MU) within context of harbor front monitoring. Our proof-of-concept introduces attribution feature as an innovative verification step for MU, expanding beyond traditional metrics. We propose two novel XAI-based metrics, Heatmap Coverage (HC) and Attention Shift (AS) to evaluate the effectiveness of these methods.
arXiv Detail & Related papers (2024-11-20T13:57:32Z)
Game-Theoretic Machine Unlearning: Mitigating Extra Privacy Leakage [12.737028324709609]
Recent legislation obligates organizations to remove requested data and its influence from a trained model. We propose a game-theoretic machine unlearning algorithm that simulates the competitive relationship between unlearning performance and privacy protection.
arXiv Detail & Related papers (2024-11-06T13:47:04Z)
Learn while Unlearn: An Iterative Unlearning Framework for Generative Language Models [52.03511469562013]
We introduce the Iterative Contrastive Unlearning (ICU) framework, which consists of three core components.<n>A Knowledge Unlearning Induction module targets specific knowledge for removal using an unlearning loss.<n>A Contrastive Learning Enhancement module preserves the model's expressive capabilities against the pure unlearning goal.<n>An Iterative Unlearning Refinement module dynamically adjusts the unlearning process through ongoing evaluation and updates.
arXiv Detail & Related papers (2024-07-25T07:09:35Z)
A Method to Facilitate Membership Inference Attacks in Deep Learning Models [5.724311218570013]
We demonstrate a new form of membership inference attack that is strictly more powerful than prior art. Our attack empowers the adversary to reliably de-identify all the training samples. We show that the models can effectively disguise the amplified membership leakage under common membership privacy auditing.
arXiv Detail & Related papers (2024-07-02T03:33:42Z)
The Frontier of Data Erasure: Machine Unlearning for Large Language Models [56.26002631481726]
Large Language Models (LLMs) are foundational to AI advancements. LLMs pose risks by potentially memorizing and disseminating sensitive, biased, or copyrighted information. Machine unlearning emerges as a cutting-edge solution to mitigate these concerns.
arXiv Detail & Related papers (2024-03-23T09:26:15Z)
Threats, Attacks, and Defenses in Machine Unlearning: A Survey [14.03428437751312]
Machine Unlearning (MU) has recently gained considerable attention due to its potential to achieve Safe AI.<n>This survey aims to fill the gap between the extensive number of studies on threats, attacks, and defenses in machine unlearning.
arXiv Detail & Related papers (2024-03-20T15:40:18Z)
Rethinking Machine Unlearning for Large Language Models [85.92660644100582]
We explore machine unlearning in the domain of large language models (LLMs)<n>This initiative aims to eliminate undesirable data influence (e.g., sensitive or illegal information) and the associated model capabilities.
arXiv Detail & Related papers (2024-02-13T20:51:58Z)
Learn to Unlearn: A Survey on Machine Unlearning [29.077334665555316]
This article presents a review of recent machine unlearning techniques, verification mechanisms, and potential attacks. We highlight emerging challenges and prospective research directions. We aim for this paper to provide valuable resources for integrating privacy, equity, andresilience into ML systems.
arXiv Detail & Related papers (2023-05-12T14:28:02Z)
Unlearnable Clusters: Towards Label-agnostic Unlearnable Examples [128.25509832644025]
There is a growing interest in developing unlearnable examples (UEs) against visual privacy leaks on the Internet. UEs are training samples added with invisible but unlearnable noise, which have been found can prevent unauthorized training of machine learning models. We present a novel technique called Unlearnable Clusters (UCs) to generate label-agnostic unlearnable examples with cluster-wise perturbations.
arXiv Detail & Related papers (2022-12-31T04:26:25Z)
A Survey of Machine Unlearning [56.017968863854186]
Recent regulations now require that, on request, private information about a user must be removed from computer systems. ML models often remember' the old data. Recent works on machine unlearning have not been able to completely solve the problem.
arXiv Detail & Related papers (2022-09-06T08:51:53Z)
RelaxLoss: Defending Membership Inference Attacks without Losing Utility [68.48117818874155]
We propose a novel training framework based on a relaxed loss with a more achievable learning target. RelaxLoss is applicable to any classification model with added benefits of easy implementation and negligible overhead. Our approach consistently outperforms state-of-the-art defense mechanisms in terms of resilience against MIAs.
arXiv Detail & Related papers (2022-07-12T19:34:47Z)

This list is automatically generated from the titles and abstracts of the papers in this site.