Does Machine Unlearning Truly Remove Model Knowledge? A Framework for Auditing Unlearning in LLMs
- URL: http://arxiv.org/abs/2505.23270v1
- Date: Thu, 29 May 2025 09:19:07 GMT
- Title: Does Machine Unlearning Truly Remove Model Knowledge? A Framework for Auditing Unlearning in LLMs
- Authors: Haokun Chen, Yueqi Zhang, Yuan Bi, Yao Zhang, Tong Liu, Jinhe Bi, Jian Lan, Jindong Gu, Claudia Grosser, Denis Krompass, Nassir Navab, Volker Tresp
- Abstract summary: We introduce a comprehensive auditing framework for unlearning evaluation comprising three benchmark datasets, six unlearning algorithms, and five prompt-based auditing methods. We evaluate the effectiveness and robustness of different unlearning strategies.
- Score: 58.24692529185971
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, Large Language Models (LLMs) have achieved remarkable advancements, drawing significant attention from the research community. Their capabilities are largely attributed to large-scale architectures, which require extensive training on massive datasets. However, such datasets often contain sensitive or copyrighted content sourced from the public internet, raising concerns about data privacy and ownership. Regulatory frameworks, such as the General Data Protection Regulation (GDPR), grant individuals the right to request the removal of such sensitive information. This has motivated the development of machine unlearning algorithms that aim to remove specific knowledge from models without the need for costly retraining. Despite these advancements, evaluating the efficacy of unlearning algorithms remains a challenge due to the inherent complexity and generative nature of LLMs. In this work, we introduce a comprehensive auditing framework for unlearning evaluation, comprising three benchmark datasets, six unlearning algorithms, and five prompt-based auditing methods. By using various auditing algorithms, we evaluate the effectiveness and robustness of different unlearning strategies. To explore alternatives beyond prompt-based auditing, we propose a novel technique that leverages intermediate activation perturbations, addressing the limitations of auditing methods that rely solely on model inputs and outputs.
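The abstract contrasts prompt-based auditing, which sees only model inputs and outputs, with auditing that perturbs intermediate activations. The abstract does not specify the procedure, so the following is only a toy sketch of the general idea, not the authors' method: a hypothetical two-answer "unlearned" model whose refusal narrowly outscores the forgotten answer, audited by adding Gaussian noise to its hidden activation and counting how often the forgotten answer resurfaces. The names `forward`, `leak_rate`, `hidden`, `w_out`, and all numeric values are assumptions made for illustration.

```python
import random


def forward(hidden, w_out):
    """Map a hidden activation vector to one logit per candidate answer."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in w_out]


def leak_rate(hidden, w_out, target, sigma, trials=1000, seed=0):
    """Fraction of noisy forward passes in which `target` is the top answer.

    sigma is the standard deviation of the Gaussian noise added to each
    hidden unit; sigma=0 corresponds to an unperturbed forward pass.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        noisy = [h + rng.gauss(0.0, sigma) for h in hidden]
        logits = forward(noisy, w_out)
        top = max(range(len(logits)), key=logits.__getitem__)
        if top == target:
            hits += 1
    return hits / trials


# Hypothetical toy model (assumed values): answer 0 is the "forgotten" fact,
# answer 1 is a refusal that wins by a small margin after unlearning.
hidden = [1.0, 0.9]
w_out = [
    [1.0, 0.0],  # logit for the forgotten answer
    [0.0, 1.2],  # logit for the refusal
]

unperturbed = leak_rate(hidden, w_out, target=0, sigma=0.0)
perturbed = leak_rate(hidden, w_out, target=0, sigma=0.5)
print(f"leak rate without noise: {unperturbed:.2f}")
print(f"leak rate with noise:    {perturbed:.2f}")
```

In this toy setting the unperturbed pass never returns the forgotten answer, while the perturbed passes return it a substantial fraction of the time. This illustrates why an audit that looks only at outputs can miss knowledge that is suppressed rather than removed.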
Related papers
- A Comprehensive Survey of Machine Unlearning Techniques for Large Language Models [35.893819613585315]
This study investigates machine unlearning techniques in the context of large language models (LLMs).
LLM unlearning offers a principled approach to removing the influence of undesirable data from LLMs.
Despite growing research interest, there is no comprehensive survey that systematically organizes existing work and distills key insights.
arXiv Detail & Related papers (2025-02-22T12:46:14Z)
- Online Continual Learning: A Systematic Literature Review of Approaches, Challenges, and Benchmarks [1.3631535881390204]
Online Continual Learning (OCL) is a critical area in machine learning.
This study conducts the first comprehensive Systematic Literature Review on OCL.
arXiv Detail & Related papers (2025-01-09T01:03:14Z)
- RESTOR: Knowledge Recovery in Machine Unlearning [71.75834077528305]
Large language models trained on web-scale corpora can contain private or sensitive information.
Several machine unlearning algorithms have been proposed to eliminate the effect of such datapoints.
We propose the RESTOR framework for machine unlearning evaluation.
arXiv Detail & Related papers (2024-10-31T20:54:35Z)
- SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models [54.78329741186446]
We propose a novel paradigm that uses a code-based critic model to guide steps including question-code data construction, quality control, and complementary evaluation.
Experiments across both in-domain and out-of-domain benchmarks in English and Chinese demonstrate the effectiveness of the proposed paradigm.
arXiv Detail & Related papers (2024-08-28T06:33:03Z)
- Learn while Unlearn: An Iterative Unlearning Framework for Generative Language Models [52.03511469562013]
We introduce the Iterative Contrastive Unlearning (ICU) framework, which consists of three core components.
A Knowledge Unlearning Induction module targets specific knowledge for removal using an unlearning loss.
A Contrastive Learning Enhancement module preserves the model's expressive capabilities against the pure unlearning goal.
An Iterative Unlearning Refinement module dynamically adjusts the unlearning process through ongoing evaluation and updates.
arXiv Detail & Related papers (2024-07-25T07:09:35Z)
- On Large Language Model Continual Unlearning [35.49718871265512]
Machine unlearning has emerged as a representative approach for model safety and security.
These methods do not sufficiently consider that unlearning requests in real-world scenarios are continuously emerging.
We propose an Orthogonal low-rank adapter (LoRA) for continually unlearning requested data and an Out-Of-Distribution detector to measure the similarity between input and unlearning data.
arXiv Detail & Related papers (2024-07-14T14:26:17Z)
- The Frontier of Data Erasure: Machine Unlearning for Large Language Models [56.26002631481726]
Large Language Models (LLMs) are foundational to AI advancements.
LLMs pose risks by potentially memorizing and disseminating sensitive, biased, or copyrighted information.
Machine unlearning emerges as a cutting-edge solution to mitigate these concerns.
arXiv Detail & Related papers (2024-03-23T09:26:15Z)
- Towards Lifecycle Unlearning Commitment Management: Measuring Sample-level Approximate Unlearning Completeness [30.596695293390415]
We introduce the task of Lifecycle Unlearning Commitment Management (LUCM) for approximate unlearning.
We propose an efficient metric designed to assess the sample-level unlearning completeness.
We show that this metric is able to serve as a tool for monitoring unlearning anomalies throughout the unlearning lifecycle.
arXiv Detail & Related papers (2024-03-19T15:37:27Z)
- The Efficiency Spectrum of Large Language Models: An Algorithmic Survey [54.19942426544731]
The rapid growth of Large Language Models (LLMs) has been a driving force in transforming various domains.
This paper examines the multi-faceted dimensions of efficiency essential for the end-to-end algorithmic development of LLMs.
arXiv Detail & Related papers (2023-12-01T16:00:25Z)