Related papers: Does Machine Unlearning Truly Remove Model Knowledge? A Framework for Auditing Unlearning in LLMs

Does Machine Unlearning Truly Remove Model Knowledge? A Framework for Auditing Unlearning in LLMs

URL: http://arxiv.org/abs/2505.23270v1
Date: Thu, 29 May 2025 09:19:07 GMT
Title: Does Machine Unlearning Truly Remove Model Knowledge? A Framework for Auditing Unlearning in LLMs
Authors: Haokun Chen, Yueqi Zhang, Yuan Bi, Yao Zhang, Tong Liu, Jinhe Bi, Jian Lan, Jindong Gu, Claudia Grosser, Denis Krompass, Nassir Navab, Volker Tresp,
Abstract summary: We introduce a comprehensive auditing framework for unlearning evaluation comprising three benchmark datasets, six unlearning algorithms, and five prompt-based auditing methods.<n>We evaluate the effectiveness and robustness of different unlearning strategies.
Score: 58.24692529185971
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In recent years, Large Language Models (LLMs) have achieved remarkable advancements, drawing significant attention from the research community. Their capabilities are largely attributed to large-scale architectures, which require extensive training on massive datasets. However, such datasets often contain sensitive or copyrighted content sourced from the public internet, raising concerns about data privacy and ownership. Regulatory frameworks, such as the General Data Protection Regulation (GDPR), grant individuals the right to request the removal of such sensitive information. This has motivated the development of machine unlearning algorithms that aim to remove specific knowledge from models without the need for costly retraining. Despite these advancements, evaluating the efficacy of unlearning algorithms remains a challenge due to the inherent complexity and generative nature of LLMs. In this work, we introduce a comprehensive auditing framework for unlearning evaluation, comprising three benchmark datasets, six unlearning algorithms, and five prompt-based auditing methods. By using various auditing algorithms, we evaluate the effectiveness and robustness of different unlearning strategies. To explore alternatives beyond prompt-based auditing, we propose a novel technique that leverages intermediate activation perturbations, addressing the limitations of auditing methods that rely solely on model inputs and outputs.

Related papers

A Survey on Unlearning in Large Language Models [18.262778815699345]
Large Language Models (LLMs) have revolutionized natural language processing, yet their training on massive corpora poses significant risks.<n>To mitigate these issues and align with legal and ethical standards such as the "right to be forgotten", machine unlearning has emerged as a critical technique.<n>This survey provides a systematic review of over 180 papers on LLM unlearning published since 2021, focusing exclusively on large-scale generative models.
arXiv Detail & Related papers (2025-10-29T02:34:17Z)
SoK: Machine Unlearning for Large Language Models [14.88062383081161]
Large language model (LLM) unlearning has become a critical topic in machine learning.<n>We propose a new taxonomy based on the intention of unlearning.
arXiv Detail & Related papers (2025-06-10T20:30:39Z)
A Comprehensive Survey of Machine Unlearning Techniques for Large Language Models [35.893819613585315]
This study investigates the machine unlearning techniques within the context of large language models (LLMs)<n>LLMs unlearning offers a principled approach to removing the influence of undesirable data from LLMs.<n>Despite growing research interest, there is no comprehensive survey that systematically organizes existing work and distills key insights.
arXiv Detail & Related papers (2025-02-22T12:46:14Z)
Online Continual Learning: A Systematic Literature Review of Approaches, Challenges, and Benchmarks [1.3631535881390204]
Online Continual Learning (OCL) is a critical area in machine learning.<n>This study conducts the first comprehensive Systematic Literature Review on OCL.
arXiv Detail & Related papers (2025-01-09T01:03:14Z)
RESTOR: Knowledge Recovery in Machine Unlearning [71.75834077528305]
Large language models trained on web-scale corpora can contain private or sensitive information.<n>Several machine unlearning algorithms have been proposed to eliminate the effect of such datapoints.<n>We propose the RESTOR framework for machine unlearning evaluation.
arXiv Detail & Related papers (2024-10-31T20:54:35Z)
SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models [54.78329741186446]
We propose a novel paradigm that uses a code-based critic model to guide steps including question-code data construction, quality control, and complementary evaluation. Experiments across both in-domain and out-of-domain benchmarks in English and Chinese demonstrate the effectiveness of the proposed paradigm.
arXiv Detail & Related papers (2024-08-28T06:33:03Z)
Learn while Unlearn: An Iterative Unlearning Framework for Generative Language Models [52.03511469562013]
We introduce the Iterative Contrastive Unlearning (ICU) framework, which consists of three core components.<n>A Knowledge Unlearning Induction module targets specific knowledge for removal using an unlearning loss.<n>A Contrastive Learning Enhancement module preserves the model's expressive capabilities against the pure unlearning goal.<n>An Iterative Unlearning Refinement module dynamically adjusts the unlearning process through ongoing evaluation and updates.
arXiv Detail & Related papers (2024-07-25T07:09:35Z)
On Large Language Model Continual Unlearning [35.49718871265512]
Machine unlearning has emerged as a representative approach for model safety and security.<n>These methods do not sufficiently consider that unlearning requests in real-world scenarios are continuously emerging.<n>We propose an Orthogonal low-rank adapter (LoRA) for continually unlearning requested data and an Out-Of-Distribution detector to measure the similarity between input and unlearning data.
arXiv Detail & Related papers (2024-07-14T14:26:17Z)
Unveiling Entity-Level Unlearning for Large Language Models: A Comprehensive Analysis [32.455702022397666]
Large language model unlearning has garnered increasing attention due to its potential to address security and privacy concerns.<n>Much of this research has concentrated on instance-level unlearning, specifically targeting the removal of predefined instances containing sensitive content.<n>We propose a novel task of entity-level unlearning, which aims to erase entity-related knowledge from the target model completely.
arXiv Detail & Related papers (2024-06-22T09:40:07Z)
The Frontier of Data Erasure: Machine Unlearning for Large Language Models [56.26002631481726]
Large Language Models (LLMs) are foundational to AI advancements. LLMs pose risks by potentially memorizing and disseminating sensitive, biased, or copyrighted information. Machine unlearning emerges as a cutting-edge solution to mitigate these concerns.
arXiv Detail & Related papers (2024-03-23T09:26:15Z)
Towards Lifecycle Unlearning Commitment Management: Measuring Sample-level Approximate Unlearning Completeness [30.596695293390415]
We introduce the task of Lifecycle Unlearning Commitment Management (LUCM) for approximate unlearning. We propose an efficient metric designed to assess the sample-level unlearning completeness. We show that this metric is able to serve as a tool for monitoring unlearning anomalies throughout the unlearning lifecycle.
arXiv Detail & Related papers (2024-03-19T15:37:27Z)
The Efficiency Spectrum of Large Language Models: An Algorithmic Survey [54.19942426544731]
The rapid growth of Large Language Models (LLMs) has been a driving force in transforming various domains. This paper examines the multi-faceted dimensions of efficiency essential for the end-to-end algorithmic development of LLMs.
arXiv Detail & Related papers (2023-12-01T16:00:25Z)
Human-Centric Multimodal Machine Learning: Recent Advances and Testbed on AI-based Recruitment [66.91538273487379]
There is a certain consensus about the need to develop AI applications with a Human-Centric approach. Human-Centric Machine Learning needs to be developed based on four main requirements: (i) utility and social good; (ii) privacy and data ownership; (iii) transparency and accountability; and (iv) fairness in AI-driven decision-making processes. We study how current multimodal algorithms based on heterogeneous sources of information are affected by sensitive elements and inner biases in the data.
arXiv Detail & Related papers (2023-02-13T16:44:44Z)

This list is automatically generated from the titles and abstracts of the papers in this site.