Related papers: Towards Effective Evaluations and Comparisons for LLM Unlearning Methods

Related papers

Reference-Specific Unlearning Metrics Can Hide the Truth: A Reality Check [60.77691669644931]
We propose Functional Alignment for Distributional Equivalence (FADE), a novel metric that measures distributional similarity between unlearned and reference models. We show that FADE captures functional alignment across the entire output distribution, providing a principled assessment of genuine unlearning. These findings expose fundamental gaps in current evaluation practices and demonstrate that FADE provides a more robust foundation for developing and assessing truly effective unlearning methods.
arXiv Detail & Related papers (2025-10-14T20:50:30Z)
LLM Unlearning Under the Microscope: A Full-Stack View on Methods and Metrics [10.638045151201084]
We present a principled taxonomy of twelve recent stateful unlearning methods. We revisit the evaluation of unlearning effectiveness (UE), utility retention (UT), and robustness (Rob). Our analysis shows that current evaluations, dominated by multiple-choice question (MCQ) accuracy, offer only a narrow perspective.
arXiv Detail & Related papers (2025-10-08T23:47:05Z)
Unlearning That Lasts: Utility-Preserving, Robust, and Almost Irreversible Forgetting in LLMs [31.768387661474904]
Unlearning in large language models (LLMs) involves precisely removing specific information from a pre-trained model. This is crucial to ensure safety of LLMs by deleting private data or harmful knowledge acquired during pre-training. We introduce JensUn, where we leverage the Jensen-Shannon Divergence as the training objective for both forget and retain sets. In extensive experiments, JensUn achieves better forget-utility trade-off than competing methods, and even demonstrates strong resilience to benign relearning.
arXiv Detail & Related papers (2025-09-02T20:38:53Z)
Rectifying Privacy and Efficacy Measurements in Machine Unlearning: A New Inference Attack Perspective [42.003102851493885]
We propose RULI (Rectified Unlearning Evaluation Framework via Likelihood Inference) to address critical gaps in the evaluation of inexact unlearning methods. RULI introduces a dual-objective attack to measure both unlearning efficacy and privacy risks at a per-sample granularity. Our findings reveal significant vulnerabilities in state-of-the-art unlearning methods, exposing privacy risks underestimated by existing methods.
arXiv Detail & Related papers (2025-06-16T00:30:02Z)
OpenUnlearning: Accelerating LLM Unlearning via Unified Benchmarking of Methods and Metrics [101.78963920333342]
We introduce OpenUnlearning, a standardized framework for benchmarking large language model (LLM) unlearning methods and metrics. OpenUnlearning integrates 9 unlearning algorithms and 16 diverse evaluations across 3 leading benchmarks. We also benchmark diverse unlearning methods and provide a comparative analysis against an extensive evaluation suite.
arXiv Detail & Related papers (2025-06-14T20:16:37Z)
Towards Lifecycle Unlearning Commitment Management: Measuring Sample-level Unlearning Completeness [30.596695293390415]
Interpolated Approximate Measurement (IAM) is a framework designed for unlearning inference. IAM quantifies sample-level unlearning completeness by interpolating the model's generalization-fitting behavior gap on queried samples. We apply IAM to recent approximate unlearning algorithms, revealing general risks of both over-unlearning and under-unlearning.
arXiv Detail & Related papers (2025-06-06T14:22:18Z)
Existing Large Language Model Unlearning Evaluations Are Inconclusive [105.55899615056573]
We show that some evaluations introduce substantial new information into the model, potentially masking true unlearning performance. We demonstrate that evaluation outcomes vary significantly across tasks, undermining the generalizability of current evaluation routines. We propose two principles for future unlearning evaluations: minimal information injection and downstream task awareness.
arXiv Detail & Related papers (2025-05-31T19:43:00Z)
Exploring Criteria of Loss Reweighting to Enhance LLM Unlearning [66.8042627609456]
Loss reweighting has shown significant benefits for machine unlearning with large language models (LLMs). In this paper, we identify two distinct goals of loss reweighting, namely, Saturation and Importance. We propose SatImp, a simple reweighting method that combines the advantages of both saturation and importance.
arXiv Detail & Related papers (2025-05-17T10:41:22Z)
MUBox: A Critical Evaluation Framework of Deep Machine Unlearning [13.186439491394474]
MUBox is a comprehensive platform designed to evaluate unlearning methods in deep learning. MUBox integrates 23 advanced unlearning techniques, tested across six practical scenarios with 11 diverse evaluation metrics.
arXiv Detail & Related papers (2025-05-13T13:50:51Z)
Beyond Single-Value Metrics: Evaluating and Enhancing LLM Unlearning with Cognitive Diagnosis [34.62178125699054]
UNCD (UNlearning evaluation via Cognitive Diagnosis) is a novel framework for fine-grained evaluation of LLM unlearning. Our dedicated benchmark, UNCD-Cyber, provides a detailed assessment of the removal of dangerous capabilities.
arXiv Detail & Related papers (2025-02-19T06:56:59Z)
Redefining Machine Unlearning: A Conformal Prediction-Motivated Approach [1.3731623617634434]
We identify critical limitations in existing unlearning metrics and propose enhanced evaluation metrics inspired by conformal prediction. Our metrics can effectively capture the extent to which ground truth labels are excluded from the prediction set. We propose an unlearning framework that integrates conformal prediction insights into Carlini & Wagner adversarial attack loss.
arXiv Detail & Related papers (2025-01-31T18:58:43Z)
Does Unlearning Truly Unlearn? A Black Box Evaluation of LLM Unlearning Methods [1.9799527196428242]
Large language model unlearning aims to remove harmful information that LLMs have learnt to prevent their use for malicious purposes. LLMU and RMU have been proposed as two methods for LLM unlearning, achieving impressive results on unlearning benchmarks.
arXiv Detail & Related papers (2024-11-18T22:31:17Z)
Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset [92.99416966226724]
We introduce Facial Identity Unlearning Benchmark (FIUBench), a novel VLM unlearning benchmark designed to robustly evaluate the effectiveness of unlearning algorithms. We apply a two-stage evaluation pipeline that is designed to precisely control the sources of information and their exposure levels. Through the evaluation of four baseline VLM unlearning algorithms within FIUBench, we find that all methods remain limited in their unlearning performance.
arXiv Detail & Related papers (2024-11-05T23:26:10Z)
A Closer Look at Machine Unlearning for Large Language Models [46.245404272612795]
Large language models (LLMs) may memorize sensitive or copyrighted content, raising privacy and legal concerns. We discuss several issues in machine unlearning for LLMs and provide our insights on possible approaches.
arXiv Detail & Related papers (2024-10-10T16:56:05Z)
Position: LLM Unlearning Benchmarks are Weak Measures of Progress [31.957968729934745]
We find that existing benchmarks provide an overly optimistic and potentially misleading view on the effectiveness of candidate unlearning methods. We identify that existing benchmarks are particularly vulnerable to modifications that introduce even loose dependencies between the forget and retain information.
arXiv Detail & Related papers (2024-10-03T18:07:25Z)
Towards Robust Knowledge Unlearning: An Adversarial Framework for Assessing and Improving Unlearning Robustness in Large Language Models [19.015202590038996]
We design Dynamic Unlearning Attack (DUA), a dynamic and automated framework to attack unlearned models. We propose Latent Adversarial Unlearning (LAU), a universal framework that effectively enhances the robustness of the unlearning process. We demonstrate that LAU improves unlearning effectiveness by over 53.5%, causes less than an 11.6% reduction in neighboring knowledge, and has almost no impact on the model's general capabilities.
arXiv Detail & Related papers (2024-08-20T09:36:04Z)
Machine Unlearning with Minimal Gradient Dependence for High Unlearning Ratios [18.73206066109299]
Mini-Unlearning is a novel approach that capitalizes on a critical observation: unlearned parameters correlate with retrained parameters through contraction mapping. This lightweight, scalable method significantly enhances model accuracy and strengthens resistance to membership inference attacks. Our experiments demonstrate that Mini-Unlearning not only works under higher unlearning ratios but also outperforms existing techniques in both accuracy and security.
arXiv Detail & Related papers (2024-06-24T01:43:30Z)
Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data [102.16105233826917]
Learning from preference labels plays a crucial role in fine-tuning large language models. There are several distinct approaches for preference fine-tuning, including supervised learning, on-policy reinforcement learning (RL), and contrastive learning.
arXiv Detail & Related papers (2024-04-22T17:20:18Z)
Exploring Federated Unlearning: Analysis, Comparison, and Insights [101.64910079905566]
Federated unlearning enables the selective removal of data from models trained in federated systems. This paper examines existing federated unlearning approaches, assessing their algorithmic efficiency, impact on model accuracy, and effectiveness in preserving privacy. We propose the OpenFederatedUnlearning framework, a unified benchmark for evaluating federated unlearning methods.
arXiv Detail & Related papers (2023-10-30T01:34:33Z)
Model Sparsity Can Simplify Machine Unlearning [33.18951938708467]
In response to recent data regulation requirements, machine unlearning (MU) has emerged as a critical process. Our study introduces a novel model-based perspective: model sparsification via weight pruning. We show in both theory and practice that model sparsity can boost the multi-criteria unlearning performance of an approximate unlearner.
arXiv Detail & Related papers (2023-04-11T02:12:02Z)
Uncertainty Estimation by Fisher Information-based Evidential Deep Learning [61.94125052118442]
Uncertainty estimation is a key factor that makes deep learning reliable in practical applications. We propose a novel method, Fisher Information-based Evidential Deep Learning ($\mathcal{I}$-EDL). In particular, we introduce the Fisher Information Matrix (FIM) to measure the informativeness of evidence carried by each sample, according to which we can dynamically reweight the objective loss terms to make the network more focused on the representation learning of uncertain classes.
arXiv Detail & Related papers (2023-03-03T16:12:59Z)
Offline Reinforcement Learning with Instrumental Variables in Confounded Markov Decision Processes [93.61202366677526]
We study offline reinforcement learning (RL) in the face of unmeasured confounders. We propose various policy learning methods with a finite-sample suboptimality guarantee of finding the optimal in-class policy.
arXiv Detail & Related papers (2022-09-18T22:03:55Z)
Imitating, Fast and Slow: Robust learning from demonstrations via decision-time planning [96.72185761508668]
Imitation with Planning at Test-time (IMPLANT) is a new meta-algorithm for imitation learning. We demonstrate that IMPLANT significantly outperforms benchmark imitation learning approaches on standard control environments.
arXiv Detail & Related papers (2022-04-07T17:16:52Z)
Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks. This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z)
DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator. Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms. This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk. We propose a more data-efficient IfO algorithm
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
On Data Efficiency of Meta-learning [17.739215706060605]
We study an often overlooked aspect of modern meta-learning algorithms: their data efficiency. We introduce a new, simple framework for evaluating meta-learning methods under a limit on the available supervision. We propose active meta-learning, which incorporates active data selection into learning-to-learn, leading to better performance of all methods in the limited supervision regime.
arXiv Detail & Related papers (2021-01-30T01:44:12Z)
Learning the Truth From Only One Side of the Story [58.65439277460011]
We focus on generalized linear models and show that without adjusting for this sampling bias, the model may converge suboptimally or even fail to converge to the optimal solution. We propose an adaptive approach that comes with theoretical guarantees and show that it outperforms several existing methods empirically.
arXiv Detail & Related papers (2020-06-08T18:20:28Z)

This list is automatically generated from the titles and abstracts of the papers on this site.