Mirror Mirror on the Wall, Have I Forgotten it All? A New Framework for Evaluating Machine Unlearning
- URL: http://arxiv.org/abs/2505.08138v1
- Date: Tue, 13 May 2025 00:23:17 GMT
- Title: Mirror Mirror on the Wall, Have I Forgotten it All? A New Framework for Evaluating Machine Unlearning
- Authors: Brennon Brimhall, Philip Mathew, Neil Fendley, Yinzhi Cao, Matthew Green,
- Abstract summary: Machine unlearning methods take a model trained on a dataset and a forget set, then attempt to produce a model as if it had only been trained on the examples not in the forget set.<n>We show that an adversary is able to distinguish between a mirror model and a model produced by an unlearning method.<n>We propose a strong formal definition for machine unlearning called computational unlearning.
- Score: 15.37608643831483
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine unlearning methods take a model trained on a dataset and a forget set, then attempt to produce a model as if it had only been trained on the examples not in the forget set. We empirically show that an adversary is able to distinguish between a mirror model (a control model produced by retraining without the data to forget) and a model produced by an unlearning method across representative unlearning methods from the literature. We build distinguishing algorithms based on evaluation scores in the literature (i.e. membership inference scores) and Kullback-Leibler divergence. We propose a strong formal definition for machine unlearning called computational unlearning. Computational unlearning is defined as the inability for an adversary to distinguish between a mirror model and a model produced by an unlearning method. If the adversary cannot guess better than random (except with negligible probability), then we say that an unlearning method achieves computational unlearning. Our computational unlearning definition provides theoretical structure to prove unlearning feasibility results. For example, our computational unlearning definition immediately implies that there are no deterministic computational unlearning methods for entropic learning algorithms. We also explore the relationship between differential privacy (DP)-based unlearning methods and computational unlearning, showing that DP-based approaches can satisfy computational unlearning at the cost of an extreme utility collapse. These results demonstrate that current methodology in the literature fundamentally falls short of achieving computational unlearning. We conclude by identifying several open questions for future work.
Related papers
- Efficient Machine Unlearning via Influence Approximation [75.31015485113993]
Influence-based unlearning has emerged as a prominent approach to estimate the impact of individual training samples on model parameters without retraining.<n>This paper establishes a theoretical link between memorizing (incremental learning) and forgetting (unlearning)<n>We introduce the Influence Approximation Unlearning algorithm for efficient machine unlearning from the incremental perspective.
arXiv Detail & Related papers (2025-07-31T05:34:27Z) - RESTOR: Knowledge Recovery through Machine Unlearning [71.75834077528305]
Large language models trained on web-scale corpora can memorize undesirable datapoints.<n>Many machine unlearning algorithms have been proposed that aim to erase' these datapoints.<n>We propose the RESTOR framework for machine unlearning, which evaluates the ability of unlearning algorithms to perform targeted data erasure.
arXiv Detail & Related papers (2024-10-31T20:54:35Z) - Attribute-to-Delete: Machine Unlearning via Datamodel Matching [65.13151619119782]
Machine unlearning -- efficiently removing a small "forget set" training data on a pre-divertrained machine learning model -- has recently attracted interest.
Recent research shows that machine unlearning techniques do not hold up in such a challenging setting.
arXiv Detail & Related papers (2024-10-30T17:20:10Z) - UnUnlearning: Unlearning is not sufficient for content regulation in advanced generative AI [50.61495097098296]
We revisit the paradigm in which unlearning is used for Large Language Models (LLMs)
We introduce a concept of ununlearning, where unlearned knowledge gets reintroduced in-context.
We argue that content filtering for impermissible knowledge will be required and even exact unlearning schemes are not enough for effective content regulation.
arXiv Detail & Related papers (2024-06-27T10:24:35Z) - Adversarial Machine Unlearning [26.809123658470693]
This paper focuses on the challenge of machine unlearning, aiming to remove the influence of specific training data on machine learning models.
Traditionally, the development of unlearning algorithms runs parallel with that of membership inference attacks (MIA), a type of privacy threat.
We propose a game-theoretic framework that integrates MIAs into the design of unlearning algorithms.
arXiv Detail & Related papers (2024-06-11T20:07:22Z) - An Information Theoretic Approach to Machine Unlearning [43.423418819707784]
To comply with AI and data regulations, the need to forget private or copyrighted information from trained machine learning models is increasingly important.<n>In this work, we address the zero-shot unlearning scenario, whereby an unlearning algorithm must be able to remove data given only a trained model and the data to be forgotten.<n>We derive a simple but principled zero-shot unlearning method based on the geometry of the model.
arXiv Detail & Related papers (2024-02-02T13:33:30Z) - A Probabilistic Model Behind Self-Supervised Learning [53.64989127914936]
In self-supervised learning (SSL), representations are learned via an auxiliary task without annotated labels.
We present a generative latent variable model for self-supervised learning.
We show that several families of discriminative SSL, including contrastive methods, induce a comparable distribution over representations.
arXiv Detail & Related papers (2024-02-02T13:31:17Z) - Ticketed Learning-Unlearning Schemes [57.89421552780526]
We propose a new ticketed model for learning--unlearning.
We provide space-efficient ticketed learning--unlearning schemes for a broad family of concept classes.
arXiv Detail & Related papers (2023-06-27T18:54:40Z) - Model Sparsity Can Simplify Machine Unlearning [33.18951938708467]
In response to recent data regulation requirements, machine unlearning (MU) has emerged as a critical process.
Our study introduces a novel model-based perspective: model sparsification via weight pruning.
We show in both theory and practice that model sparsity can boost the multi-criteria unlearning performance of an approximate unlearner.
arXiv Detail & Related papers (2023-04-11T02:12:02Z) - On the Necessity of Auditable Algorithmic Definitions for Machine
Unlearning [13.149070833843133]
Machine unlearning, i.e. having a model forget about some of its training data, has become increasingly important as privacy legislation promotes variants of the right-to-be-forgotten.
We first show that the definition that underlies approximate unlearning, which seeks to prove the approximately unlearned model is close to an exactly retrained model, is incorrect because one can obtain the same model using different datasets.
We then turn to exact unlearning approaches and ask how to verify their claims of unlearning.
arXiv Detail & Related papers (2021-10-22T16:16:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.