Machine Unlearning Doesn't Do What You Think: Lessons for Generative AI Policy and Research
- URL: http://arxiv.org/abs/2412.06966v2
- Date: Fri, 31 Oct 2025 20:04:48 GMT
- Title: Machine Unlearning Doesn't Do What You Think: Lessons for Generative AI Policy and Research
- Authors: A. Feder Cooper, Christopher A. Choquette-Choo, Miranda Bogen, Kevin Klyman, Matthew Jagielski, Katja Filippova, Ken Liu, Alexandra Chouldechova, Jamie Hayes, Yangsibo Huang, Eleni Triantafillou, Peter Kairouz, Nicole Elyse Mitchell, Niloofar Mireshghallah, Abigail Z. Jacobs, James Grimmelmann, Vitaly Shmatikov, Christopher De Sa, Ilia Shumailov, Andreas Terzis, Solon Barocas, Jennifer Wortman Vaughan, Danah Boyd, Yejin Choi, Sanmi Koyejo, Fernando Delgado, Percy Liang, Daniel E. Ho, Pamela Samuelson, Miles Brundage, David Bau, Seth Neel, Hanna Wallach, Amy B. Cyphert, Mark A. Lemley, Nicolas Papernot, Katherine Lee,
- Abstract summary: "Machine unlearning" is a proposed solution for mitigating the existence of content in an AI model that is problematic for legal or moral reasons.<n>We provide a framework for ML researchers and policymakers to think rigorously about these challenges.
- Score: 186.53450963176968
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: "Machine unlearning" is a popular proposed solution for mitigating the existence of content in an AI model that is problematic for legal or moral reasons, including privacy, copyright, safety, and more. For example, unlearning is often invoked as a solution for removing the effects of specific information from a generative-AI model's parameters, e.g., a particular individual's personal data or the inclusion of copyrighted content in the model's training data. Unlearning is also proposed as a way to prevent a model from generating targeted types of information in its outputs, e.g., generations that closely resemble a particular individual's data or reflect the concept of "Spiderman." Both of these goals--the targeted removal of information from a model and the targeted suppression of information from a model's outputs--present various technical and substantive challenges. We provide a framework for ML researchers and policymakers to think rigorously about these challenges, identifying several mismatches between the goals of unlearning and feasible implementations. These mismatches explain why unlearning is not a general-purpose solution for circumscribing generative-AI model behavior in service of broader positive impact.
Related papers
- Forgetting-MarI: LLM Unlearning via Marginal Information Regularization [6.979586479353831]
Existing unlearning methods often degrade model performance by removing more information than necessary when attempting to ''forget'' specific data.<n>We introduce Forgetting-MarI, an LLM unlearning framework that provably removes only the additional (marginal) information contributed by the data to be unlearned.<n>By penalizing marginal information, our method yields an explicit upper bound on the unlearn dataset's residual influence in the trained models, providing provable undetectability.
arXiv Detail & Related papers (2025-11-14T22:48:39Z) - Evaluating the Defense Potential of Machine Unlearning against Membership Inference Attacks [0.0]
Membership Inference Attacks (MIAs) enable adversaries to determine whether a specific data point was included in the training dataset of a model.<n>While Machine Unlearning is not inherently a countermeasure against MIA, the unlearning algorithm and data characteristics can significantly affect a model's vulnerability.<n>This work provides essential insights into the interplay between Machine Unlearning and MIAs, offering guidance for the design of privacy-preserving machine learning systems.
arXiv Detail & Related papers (2025-08-22T07:19:33Z) - Model Collapse Is Not a Bug but a Feature in Machine Unlearning for LLMs [54.167494079321465]
Current unlearning methods for LLMs optimize on the private information they seek to remove by incorporating it into their fine-tuning data.<n>We propose a novel unlearning method-Partial Model Collapse (PMC), which does not require unlearning targets in the unlearning objective.
arXiv Detail & Related papers (2025-07-06T03:08:49Z) - Differential Privacy in Machine Learning: From Symbolic AI to LLMs [49.1574468325115]
Differential privacy provides a formal framework to mitigate privacy risks.<n>It ensures that the inclusion or exclusion of any single data point does not significantly alter the output of an algorithm.
arXiv Detail & Related papers (2025-06-13T11:30:35Z) - Data-Free Model-Related Attacks: Unleashing the Potential of Generative AI [21.815149263785912]
We introduce the use of generative AI for facilitating model-related attacks, including model extraction, membership inference, and model inversion.
Our study reveals that adversaries can launch a variety of model-related attacks against both image and text models in a data-free and black-box manner.
This research serves as an important early warning to the community about the potential risks associated with generative AI-powered attacks on deep learning models.
arXiv Detail & Related papers (2025-01-28T03:12:57Z) - Game-Theoretic Machine Unlearning: Mitigating Extra Privacy Leakage [12.737028324709609]
Recent legislation obligates organizations to remove requested data and its influence from a trained model.
We propose a game-theoretic machine unlearning algorithm that simulates the competitive relationship between unlearning performance and privacy protection.
arXiv Detail & Related papers (2024-11-06T13:47:04Z) - Learn while Unlearn: An Iterative Unlearning Framework for Generative Language Models [49.043599241803825]
Iterative Contrastive Unlearning (ICU) framework consists of three core components.
A Knowledge Unlearning Induction module removes specific knowledge through an unlearning loss.
A Contrastive Learning Enhancement module to preserve the model's expressive capabilities against the pure unlearning goal.
And an Iterative Unlearning Refinement module that dynamically assess the unlearning extent on specific data pieces and make iterative update.
arXiv Detail & Related papers (2024-07-25T07:09:35Z) - Unlearning Targeted Information via Single Layer Unlearning Gradient [15.374381635334897]
Unauthorized privacy-related computation is a significant concern for society.
The EU's General Protection Regulation includes a "right to be forgotten"
We propose Single Layer Unlearning Gradient (SLUG) to unlearn targeted information by updating targeted layers of a model.
arXiv Detail & Related papers (2024-07-16T15:52:36Z) - Privacy Implications of Explainable AI in Data-Driven Systems [0.0]
Machine learning (ML) models suffer from a lack of interpretability.
The absence of transparency, often referred to as the black box nature of ML models, undermines trust.
XAI techniques address this challenge by providing frameworks and methods to explain the internal decision-making processes.
arXiv Detail & Related papers (2024-06-22T08:51:58Z) - The Frontier of Data Erasure: Machine Unlearning for Large Language Models [56.26002631481726]
Large Language Models (LLMs) are foundational to AI advancements.
LLMs pose risks by potentially memorizing and disseminating sensitive, biased, or copyrighted information.
Machine unlearning emerges as a cutting-edge solution to mitigate these concerns.
arXiv Detail & Related papers (2024-03-23T09:26:15Z) - Self-Destructing Models: Increasing the Costs of Harmful Dual Uses of
Foundation Models [103.71308117592963]
We present an algorithm for training self-destructing models leveraging techniques from meta-learning and adversarial learning.
In a small-scale experiment, we show MLAC can largely prevent a BERT-style model from being re-purposed to perform gender identification.
arXiv Detail & Related papers (2022-11-27T21:43:45Z) - A Survey of Machine Unlearning [56.017968863854186]
Recent regulations now require that, on request, private information about a user must be removed from computer systems.
ML models often remember' the old data.
Recent works on machine unlearning have not been able to completely solve the problem.
arXiv Detail & Related papers (2022-09-06T08:51:53Z) - Beyond Trivial Counterfactual Explanations with Diverse Valuable
Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z) - Goal-Aware Prediction: Learning to Model What Matters [105.43098326577434]
One of the fundamental challenges in using a learned forward dynamics model is the mismatch between the objective of the learned model and that of the downstream planner or policy.
We propose to direct prediction towards task relevant information, enabling the model to be aware of the current task and encouraging it to only model relevant quantities of the state space.
We find that our method more effectively models the relevant parts of the scene conditioned on the goal, and as a result outperforms standard task-agnostic dynamics models and model-free reinforcement learning.
arXiv Detail & Related papers (2020-07-14T16:42:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.