RESTOR: Knowledge Recovery through Machine Unlearning
- URL: http://arxiv.org/abs/2411.00204v2
- Date: Thu, 02 Jan 2025 20:36:44 GMT
- Title: RESTOR: Knowledge Recovery through Machine Unlearning
- Authors: Keivan Rezaei, Khyathi Chandu, Soheil Feizi, Yejin Choi, Faeze Brahman, Abhilasha Ravichander,
- Abstract summary: Large language models trained on web-scale corpora can memorize undesirable datapoints.
Many machine unlearning algorithms have been proposed that aim to `erase' these datapoints.
We propose the RESTOR framework for machine unlearning, which evaluates the ability of unlearning algorithms to perform targeted data erasure.
- Score: 71.75834077528305
- License:
- Abstract: Large language models trained on web-scale corpora can memorize undesirable datapoints such as incorrect facts, copyrighted content or sensitive data. Recently, many machine unlearning algorithms have been proposed that aim to `erase' these datapoints from trained models -- that is, revert model behavior to be similar to a model that had never been trained on these datapoints. However, evaluating the success of unlearning algorithms remains an open challenge. In this work, we propose the RESTOR framework for machine unlearning, which evaluates the ability of unlearning algorithms to perform targeted data erasure from models, by evaluating the ability of models to forget the knowledge introduced in these data points, while simultaneously recovering the model's knowledge state had it not encountered these datapoints. RESTOR helps uncover several novel insights about popular unlearning algorithms, and the mechanisms through which they operate -- for instance, identifying that some algorithms merely emphasize forgetting, and that localizing unlearning targets can enhance unlearning performance.
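The recovery criterion described in the abstract lends itself to a simple quantitative check. Below is a minimal sketch, assuming three checkpoints of the same model (clean, trained on the corrupting datapoints, and unlearned) scored on a shared set of factual probes; the function names and the normalization are illustrative assumptions, not the paper's exact metrics.
```python
# A minimal sketch of a RESTOR-style recovery check, assuming we can score the same
# factual probes against three checkpoints: the clean model, the model after training
# on the corrupting datapoints, and the model after unlearning. Names and the metric
# below are illustrative assumptions, not the paper's exact evaluation.

def probe_accuracy(model_answers: dict, gold_answers: dict) -> float:
    """Fraction of factual probes answered correctly."""
    correct = sum(model_answers.get(q) == a for q, a in gold_answers.items())
    return correct / len(gold_answers)

def recovery_score(clean_acc: float, corrupted_acc: float, unlearned_acc: float) -> float:
    """How much of the corruption-induced accuracy drop the unlearning step undoes
    (1.0 = fully back to the clean model's knowledge state, 0.0 = no recovery)."""
    damage = clean_acc - corrupted_acc
    if damage <= 0:
        return 1.0
    recovered = unlearned_acc - corrupted_acc
    return max(0.0, min(1.0, recovered / damage))

if __name__ == "__main__":
    print(recovery_score(clean_acc=0.82, corrupted_acc=0.41, unlearned_acc=0.74))  # ~0.80
```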
Related papers
- Attribute-to-Delete: Machine Unlearning via Datamodel Matching [65.13151619119782]
Machine unlearning -- efficiently removing a small "forget set" of training data from a pre-trained machine learning model -- has recently attracted interest.
Recent research shows that machine unlearning techniques do not hold up in such a challenging setting.
arXiv Detail & Related papers (2024-10-30T17:20:10Z) - Learn while Unlearn: An Iterative Unlearning Framework for Generative Language Models [49.043599241803825]
The Iterative Contrastive Unlearning (ICU) framework consists of three core components.
A Knowledge Unlearning Induction module removes specific knowledge through an unlearning loss.
A Contrastive Learning Enhancement module preserves the model's expressive capabilities against the pure unlearning goal.
An Iterative Unlearning Refinement module dynamically assesses the unlearning extent on specific data pieces and makes iterative updates.
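Read together, the three modules suggest a combined training objective. The sketch below assumes a HuggingFace-style causal language model whose forward pass returns a `.loss` when `labels` are supplied; the loss weights, the plain language-modeling term standing in for the paper's contrastive enhancement, and the stopping test are all illustrative assumptions rather than the ICU implementation.
```python
# Rough PyTorch-style sketch of an ICU-like update, assuming a HuggingFace-style
# causal LM (its forward pass returns an object with a .loss when labels are passed).

def icu_style_step(model, forget_batch, retain_batch, optimizer, alpha=1.0, beta=0.5):
    optimizer.zero_grad()
    # Knowledge Unlearning Induction: push the likelihood of the forget data down
    # by ascending its language-modeling loss (hence the negative weight below).
    forget_loss = model(**forget_batch, labels=forget_batch["input_ids"]).loss
    # Stand-in for the Contrastive Learning Enhancement module: an ordinary LM loss
    # on retained text, so expressive capability is not destroyed by pure unlearning.
    retain_loss = model(**retain_batch, labels=retain_batch["input_ids"]).loss
    loss = -alpha * forget_loss + beta * retain_loss
    loss.backward()
    optimizer.step()
    return forget_loss.item(), retain_loss.item()

def sufficiently_unlearned(forget_loss: float, threshold: float) -> bool:
    # Stand-in for the Iterative Unlearning Refinement module: stop updating on a
    # data piece once its LM loss is high enough, i.e. the model no longer recalls it.
    return forget_loss >= threshold
```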
arXiv Detail & Related papers (2024-07-25T07:09:35Z) - Gone but Not Forgotten: Improved Benchmarks for Machine Unlearning [0.0]
We describe and propose alternative evaluation methods for machine unlearning algorithms.
We show the utility of our alternative evaluations via a series of experiments with state-of-the-art unlearning algorithms on different computer vision datasets.
arXiv Detail & Related papers (2024-05-29T15:53:23Z) - The Frontier of Data Erasure: Machine Unlearning for Large Language Models [56.26002631481726]
Large Language Models (LLMs) are foundational to AI advancements.
LLMs pose risks by potentially memorizing and disseminating sensitive, biased, or copyrighted information.
Machine unlearning emerges as a cutting-edge solution to mitigate these concerns.
arXiv Detail & Related papers (2024-03-23T09:26:15Z) - An Information Theoretic Approach to Machine Unlearning [43.423418819707784]
To comply with AI and data regulations, the need to forget private or copyrighted information from trained machine learning models is increasingly important.
In this work, we address the zero-shot unlearning scenario, whereby an unlearning algorithm must be able to remove data given only a trained model and the data to be forgotten.
We derive a simple but principled zero-shot unlearning method based on the geometry of the model.
arXiv Detail & Related papers (2024-02-02T13:33:30Z) - Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning Interference with Gradient Projection [56.292071534857946]
Recent data-privacy laws have sparked interest in machine unlearning.
The challenge is to discard information about the ``forget'' data without altering knowledge about the remaining dataset.
We adopt a projected-gradient-based learning method, named Projected-Gradient Unlearning (PGU).
We provide empirical evidence that our unlearning method produces models that behave similarly to models retrained from scratch across various metrics, even when the training dataset is no longer accessible.
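The core idea, projecting the unlearning update away from directions that matter for the retained data, can be written in a few lines. Below is a minimal NumPy sketch of that generic projection step; how the retained-gradient subspace is estimated and scaled in the actual PGU method may differ.
```python
# Minimal NumPy sketch of the gradient-projection idea: remove from the unlearning
# gradient its component inside the subspace spanned by retained-data gradient
# directions, so the update minimally interferes with remaining knowledge.
# How that subspace is built in the actual PGU method is not shown here.
import numpy as np

def project_out(grad: np.ndarray, retain_basis: np.ndarray) -> np.ndarray:
    """grad: flattened unlearning gradient; retain_basis: rows are orthonormal
    directions estimated from the remaining dataset."""
    coeffs = retain_basis @ grad              # coordinates of grad in the retain subspace
    return grad - retain_basis.T @ coeffs     # keep only the orthogonal component

# Toy usage: with the retain subspace equal to the x-axis, the x-component is removed.
g = np.array([1.0, 2.0, -1.0])
basis = np.array([[1.0, 0.0, 0.0]])
print(project_out(g, basis))                  # -> [ 0.  2. -1.]
```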
arXiv Detail & Related papers (2023-12-07T07:17:24Z) - Generative Adversarial Networks Unlearning [13.342749941357152]
Machine unlearning has emerged as a solution to erase training data from trained machine learning models.
Research on unlearning for Generative Adversarial Networks (GANs) is limited due to their unique architecture, which includes a generator and a discriminator.
We propose a cascaded unlearning approach for both item and class unlearning within GAN models, in which the unlearning and learning processes run in a cascaded manner.
arXiv Detail & Related papers (2023-08-19T02:21:21Z) - On the Necessity of Auditable Algorithmic Definitions for Machine Unlearning [13.149070833843133]
Machine unlearning, i.e. having a model forget about some of its training data, has become increasingly important as privacy legislation promotes variants of the right-to-be-forgotten.
We first show that the definition that underlies approximate unlearning, which seeks to prove the approximately unlearned model is close to an exactly retrained model, is incorrect because one can obtain the same model using different datasets.
We then turn to exact unlearning approaches and ask how to verify their claims of unlearning.
arXiv Detail & Related papers (2021-10-22T16:16:56Z) - Machine Unlearning of Features and Labels [72.81914952849334]
We propose the first scenarios for unlearning features and labels in machine learning models.
Our approach builds on the concept of influence functions and realizes unlearning through closed-form updates of model parameters.
arXiv Detail & Related papers (2021-08-26T04:42:24Z)
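The closed-form flavor of such an approach is easiest to see for a strongly convex loss, where removing one point reduces to a single Newton-style correction. The sketch below shows a generic influence-function removal update under that assumption; the scaling convention is one common choice, and the paper's handling of individual features and labels is not reproduced here.
```python
# Minimal NumPy sketch of an influence-function-style closed-form removal update
# for a strongly convex training loss. The 1/(n-1) scaling is one common convention
# and is an illustrative assumption here.
import numpy as np

def unlearn_point(theta: np.ndarray, hessian: np.ndarray,
                  grad_forget: np.ndarray, n_train: int) -> np.ndarray:
    """theta: parameters fit on all n_train points;
    hessian: Hessian of the average training loss at theta;
    grad_forget: gradient of the loss of the point being removed, evaluated at theta.
    Returns parameters approximating a model retrained without that point."""
    return theta + np.linalg.solve(hessian, grad_forget) / (n_train - 1)

# Toy usage with a quadratic loss, so the Hessian is exact and constant.
H = 2.0 * np.eye(2)
theta_hat = np.array([0.30, -0.70])
g_forget = np.array([0.10, 0.40])
print(unlearn_point(theta_hat, H, g_forget, n_train=100))
```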