Machine Unlearning in Forgettability Sequence
- URL: http://arxiv.org/abs/2410.06446v2
- Date: Mon, 21 Oct 2024 14:28:18 GMT
- Title: Machine Unlearning in Forgettability Sequence
- Authors: Junjie Chen, Qian Chen, Jian Lou, Xiaoyu Zhang, Kai Wu, Zilong Wang,
- Abstract summary: We identify key factor affecting unlearning difficulty and the performance of unlearning algorithms.
We propose a general unlearning framework, dubbed RSU, which consists of Ranking module and SeqUnlearn module.
- Score: 22.497699136603877
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine unlearning (MU) is becoming a promising paradigm to achieve the "right to be forgotten", where the training trace of any chosen data points could be eliminated, while maintaining the model utility on general testing samples after unlearning. With the advancement of forgetting research, many fundamental open questions remain unanswered: do different samples exhibit varying levels of difficulty in being forgotten? Further, does the sequence in which samples are forgotten, determined by their respective difficulty levels, influence the performance of forgetting algorithms? In this paper, we identify key factor affecting unlearning difficulty and the performance of unlearning algorithms. We find that samples with higher privacy risks are more likely to be unlearning, indicating that the unlearning difficulty varies among different samples which motives a more precise unlearning mode. Built upon this insight, we propose a general unlearning framework, dubbed RSU, which consists of Ranking module and SeqUnlearn module.
Related papers
- Towards Understanding the Feasibility of Machine Unlearning [14.177012256360635]
We present a set of novel metrics for quantifying the difficulty of unlearning.
Specifically, we propose several metrics to assess the conditions necessary for a successful unlearning operation.
We also present a ranking mechanism to identify the most challenging samples to unlearn.
arXiv Detail & Related papers (2024-10-03T23:41:42Z) - MUSE: Machine Unlearning Six-Way Evaluation for Language Models [109.76505405962783]
Language models (LMs) are trained on vast amounts of text data, which may include private and copyrighted content.
We propose MUSE, a comprehensive machine unlearning evaluation benchmark.
We benchmark how effectively eight popular unlearning algorithms can unlearn Harry Potter books and news articles.
arXiv Detail & Related papers (2024-07-08T23:47:29Z) - What makes unlearning hard and what to do about it [3.2140380913122195]
We identify two key factors affecting unlearning difficulty and the performance of unlearning algorithms.
We develop a framework coined Refined-Unlearning Meta-algorithm (RUM) that encompasses: (i) refining the forget set into homogenized subsets, according to different characteristics; and (ii) a meta-algorithm that employs existing algorithms to unlearn each subset and finally delivers a model that has unlearned the overall forget set.
arXiv Detail & Related papers (2024-06-03T12:14:47Z) - Challenging Forgets: Unveiling the Worst-Case Forget Sets in Machine Unlearning [9.998859702421417]
Machine unlearning (MU) aims to eliminate the influence of chosen data points on model performance.
Despite various MU methods for data influence erasure, evaluations have largely focused on random data forgetting.
We propose identifying the data subset that presents the most significant challenge for influence erasure, pinpointing the worst-case forget set.
arXiv Detail & Related papers (2024-03-12T06:50:32Z) - Extrinsicaly Rewarded Soft Q Imitation Learning with Discriminator [0.0]
Supervised learning methods such as Behavioral Cloning do not require sampling data, but usually suffer from distribution shift.
Soft Q imitation learning (SQIL) addressed the problems, and it was shown that it could learn efficiently by combining Behavioral Cloning and soft Q-learning with constant rewards.
arXiv Detail & Related papers (2024-01-30T06:22:19Z) - Zero-shot Retrieval: Augmenting Pre-trained Models with Search Engines [83.65380507372483]
Large pre-trained models can dramatically reduce the amount of task-specific data required to solve a problem, but they often fail to capture domain-specific nuances out of the box.
This paper shows how to leverage recent advances in NLP and multi-modal learning to augment a pre-trained model with search engine retrieval.
arXiv Detail & Related papers (2023-11-29T05:33:28Z) - MILD: Modeling the Instance Learning Dynamics for Learning with Noisy
Labels [19.650299232829546]
We propose an iterative selection approach based on the Weibull mixture model to identify clean data.
In particular, we measure the difficulty of memorization and memorize for each instance via the transition times between being misclassified and being memorized.
Our strategy outperforms existing noisy-label learning methods.
arXiv Detail & Related papers (2023-06-20T14:26:53Z) - HardVis: Visual Analytics to Handle Instance Hardness Using Undersampling and Oversampling Techniques [48.82319198853359]
HardVis is a visual analytics system designed to handle instance hardness mainly in imbalanced classification scenarios.
Users can explore subsets of data from different perspectives to decide all those parameters.
The efficacy and effectiveness of HardVis are demonstrated with a hypothetical usage scenario and a use case.
arXiv Detail & Related papers (2022-03-29T17:04:16Z) - A Simple Hash-Based Early Exiting Approach For Language Understanding
and Generation [77.85086491395981]
Early exiting allows instances to exit at different layers according to the estimation of difficulty.
We propose a Hash-based Early Exiting approach (HashEE) that replaces the learn-to-exit modules with hash functions to assign each token to a fixed exiting layer.
Experimental results on classification, regression, and generation tasks demonstrate that HashEE can achieve higher performance with fewer FLOPs and inference time.
arXiv Detail & Related papers (2022-03-03T12:02:05Z) - Low-Regret Active learning [64.36270166907788]
We develop an online learning algorithm for identifying unlabeled data points that are most informative for training.
At the core of our work is an efficient algorithm for sleeping experts that is tailored to achieve low regret on predictable (easy) instances.
arXiv Detail & Related papers (2021-04-06T22:53:45Z) - When is Memorization of Irrelevant Training Data Necessary for
High-Accuracy Learning? [53.523017945443115]
We describe natural prediction problems in which every sufficiently accurate training algorithm must encode, in the prediction model, essentially all the information about a large subset of its training examples.
Our results do not depend on the training algorithm or the class of models used for learning.
arXiv Detail & Related papers (2020-12-11T15:25:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.