Related papers: Machine Unlearning in Forgettability Sequence

Machine Unlearning in Forgettability Sequence

URL: http://arxiv.org/abs/2410.06446v2
Date: Mon, 21 Oct 2024 14:28:18 GMT
Title: Machine Unlearning in Forgettability Sequence
Authors: Junjie Chen, Qian Chen, Jian Lou, Xiaoyu Zhang, Kai Wu, Zilong Wang,
Abstract summary: We identify a key factor affecting unlearning difficulty and the performance of unlearning algorithms. We propose a general unlearning framework, dubbed RSU, which consists of a Ranking module and a SeqUnlearn module.
Score: 22.497699136603877
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Machine unlearning (MU) is becoming a promising paradigm to achieve the "right to be forgotten", where the training trace of any chosen data points could be eliminated, while maintaining the model utility on general testing samples after unlearning. With the advancement of forgetting research, many fundamental open questions remain unanswered: do different samples exhibit varying levels of difficulty in being forgotten? Further, does the sequence in which samples are forgotten, determined by their respective difficulty levels, influence the performance of forgetting algorithms? In this paper, we identify a key factor affecting unlearning difficulty and the performance of unlearning algorithms. We find that samples with higher privacy risks are more likely to be unlearned, indicating that the unlearning difficulty varies among different samples, which motivates a more precise unlearning mode. Built upon this insight, we propose a general unlearning framework, dubbed RSU, which consists of a Ranking module and a SeqUnlearn module.
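The rank-then-unlearn-in-sequence idea described in the abstract can be illustrated with a minimal sketch. This is not the RSU implementation: the abstract does not specify the ranking criterion, the ordering direction, or the unlearning step, so `risk_score`, `unlearn_step`, `n_groups`, and the descending order below are all hypothetical placeholders chosen only for illustration.

```python
from typing import Callable, List, Sequence, TypeVar

S = TypeVar("S")  # a forget-set sample (or its id)
M = TypeVar("M")  # whatever object represents the model


def rank_forget_set(samples: Sequence[S],
                    risk_score: Callable[[S], float],
                    descending: bool = True) -> List[S]:
    """Order the forget set by a per-sample score (assumed proxy for difficulty)."""
    return sorted(samples, key=risk_score, reverse=descending)


def split_into_groups(items: Sequence[S], n_groups: int) -> List[List[S]]:
    """Split an ordered list into at most n_groups contiguous groups."""
    if not items:
        return []
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    size = -(-len(items) // n_groups)  # ceiling division
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def sequential_unlearn(model: M,
                       samples: Sequence[S],
                       risk_score: Callable[[S], float],
                       unlearn_step: Callable[[M, List[S]], M],
                       n_groups: int = 3) -> M:
    """Rank the forget set, then apply an existing unlearning step group by group."""
    ranked = rank_forget_set(samples, risk_score)
    for group in split_into_groups(ranked, n_groups):
        model = unlearn_step(model, group)
    return model


# Toy usage: the "model" is just the set of ids it has memorized, and the
# hypothetical unlearning step simply removes the ids in the current group.
toy_risk = {"a": 0.9, "b": 0.1, "c": 0.5, "d": 0.7}
toy_model = {"a", "b", "c", "d", "e"}
remaining = sequential_unlearn(
    toy_model, list(toy_risk), toy_risk.get,
    lambda m, group: m - set(group), n_groups=2,
)
```

In practice `unlearn_step` would wrap an existing unlearning algorithm and `risk_score` would come from whatever difficulty or privacy-risk estimate is available; both are left abstract here on purpose.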

Related papers

FaLW: A Forgetting-aware Loss Reweighting for Long-tailed Unlearning [24.734154431191538]
FaLW is a plug-and-play, instance-wise dynamic loss reweighting method. It assesses the unlearning state of each sample by comparing its predictive probability to the distribution of unseen data from the same class. Experiments demonstrate that FaLW achieves superior performance.
arXiv Detail & Related papers (2026-01-26T16:21:01Z)
A Neuro-inspired Interpretation of Unlearning in Large Language Models through Sample-level Unlearning Difficulty [12.382999548648726]
Existing studies assume a uniform unlearning difficulty across samples. We propose a Memory Removal Difficulty ($\mathrm{MRD}$) metric to quantify sample-level unlearning difficulty. We also propose an $\mathrm{MRD}$-based weighted sampling method to optimize existing unlearning algorithms.
arXiv Detail & Related papers (2025-04-09T07:48:10Z)
Erasing Without Remembering: Implicit Knowledge Forgetting in Large Language Models [70.78205685001168]
We investigate knowledge forgetting in large language models with a focus on its generalisation. UGBench is the first benchmark specifically designed to assess the unlearning of in-scope implicit knowledge. We propose PerMU, a novel probability-based unlearning paradigm.
arXiv Detail & Related papers (2025-02-27T11:03:33Z)
CUAL: Continual Uncertainty-aware Active Learner [5.678185894553588]
A deployed AI agent is continuously provided with unlabeled data that may contain not only unseen samples of known classes but also samples from novel (unknown) classes. We present a comprehensive solution to this complex problem with our model "CUAL" (Continual Uncertainty-aware Active Learner). CUAL leverages an uncertainty estimation algorithm to prioritize active labeling of ambiguous (uncertain) predicted novel-class samples while simultaneously pseudo-labeling the most certain predictions of each class.
arXiv Detail & Related papers (2024-12-12T19:49:09Z)
Probably Approximately Precision and Recall Learning [62.912015491907994]
Precision and Recall are foundational metrics in machine learning. One-sided feedback--where only positive examples are observed during training--is inherent in many practical problems. We introduce a PAC learning framework where each hypothesis is represented by a graph, with edges indicating positive interactions.
arXiv Detail & Related papers (2024-11-20T04:21:07Z)
Towards Understanding the Feasibility of Machine Unlearning [14.177012256360635]
We present a set of novel metrics for quantifying the difficulty of unlearning. Specifically, we propose several metrics to assess the conditions necessary for a successful unlearning operation. We also present a ranking mechanism to identify the most challenging samples to unlearn.
arXiv Detail & Related papers (2024-10-03T23:41:42Z)
MUSE: Machine Unlearning Six-Way Evaluation for Language Models [109.76505405962783]
Language models (LMs) are trained on vast amounts of text data, which may include private and copyrighted content. We propose MUSE, a comprehensive machine unlearning evaluation benchmark. We benchmark how effectively eight popular unlearning algorithms can unlearn Harry Potter books and news articles.
arXiv Detail & Related papers (2024-07-08T23:47:29Z)
What makes unlearning hard and what to do about it [3.2140380913122195]
We identify two key factors affecting unlearning difficulty and the performance of unlearning algorithms. We develop a framework coined Refined-Unlearning Meta-algorithm (RUM) that encompasses: (i) refining the forget set into homogenized subsets, according to different characteristics; and (ii) a meta-algorithm that employs existing algorithms to unlearn each subset and finally delivers a model that has unlearned the overall forget set.
arXiv Detail & Related papers (2024-06-03T12:14:47Z)
Challenging Forgets: Unveiling the Worst-Case Forget Sets in Machine Unlearning [9.998859702421417]
Machine unlearning (MU) aims to eliminate the influence of chosen data points on model performance. Despite various MU methods for data influence erasure, evaluations have largely focused on random data forgetting. We propose identifying the data subset that presents the most significant challenge for influence erasure, pinpointing the worst-case forget set.
arXiv Detail & Related papers (2024-03-12T06:50:32Z)
Extrinsicaly Rewarded Soft Q Imitation Learning with Discriminator [0.0]
Supervised learning methods such as Behavioral Cloning do not require sampling data, but usually suffer from distribution shift. Soft Q imitation learning (SQIL) addressed the problems, and it was shown that it could learn efficiently by combining Behavioral Cloning and soft Q-learning with constant rewards.
arXiv Detail & Related papers (2024-01-30T06:22:19Z)
Zero-shot Retrieval: Augmenting Pre-trained Models with Search Engines [83.65380507372483]
Large pre-trained models can dramatically reduce the amount of task-specific data required to solve a problem, but they often fail to capture domain-specific nuances out of the box. This paper shows how to leverage recent advances in NLP and multi-modal learning to augment a pre-trained model with search engine retrieval.
arXiv Detail & Related papers (2023-11-29T05:33:28Z)
MILD: Modeling the Instance Learning Dynamics for Learning with Noisy Labels [19.650299232829546]
We propose an iterative selection approach based on the Weibull mixture model to identify clean data. In particular, we measure the difficulty of memorization and memorize for each instance via the transition times between being misclassified and being memorized. Our strategy outperforms existing noisy-label learning methods.
arXiv Detail & Related papers (2023-06-20T14:26:53Z)
HardVis: Visual Analytics to Handle Instance Hardness Using Undersampling and Oversampling Techniques [48.82319198853359]
HardVis is a visual analytics system designed to handle instance hardness mainly in imbalanced classification scenarios. Users can explore subsets of data from different perspectives to decide all those parameters. The efficacy and effectiveness of HardVis are demonstrated with a hypothetical usage scenario and a use case.
arXiv Detail & Related papers (2022-03-29T17:04:16Z)
A Simple Hash-Based Early Exiting Approach For Language Understanding and Generation [77.85086491395981]
Early exiting allows instances to exit at different layers according to the estimation of difficulty. We propose a Hash-based Early Exiting approach (HashEE) that replaces the learn-to-exit modules with hash functions to assign each token to a fixed exiting layer. Experimental results on classification, regression, and generation tasks demonstrate that HashEE can achieve higher performance with fewer FLOPs and inference time.
arXiv Detail & Related papers (2022-03-03T12:02:05Z)
When is Memorization of Irrelevant Training Data Necessary for High-Accuracy Learning? [53.523017945443115]
We describe natural prediction problems in which every sufficiently accurate training algorithm must encode, in the prediction model, essentially all the information about a large subset of its training examples. Our results do not depend on the training algorithm or the class of models used for learning.
arXiv Detail & Related papers (2020-12-11T15:25:14Z)

This list is automatically generated from the titles and abstracts of the papers on this site.