UPCORE: Utility-Preserving Coreset Selection for Balanced Unlearning
- URL: http://arxiv.org/abs/2502.15082v1
- Date: Thu, 20 Feb 2025 22:51:10 GMT
- Title: UPCORE: Utility-Preserving Coreset Selection for Balanced Unlearning
- Authors: Vaidehi Patil, Elias Stengel-Eskin, Mohit Bansal,
- Abstract summary: User specifications or legal frameworks often require information to be removed from pretrained models, including large language models (LLMs)<n>This requires deleting or "forgetting" a set of data points from an already-trained model, which typically degrades its performance on other data points.<n>We propose UPCORE, a method-agnostic data selection framework for mitigating collateral damage during unlearning.
- Score: 57.081646768835704
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: User specifications or legal frameworks often require information to be removed from pretrained models, including large language models (LLMs). This requires deleting or "forgetting" a set of data points from an already-trained model, which typically degrades its performance on other data points. Thus, a balance must be struck between removing information and keeping the model's other abilities intact, with a failure to balance this trade-off leading to poor deletion or an unusable model. To this end, we propose UPCORE (Utility-Preserving Coreset Selection), a method-agnostic data selection framework for mitigating collateral damage during unlearning. Finding that the model damage is correlated with the variance of the model's representations on the forget set, we selectively prune the forget set to remove outliers, thereby minimizing model degradation after unlearning. We evaluate UPCORE across three standard unlearning methods consistently achieving a superior balance between the competing objectives of deletion efficacy and model preservation. To better evaluate this trade-off, we introduce a new metric, measuring the area-under-the-curve (AUC) across standard metrics. We find that UPCORE improves both standard metrics and AUC, benefitting from positive transfer between the coreset and pruned points while reducing negative transfer from the forget set to points outside of it.
Related papers
- AMUN: Adversarial Machine UNlearning [13.776549741449557]
Adversarial Machine UNlearning (AMUN) outperforms prior state-of-the-art (SOTA) methods for image classification.
AMUN lowers the confidence of the model on the forget samples by fine-tuning the model on their corresponding adversarial examples.
arXiv Detail & Related papers (2025-03-02T14:36:31Z) - Machine Unlearning on Pre-trained Models by Residual Feature Alignment Using LoRA [15.542668474378633]
We propose a novel and efficient machine unlearning method on pre-trained models.
We leverage LoRA to decompose the model's intermediate features into pre-trained features and residual features.
The method aims to learn the zero residuals on the retained set and shifted residuals on the unlearning set.
arXiv Detail & Related papers (2024-11-13T08:56:35Z) - Alternate Preference Optimization for Unlearning Factual Knowledge in Large Language Models [2.0962367975513496]
Machine unlearning aims to efficiently eliminate the influence of specific training data, known as the forget set, from the model.<n>Existing unlearning methods rely solely on negative feedback to suppress responses related to the forget set.<n>We propose a novel approach called Alternate Preference Optimization (AltPO), which combines negative feedback with in-domain positive feedback on the forget set.
arXiv Detail & Related papers (2024-09-20T13:05:07Z) - Partially Blinded Unlearning: Class Unlearning for Deep Networks a Bayesian Perspective [4.31734012105466]
Machine Unlearning is the process of selectively discarding information designated to specific sets or classes of data from a pre-trained model.
We propose a methodology tailored for the purposeful elimination of information linked to a specific class of data from a pre-trained classification network.
Our novel approach, termed textbfPartially-Blinded Unlearning (PBU), surpasses existing state-of-the-art class unlearning methods, demonstrating superior effectiveness.
arXiv Detail & Related papers (2024-03-24T17:33:22Z) - Challenging Forgets: Unveiling the Worst-Case Forget Sets in Machine Unlearning [9.998859702421417]
Machine unlearning (MU) aims to eliminate the influence of chosen data points on model performance.
Despite various MU methods for data influence erasure, evaluations have largely focused on random data forgetting.
We propose identifying the data subset that presents the most significant challenge for influence erasure, pinpointing the worst-case forget set.
arXiv Detail & Related papers (2024-03-12T06:50:32Z) - Enhancing Multiple Reliability Measures via Nuisance-extended
Information Bottleneck [77.37409441129995]
In practical scenarios where training data is limited, many predictive signals in the data can be rather from some biases in data acquisition.
We consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training.
We propose an autoencoder-based training to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training.
arXiv Detail & Related papers (2023-03-24T16:03:21Z) - Rethinking Missing Data: Aleatoric Uncertainty-Aware Recommendation [59.500347564280204]
We propose a new Aleatoric Uncertainty-aware Recommendation (AUR) framework.
AUR consists of a new uncertainty estimator along with a normal recommender model.
As the chance of mislabeling reflects the potential of a pair, AUR makes recommendations according to the uncertainty.
arXiv Detail & Related papers (2022-09-22T04:32:51Z) - ELODI: Ensemble Logit Difference Inhibition for Positive-Congruent Training [110.52785254565518]
Existing methods to reduce the negative flip rate (NFR) either do so at the expense of overall accuracy by forcing a new model to imitate the old models, or use ensembles.
We analyze the role of ensembles in reducing NFR and observe that they remove negative flips that are typically not close to the decision boundary.
We present a method, called Ensemble Logit Difference Inhibition (ELODI), to train a classification system that achieves paragon performance in both error rate and NFR.
arXiv Detail & Related papers (2022-05-12T17:59:56Z) - Machine Unlearning of Features and Labels [72.81914952849334]
We propose first scenarios for unlearning and labels in machine learning models.
Our approach builds on the concept of influence functions and realizes unlearning through closed-form updates of model parameters.
arXiv Detail & Related papers (2021-08-26T04:42:24Z) - Positive-Congruent Training: Towards Regression-Free Model Updates [87.25247195148187]
In image classification, sample-wise inconsistencies appear as "negative flips"
A new model incorrectly predicts the output for a test sample that was correctly classified by the old (reference) model.
We propose a simple approach for PC training, Focal Distillation, which enforces congruence with the reference model.
arXiv Detail & Related papers (2020-11-18T09:00:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.