PUMA: Performance Unchanged Model Augmentation for Training Data Removal
- URL: http://arxiv.org/abs/2203.00846v1
- Date: Wed, 2 Mar 2022 03:40:17 GMT
- Title: PUMA: Performance Unchanged Model Augmentation for Training Data Removal
- Authors: Ga Wu, Masoud Hashemi, Christopher Srinivasa
- Abstract summary: This paper presents a novel approach called Performance Unchanged Model Augmentation (PUMA).
The proposed PUMA framework explicitly models the influence of each training data point on the model's generalization ability.
We show that PUMA can effectively and efficiently remove the unique characteristics of marked training data without retraining the model.
- Score: 2.8468089304148445
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Preserving the performance of a trained model while removing unique
characteristics of marked training data points is challenging. Recent research
usually suggests retraining a model from scratch with remaining training data
or refining the model by reverting the model optimization on the marked data
points. Unfortunately, aside from their computational inefficiency, those
approaches inevitably hurt the resulting model's generalization ability since
they remove not only unique characteristics but also discard shared (and
possibly contributive) information. To address the performance degradation
problem, this paper presents a novel approach called Performance Unchanged
Model Augmentation~(PUMA). The proposed PUMA framework explicitly models the
influence of each training data point on the model's generalization ability
with respect to various performance criteria. It then complements the negative
impact of removing marked data by reweighting the remaining data optimally. To
demonstrate the effectiveness of the PUMA framework, we compared it with
multiple state-of-the-art data removal techniques in the experiments, where we
show that PUMA can effectively and efficiently remove the unique characteristics
of marked training data without retraining the model, so that the resulting model
can 1) fool a membership attack, and 2) resist performance degradation. In addition, as PUMA
estimates the data importance during its operation, we show it could serve to
debug mislabelled data points more efficiently than existing approaches.
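The compensation step described in the abstract (model each training point's influence on the loss, then reweight the remaining data to offset the effect of removal) can be sketched on a toy weighted ridge-regression problem. This is a rough illustration under simplifying assumptions, not the paper's actual algorithm: all variable names are invented, and the least-squares reweighting below is a stand-in for PUMA's optimal reweighting with respect to its performance criteria.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy ridge-regression setup (purely illustrative).
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)
lam = 1e-2

def fit(X, y, sample_weights):
    # Weighted ridge: minimize (1/n) sum_i s_i (x_i^T w - y_i)^2 + lam ||w||^2
    S = np.diag(sample_weights)
    return np.linalg.solve(X.T @ S @ X / n + lam * np.eye(d),
                           X.T @ S @ y / n)

theta = fit(X, y, np.ones(n))

# Per-sample loss gradients at the trained parameters.
resid = X @ theta - y
grads = 2 * resid[:, None] * X            # shape (n, d)

removed = np.arange(10)                   # "marked" points to forget
kept = np.arange(10, n)

# Reweight the kept points so their weighted gradient sum replaces the
# gradient mass of the removed points, preserving stationarity of the
# trained parameters (minimum-norm least-squares solution).
G = grads[kept].T                         # (d, n_kept)
target = grads[removed].sum(axis=0)       # gradient mass to compensate
delta, *_ = np.linalg.lstsq(G, target, rcond=None)

weights = np.ones(n)
weights[removed] = 0.0                    # marked points fully removed
weights[kept] += delta

theta_aug = fit(X, y, weights)            # parameters remain (numerically) unchanged
```

Because the reweighted gradient sum exactly matches the original one at `theta`, refitting with the new weights leaves the parameters essentially unchanged even though the marked points now carry zero weight, which is the "performance unchanged" intuition in miniature.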
Related papers
- Machine Unlearning on Pre-trained Models by Residual Feature Alignment Using LoRA [15.542668474378633]
We propose a novel and efficient machine unlearning method on pre-trained models.
We leverage LoRA to decompose the model's intermediate features into pre-trained features and residual features.
The method aims to learn the zero residuals on the retained set and shifted residuals on the unlearning set.
arXiv Detail & Related papers (2024-11-13T08:56:35Z) - Attribute-to-Delete: Machine Unlearning via Datamodel Matching [65.13151619119782]
Machine unlearning -- efficiently removing a small "forget set" of training data from a pre-trained machine learning model -- has recently attracted interest.
Recent research shows that machine unlearning techniques do not hold up in such a challenging setting.
arXiv Detail & Related papers (2024-10-30T17:20:10Z) - Parameter Matching Attack: Enhancing Practical Applicability of Availability Attacks [8.225819874406238]
We propose a novel availability attack termed Parameter Matching Attack (PMA).
PMA is the first availability attack that works when only a portion of data can be perturbed.
We show that PMA outperforms existing methods, achieving significant model performance degradation when a part of the training data is perturbed.
arXiv Detail & Related papers (2024-07-02T17:15:12Z) - Data Shapley in One Training Run [88.59484417202454]
Data Shapley provides a principled framework for attributing data's contribution within machine learning contexts.
Existing approaches require re-training models on different data subsets, which is computationally intensive.
This paper introduces In-Run Data Shapley, which addresses these limitations by offering scalable data attribution for a target model of interest.
arXiv Detail & Related papers (2024-06-16T17:09:24Z) - PUMA: margin-based data pruning [51.12154122266251]
We focus on data pruning, where some training samples are removed based on the distance to the model classification boundary (i.e., margin)
We propose PUMA, a new data pruning strategy that computes the margin using DeepFool.
We show that PUMA can be used on top of the current state-of-the-art methodology in robustness and, unlike existing data pruning strategies, significantly improves model performance.
arXiv Detail & Related papers (2024-05-10T08:02:20Z) - Distilled Datamodel with Reverse Gradient Matching [74.75248610868685]
We introduce an efficient framework for assessing data impact, comprising offline training and online evaluation stages.
Our proposed method achieves comparable model behavior evaluation while significantly speeding up the process compared to the direct retraining method.
arXiv Detail & Related papers (2024-04-22T09:16:14Z) - Estimating Model Performance Under Covariate Shift Without Labels [9.804680621164168]
We introduce Probabilistic Adaptive Performance Estimation (PAPE) for evaluating classification models on unlabeled data.
PAPE provides more accurate performance estimates than other evaluated methodologies.
arXiv Detail & Related papers (2024-01-16T13:29:30Z) - Secrets of RLHF in Large Language Models Part II: Reward Modeling [134.97964938009588]
We introduce a series of novel methods to mitigate the influence of incorrect and ambiguous preferences in the dataset.
We also introduce contrastive learning to enhance the ability of reward models to distinguish between chosen and rejected responses.
arXiv Detail & Related papers (2024-01-11T17:56:59Z) - Recommendation Unlearning via Influence Function [42.4931807753579]
We propose a new Influence Function-based Recommendation Unlearning (IFRU) framework, which efficiently updates the model without retraining.
IFRU achieves more than 250 times acceleration compared to retraining-based methods with recommendation performance comparable to full retraining.
arXiv Detail & Related papers (2023-07-05T09:42:51Z) - Maintaining Stability and Plasticity for Predictive Churn Reduction [8.971668467496055]
We propose a solution called Accumulated Model Combination (AMC)
AMC is a general technique and we propose several instances of it, each having their own advantages depending on the model and data properties.
arXiv Detail & Related papers (2023-05-06T20:56:20Z) - Exposing Shallow Heuristics of Relation Extraction Models with Challenge Data [49.378860065474875]
We identify failure modes of SOTA relation extraction (RE) models trained on TACRED.
By adding some of the challenge data as training examples, the performance of the model improves.
arXiv Detail & Related papers (2020-10-07T21:17:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.