Related papers: Machine Unlearning Fails to Remove Data Poisoning Attacks

Machine Unlearning Fails to Remove Data Poisoning Attacks

URL: http://arxiv.org/abs/2406.17216v1
Date: Tue, 25 Jun 2024 02:05:29 GMT
Title: Machine Unlearning Fails to Remove Data Poisoning Attacks
Authors: Martin Pawelczyk, Jimmy Z. Di, Yiwei Lu, Gautam Kamath, Ayush Sekhari, Seth Neel,
Abstract summary: In addition to complying with data deletion requests, one often-cited potential application for unlearning methods is to remove the effects of training on poisoned data. We experimentally demonstrate that, while existing unlearning methods have been demonstrated to be effective in a number of evaluation settings, they fail to remove the effects of data poisoning.
Score: 20.495836283745618
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: We revisit the efficacy of several practical methods for approximate machine unlearning developed for large-scale deep learning. In addition to complying with data deletion requests, one often-cited potential application for unlearning methods is to remove the effects of training on poisoned data. We experimentally demonstrate that, while existing unlearning methods have been demonstrated to be effective in a number of evaluation settings (e.g., alleviating membership inference attacks), they fail to remove the effects of data poisoning, across a variety of types of poisoning attacks (indiscriminate, targeted, and a newly-introduced Gaussian poisoning attack) and models (image classifiers and LLMs); even when granted a relatively large compute budget. In order to precisely characterize unlearning efficacy, we introduce new evaluation metrics for unlearning based on data poisoning. Our results suggest that a broader perspective, including a wider variety of evaluations, is required to avoid a false sense of confidence in machine unlearning procedures for deep learning without provable guarantees. Moreover, while unlearning methods show some signs of being useful to efficiently remove poisoned datapoints without having to retrain, our work suggests that these methods are not yet "ready for prime time", and currently provide limited benefit over retraining.

Related papers

RESTOR: Knowledge Recovery through Machine Unlearning [71.75834077528305]
Large language models trained on web-scale corpora can memorize undesirable datapoints. Many machine unlearning methods have been proposed that aim to 'erase' these datapoints from trained models. We propose the RESTOR framework for machine unlearning based on the following dimensions.
arXiv Detail & Related papers (2024-10-31T20:54:35Z)
Unlearnable Examples Detection via Iterative Filtering [84.59070204221366]
Deep neural networks are proven to be vulnerable to data poisoning attacks. It is quite beneficial and challenging to detect poisoned samples from a mixed dataset. We propose an Iterative Filtering approach for UEs identification.
arXiv Detail & Related papers (2024-08-15T13:26:13Z)
Unlearning with Control: Assessing Real-world Utility for Large Language Model Unlearning [97.2995389188179]
Recent research has begun to approach large language models (LLMs) unlearning via gradient ascent (GA) Despite their simplicity and efficiency, we suggest that GA-based methods face the propensity towards excessive unlearning. We propose several controlling methods that can regulate the extent of excessive unlearning.
arXiv Detail & Related papers (2024-06-13T14:41:00Z)
Gone but Not Forgotten: Improved Benchmarks for Machine Unlearning [0.0]
We describe and propose alternative evaluation methods for machine unlearning algorithms. We show the utility of our alternative evaluations via a series of experiments of state-of-the-art unlearning algorithms on different computer vision datasets.
arXiv Detail & Related papers (2024-05-29T15:53:23Z)
Learn What You Want to Unlearn: Unlearning Inversion Attacks against Machine Unlearning [16.809644622465086]
We conduct the first investigation to understand the extent to which machine unlearning can leak the confidential content of unlearned data. Under the Machine Learning as a Service setting, we propose unlearning inversion attacks that can reveal the feature and label information of an unlearned sample. The experimental results indicate that the proposed attack can reveal the sensitive information of the unlearned data.
arXiv Detail & Related papers (2024-04-04T06:37:46Z)
Efficient Knowledge Deletion from Trained Models through Layer-wise Partial Machine Unlearning [2.3496568239538083]
This paper introduces a novel class of machine unlearning algorithms. First method is partial amnesiac unlearning, integration of layer-wise pruning with amnesiac unlearning. Second method assimilates layer-wise partial-updates into label-flipping and optimization-based unlearning.
arXiv Detail & Related papers (2024-03-12T12:49:47Z)
Corrective Machine Unlearning [22.342035149807923]
We formalize Corrective Machine Unlearning as the problem of mitigating the impact of data affected by unknown manipulations on a trained model. We find most existing unlearning methods, including retraining-from-scratch without the deletion set, require most of the manipulated data to be identified for effective corrective unlearning. One approach, Selective Synaptic Dampening, achieves limited success, unlearning adverse effects with just a small portion of the manipulated samples in our setting.
arXiv Detail & Related papers (2024-02-21T18:54:37Z)
Transferable Availability Poisoning Attacks [23.241524904589326]
We consider availability data poisoning attacks, where an adversary aims to degrade the overall test accuracy of a machine learning model. Existing poisoning strategies can achieve the attack goal but assume the victim to employ the same learning method as what the adversary uses to mount the attack. We propose Transferable Poisoning, which first leverages the intrinsic characteristics of alignment and uniformity to enable better unlearnability.
arXiv Detail & Related papers (2023-10-08T12:22:50Z)
On Practical Aspects of Aggregation Defenses against Data Poisoning Attacks [58.718697580177356]
Attacks on deep learning models with malicious training samples are known as data poisoning. Recent advances in defense strategies against data poisoning have highlighted the effectiveness of aggregation schemes in achieving certified poisoning robustness. Here we focus on Deep Partition Aggregation, a representative aggregation defense, and assess its practical aspects, including efficiency, performance, and robustness.
arXiv Detail & Related papers (2023-06-28T17:59:35Z)
Learning to Unlearn: Instance-wise Unlearning for Pre-trained Classifiers [71.70205894168039]
We consider instance-wise unlearning, of which the goal is to delete information on a set of instances from a pre-trained model. We propose two methods that reduce forgetting on the remaining data: 1) utilizing adversarial examples to overcome forgetting at the representation-level and 2) leveraging weight importance metrics to pinpoint network parameters guilty of propagating unwanted information.
arXiv Detail & Related papers (2023-01-27T07:53:50Z)
Machine Unlearning of Features and Labels [72.81914952849334]
We propose first scenarios for unlearning and labels in machine learning models. Our approach builds on the concept of influence functions and realizes unlearning through closed-form updates of model parameters.
arXiv Detail & Related papers (2021-08-26T04:42:24Z)
Accumulative Poisoning Attacks on Real-time Data [56.96241557830253]
We show that a well-designed but straightforward attacking strategy can dramatically amplify the poisoning effects. Our work validates that a well-designed but straightforward attacking strategy can dramatically amplify the poisoning effects.
arXiv Detail & Related papers (2021-06-18T08:29:53Z)
Active Learning Under Malicious Mislabeling and Poisoning Attacks [2.4660652494309936]
Deep neural networks usually require large labeled datasets for training. Most of these data are unlabeled and are vulnerable to data poisoning attacks. In this paper, we develop an efficient active learning method that requires fewer labeled instances.
arXiv Detail & Related papers (2021-01-01T03:43:36Z)
Overcoming the curse of dimensionality with Laplacian regularization in semi-supervised learning [80.20302993614594]
We provide a statistical analysis to overcome drawbacks of Laplacian regularization. We unveil a large body of spectral filtering methods that exhibit desirable behaviors. We provide realistic computational guidelines in order to make our method usable with large amounts of data.
arXiv Detail & Related papers (2020-09-09T14:28:54Z)

This list is automatically generated from the titles and abstracts of the papers in this site.