Potion: Towards Poison Unlearning
- URL: http://arxiv.org/abs/2406.09173v3
- Date: Wed, 11 Sep 2024 13:27:09 GMT
- Title: Potion: Towards Poison Unlearning
- Authors: Stefan Schoepf, Jack Foster, Alexandra Brintrup
- Abstract summary: Adversarial attacks by malicious actors on machine learning systems pose significant risks.
The challenge in resolving such an attack arises in practice when only a subset of the poisoned data can be identified.
Our work addresses two key challenges to advance the state of the art in poison unlearning.
- Score: 47.00450933765504
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Adversarial attacks by malicious actors on machine learning systems, such as introducing poison triggers into training datasets, pose significant risks. The challenge in resolving such an attack arises in practice when only a subset of the poisoned data can be identified. This necessitates the development of methods to remove, i.e. unlearn, poison triggers from already trained models with only a subset of the poison data available. The requirements for this task significantly deviate from privacy-focused unlearning where all of the data to be forgotten by the model is known. Previous work has shown that the undiscovered poisoned samples lead to a failure of established unlearning methods, with only one method, Selective Synaptic Dampening (SSD), showing limited success. Even full retraining, after the removal of the identified poison, cannot address this challenge as the undiscovered poison samples lead to a reintroduction of the poison trigger in the model. Our work addresses two key challenges to advance the state of the art in poison unlearning. First, we introduce a novel outlier-resistant method, based on SSD, that significantly improves model protection and unlearning performance. Second, we introduce Poison Trigger Neutralisation (PTN) search, a fast, parallelisable, hyperparameter search that utilises the characteristic "unlearning versus model protection" trade-off to find suitable hyperparameters in settings where the forget set size is unknown and the retain set is contaminated. We benchmark our contributions using ResNet-9 on CIFAR10 and WideResNet-28x10 on CIFAR100. Experimental results show that our method heals 93.72% of poison compared to SSD with 83.41% and full retraining with 40.68%. We achieve this while also lowering the average model accuracy drop caused by unlearning from 5.68% (SSD) to 1.41% (ours).
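The abstract describes two components: an outlier-resistant dampening step built on Selective Synaptic Dampening (SSD) and the PTN hyperparameter search over the "unlearning versus model protection" trade-off. Below is a minimal sketch of that general recipe, assuming diagonal-Fisher importance estimates and a simple sweep over the selection threshold; the function names (`diag_fisher`, `dampen`, `ptn_search`) and the accuracy-threshold stopping rule are illustrative assumptions, not the authors' Potion implementation.

```python
# Hedged sketch of SSD-style importance dampening with a PTN-like threshold sweep.
# Not the authors' Potion code; all names and defaults are illustrative.
import copy
import torch

def diag_fisher(model, loader, loss_fn):
    """Diagonal Fisher approximation: mean squared gradient per parameter,
    computed over a dataloader (forget set or retain set). Assumes the model
    and batches live on the same device."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for x, y in loader:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
    model.zero_grad()
    return {n: f / max(len(loader), 1) for n, f in fisher.items()}

def dampen(model, forget_fisher, retain_fisher, alpha, lam=1.0):
    """Shrink weights whose forget-set importance dominates their retain-set
    importance (the SSD-style selection; Potion's outlier-resistant variant
    differs in detail)."""
    unlearned = copy.deepcopy(model)
    with torch.no_grad():
        for n, p in unlearned.named_parameters():
            mask = forget_fisher[n] > alpha * retain_fisher[n]
            scale = torch.clamp(
                lam * retain_fisher[n] / (forget_fisher[n] + 1e-12), max=1.0
            )
            p[mask] = p[mask] * scale[mask]
    return unlearned

def ptn_search(model, forget_fisher, retain_fisher, forget_acc_fn, alphas, threshold=0.1):
    """Sweep the selection threshold from least to most aggressive and return the
    first candidate whose accuracy on the identified poison drops below `threshold`.
    Each candidate is independent, so the sweep parallelises trivially."""
    for alpha in sorted(alphas, reverse=True):  # larger alpha = fewer parameters dampened
        candidate = dampen(model, forget_fisher, retain_fisher, alpha)
        if forget_acc_fn(candidate) <= threshold:
            return candidate, alpha
    return None, None
```

In this sketch, more aggressive dampening removes the poison trigger more thoroughly but costs more retained accuracy; that trade-off is what a PTN-style search exploits to pick its operating point when the true forget set size is unknown.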
Related papers
- Clean Label Attacks against SLU Systems [33.69363383366988]
We adapted clean-label backdoor (CLBD) data poisoning attacks, which do not modify the training labels, to state-of-the-art speech recognition models.
We analyzed how varying the signal-strength of the poison, percent of samples poisoned, and choice of trigger impact the attack.
We found that CLBD attacks are most successful when applied to training samples that are inherently hard for a proxy model.
arXiv Detail & Related papers (2024-09-13T16:58:06Z)
- Exploiting Supervised Poison Vulnerability to Strengthen Self-Supervised Defense [3.4460811765644457]
Self-supervised learning (SSL) is regarded as a strong defense against poisoned data.
We study SSL across multiple poisons on the CIFAR-10 and ImageNet-100 datasets.
Our proposed defense, designated VESPR, surpasses the performance of six previous defenses.
arXiv Detail & Related papers (2024-09-13T03:12:58Z)
- SEEP: Training Dynamics Grounds Latent Representation Search for Mitigating Backdoor Poisoning Attacks [53.28390057407576]
Modern NLP models are often trained on public datasets drawn from diverse sources.
Data poisoning attacks can manipulate the model's behavior in ways engineered by the attacker.
Several strategies have been proposed to mitigate the risks associated with backdoor attacks.
arXiv Detail & Related papers (2024-05-19T14:50:09Z)
- Progressive Poisoned Data Isolation for Training-time Backdoor Defense [23.955347169187917]
Deep Neural Networks (DNNs) are susceptible to backdoor attacks, where malicious attackers manipulate the model's predictions via data poisoning.
In this study, we present a novel and efficacious defense method, termed Progressive Isolation of Poisoned Data (PIPD).
Our PIPD achieves an average True Positive Rate (TPR) of 99.95% and an average False Positive Rate (FPR) of 0.06% for diverse attacks over the CIFAR-10 dataset.
arXiv Detail & Related papers (2023-12-20T02:40:28Z)
- Exploring Model Dynamics for Accumulative Poisoning Discovery [62.08553134316483]
We propose a novel information measure, namely, Memorization Discrepancy, to explore the defense via the model-level information.
By implicitly transferring the changes in the data manipulation to that in the model outputs, Memorization Discrepancy can discover the imperceptible poison samples.
We thoroughly explore its properties and propose Discrepancy-aware Sample Correction (DSC) to defend against accumulative poisoning attacks.
arXiv Detail & Related papers (2023-06-06T14:45:24Z)
- Sharpness-Aware Data Poisoning Attack [38.01535347191942]
Recent research has highlighted the vulnerability of Deep Neural Networks (DNNs) against data poisoning attacks.
We propose a novel attack method called "Sharpness-Aware Data Poisoning Attack (SAPA)".
In particular, it leverages the concept of DNNs' loss landscape sharpness to optimize the poisoning effect on the worst re-trained model.
arXiv Detail & Related papers (2023-05-24T08:00:21Z)
- Accumulative Poisoning Attacks on Real-time Data [56.96241557830253]
We show that a well-designed but straightforward attacking strategy can dramatically amplify the poisoning effects.
arXiv Detail & Related papers (2021-06-18T08:29:53Z)
- How Robust are Randomized Smoothing based Defenses to Data Poisoning? [66.80663779176979]
We present a previously unrecognized threat to robust machine learning models that highlights the importance of training-data quality.
We propose a novel bilevel optimization-based data poisoning attack that degrades the robustness guarantees of certifiably robust classifiers.
Our attack is effective even when the victim trains the models from scratch using state-of-the-art robust training methods.
arXiv Detail & Related papers (2020-12-02T15:30:21Z)
- Witches' Brew: Industrial Scale Data Poisoning via Gradient Matching [56.280018325419896]
Data Poisoning attacks modify training data to maliciously control a model trained on such data.
We analyze a particularly malicious poisoning attack that is both "from scratch" and "clean label"; a minimal sketch of the gradient-matching objective appears after this list.
We show that it is the first poisoning method to cause targeted misclassification in modern deep networks trained from scratch on a full-sized, poisoned ImageNet dataset.
arXiv Detail & Related papers (2020-09-04T16:17:54Z)
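As noted in the Witches' Brew entry above, that attack crafts clean-label perturbations by matching gradients. The sketch below shows one plausible form of such a gradient-matching objective in PyTorch; the function name, arguments, and cosine-similarity choice are assumptions for illustration, not the authors' reference implementation.

```python
# Hedged sketch of a gradient-matching poisoning objective in the spirit of Witches' Brew.
# All names are illustrative; this is not the paper's reference implementation.
import torch
import torch.nn.functional as F

def gradient_matching_loss(model, loss_fn, poison_x, poison_y, target_x, adversarial_y):
    """Negative alignment between (a) the training gradient the victim would see on the
    clean-label poisoned batch and (b) the gradient that pushes the target sample toward
    the adversarial label. Minimising this w.r.t. the poison perturbation aligns the two."""
    params = [p for p in model.parameters() if p.requires_grad]

    # Gradient induced by the (clean-label) poisoned batch; keep the graph so we can
    # differentiate through it with respect to the poison perturbation.
    poison_grad = torch.autograd.grad(
        loss_fn(model(poison_x), poison_y), params, create_graph=True
    )
    # Gradient of the attacker's objective on the target relabelled to the adversarial class.
    target_grad = torch.autograd.grad(
        loss_fn(model(target_x), adversarial_y), params
    )

    p = torch.cat([g.flatten() for g in poison_grad])
    t = torch.cat([g.flatten() for g in target_grad]).detach()
    return 1.0 - F.cosine_similarity(p, t, dim=0)

# Typical usage (hypothetical): make delta a perturbation tensor with requires_grad=True,
# compute gradient_matching_loss(model, loss_fn, clean_x + delta, clean_y, target_x, adv_y),
# and optimise delta under an L-infinity budget so the poisons stay visually clean-label.
```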
This list is automatically generated from the titles and abstracts of the papers on this site.