Exploiting Supervised Poison Vulnerability to Strengthen Self-Supervised Defense
- URL: http://arxiv.org/abs/2409.08509v1
- Date: Fri, 13 Sep 2024 03:12:58 GMT
- Authors: Jeremy Styborski, Mingzhi Lyu, Yi Huang, Adams Kong
- Abstract summary: Self-supervised learning (SSL) is regarded as a strong defense against poisoned data.
We study SSL across multiple poisons on the CIFAR-10 and ImageNet-100 datasets.
Our proposed defense, designated VESPR, surpasses the performance of six previous defenses.
- Score: 3.4460811765644457
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Availability poisons exploit supervised learning (SL) algorithms by introducing class-related shortcut features in images such that models trained on poisoned data are useless for real-world datasets. Self-supervised learning (SSL), which utilizes augmentations to learn instance discrimination, is regarded as a strong defense against poisoned data. However, by extending the study of SSL across multiple poisons on the CIFAR-10 and ImageNet-100 datasets, we demonstrate that it often performs poorly, far below that of training on clean data. Leveraging the vulnerability of SL to poison attacks, we introduce adversarial training (AT) on SL to obfuscate poison features and guide robust feature learning for SSL. Our proposed defense, designated VESPR (Vulnerability Exploitation of Supervised Poisoning for Robust SSL), surpasses the performance of six previous defenses across seven popular availability poisons. VESPR displays superior performance over all previous defenses, boosting the minimum and average ImageNet-100 test accuracies of poisoned models by 16% and 9%, respectively. Through analysis and ablation studies, we elucidate the mechanisms by which VESPR learns robust class features.
Related papers
- Potion: Towards Poison Unlearning [47.00450933765504]
Adversarial attacks by malicious actors on machine learning systems pose significant risks.
The challenge in resolving such an attack arises in practice when only a subset of the poisoned data can be identified.
Our work addresses two key challenges to advance the state of the art in poison unlearning.
arXiv Detail & Related papers (2024-06-13T14:35:11Z)
- Have You Poisoned My Data? Defending Neural Networks against Data Poisoning [0.393259574660092]
We propose a novel approach to detect and filter poisoned datapoints in the transfer learning setting.
We show that effective poisons can be successfully differentiated from clean points in the characteristic vector space.
Our evaluation shows that our proposal outperforms existing approaches in defense rate and final trained model performance.
arXiv Detail & Related papers (2024-03-20T11:50:16Z)
- Erasing Self-Supervised Learning Backdoor by Cluster Activation Masking [65.44477004525231]
Researchers have recently found that Self-Supervised Learning (SSL) is vulnerable to backdoor attacks.
In this paper, we propose to erase the SSL backdoor by cluster activation masking and propose a novel PoisonCAM method.
Our method achieves 96% accuracy for backdoor trigger detection compared to 3% of the state-of-the-art method on poisoned ImageNet-100.
arXiv Detail & Related papers (2023-12-13T08:01:15Z)
- On Practical Aspects of Aggregation Defenses against Data Poisoning Attacks [58.718697580177356]
Attacks on deep learning models with malicious training samples are known as data poisoning.
Recent advances in defense strategies against data poisoning have highlighted the effectiveness of aggregation schemes in achieving certified poisoning robustness.
Here we focus on Deep Partition Aggregation, a representative aggregation defense, and assess its practical aspects, including efficiency, performance, and robustness.
arXiv Detail & Related papers (2023-06-28T17:59:35Z)
- Instructions as Backdoors: Backdoor Vulnerabilities of Instruction Tuning for Large Language Models [53.416234157608]
We investigate security concerns of the emergent instruction tuning paradigm, in which models are trained on crowdsourced datasets with task instructions to achieve superior performance.
Our studies demonstrate that an attacker can inject backdoors by issuing very few malicious instructions and control model behavior through data poisoning.
arXiv Detail & Related papers (2023-05-24T04:27:21Z)
- An Embarrassingly Simple Backdoor Attack on Self-supervised Learning [52.28670953101126]
Self-supervised learning (SSL) is capable of learning high-quality representations of complex data without relying on labels.
We study the inherent vulnerability of SSL to backdoor attacks.
arXiv Detail & Related papers (2022-10-13T20:39:21Z)
- SparseFed: Mitigating Model Poisoning Attacks in Federated Learning with Sparsification [24.053704318868043]
In model poisoning attacks, the attacker reduces the model's performance on targeted sub-tasks by uploading "poisoned" updates.
We introduce SparseFed, a novel defense that uses global top-k update sparsification and device-level gradient clipping to mitigate model poisoning attacks.
arXiv Detail & Related papers (2021-12-12T16:34:52Z)
- How Robust are Randomized Smoothing based Defenses to Data Poisoning? [66.80663779176979]
We present a previously unrecognized threat to robust machine learning models that highlights the importance of training-data quality.
We propose a novel bilevel optimization-based data poisoning attack that degrades the robustness guarantees of certifiably robust classifiers.
Our attack is effective even when the victim trains the models from scratch using state-of-the-art robust training methods.
arXiv Detail & Related papers (2020-12-02T15:30:21Z)