Towards A Proactive ML Approach for Detecting Backdoor Poison Samples
- URL: http://arxiv.org/abs/2205.13616v3
- Date: Sun, 18 Jun 2023 02:11:20 GMT
- Title: Towards A Proactive ML Approach for Detecting Backdoor Poison Samples
- Authors: Xiangyu Qi, Tinghao Xie, Jiachen T. Wang, Tong Wu, Saeed Mahloujifar,
Prateek Mittal
- Abstract summary: Adversaries can embed backdoors in deep learning models by introducing backdoor poison samples into training datasets.
In this work, we investigate how to detect such poison samples to mitigate the threat of backdoor attacks.
- Score: 38.21287048132065
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversaries can embed backdoors in deep learning models by introducing
backdoor poison samples into training datasets. In this work, we investigate
how to detect such poison samples to mitigate the threat of backdoor attacks.
First, we uncover a post-hoc workflow underlying most prior work, where
defenders passively allow the attack to proceed and then leverage the
characteristics of the post-attacked model to uncover poison samples. We reveal
that this workflow does not fully exploit defenders' capabilities, and defense
pipelines built on it are prone to failure or performance degradation in many
scenarios. Second, we suggest a paradigm shift by promoting a proactive mindset
in which defenders engage proactively with the entire model training and poison
detection pipeline, directly enforcing and magnifying distinctive
characteristics of the post-attacked model to facilitate poison detection.
Based on this, we formulate a unified framework and provide practical insights
on designing detection pipelines that are more robust and generalizable. Third,
we introduce the technique of Confusion Training (CT) as a concrete
instantiation of our framework. CT applies an additional poisoning attack to
the already poisoned dataset, actively decoupling benign correlation while
exposing backdoor patterns to detection. Empirical evaluations on 4 datasets
and 14 types of attacks validate the superiority of CT over 14 baseline
defenses.
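The abstract describes Confusion Training (CT) only at a high level, so a minimal, hypothetical PyTorch-style sketch of the idea may help: keep optimizing on the (possibly poisoned) dataset while repeatedly mixing in a small clean batch whose labels are randomly reshuffled, which decouples benign input-label correlations; samples that the "confused" model still fits confidently to their given labels are flagged as poison suspects. The function names, the random-relabeling step, and the confidence-based flagging rule below are illustrative assumptions, not the paper's exact algorithm.

```python
import torch
import torch.nn.functional as F


def confusion_training_round(model, optimizer, poisoned_batch, clean_batch, num_classes):
    """One CT-style step: train on the poisoned batch plus a randomly relabeled clean batch."""
    x_p, y_p = poisoned_batch
    x_c, _ = clean_batch
    # "Confusion" step (assumption): give the clean samples random labels so
    # benign input-label correlations cannot be (re)learned, while any
    # trigger-to-target correlation carried by poison samples survives.
    y_rand = torch.randint(0, num_classes, (x_c.size(0),), device=x_c.device)
    x = torch.cat([x_p, x_c], dim=0)
    y = torch.cat([y_p, y_rand], dim=0)
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()
    return loss.item()


@torch.no_grad()
def flag_suspects(model, indexed_loader, confidence_threshold=0.9):
    """Flag samples the confused model still fits confidently to their given labels.

    Assumes `indexed_loader` yields (indices, inputs, labels); the threshold is
    an illustrative choice, not a value from the paper.
    """
    model.eval()
    suspects = []
    for indices, x, y in indexed_loader:
        probs = F.softmax(model(x), dim=1)
        fit = probs.gather(1, y.unsqueeze(1)).squeeze(1)
        suspects.extend(indices[fit > confidence_threshold].tolist())
    return suspects
```

The sketch only conveys the decoupling intuition; the paper itself evaluates CT against 14 types of attacks on 4 datasets and compares it to 14 baseline defenses.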
Related papers
- SEEP: Training Dynamics Grounds Latent Representation Search for Mitigating Backdoor Poisoning Attacks [53.28390057407576]
Modern NLP models are often trained on public datasets drawn from diverse sources.
Data poisoning attacks can manipulate the model's behavior in ways engineered by the attacker.
Several strategies have been proposed to mitigate the risks associated with backdoor attacks.
arXiv Detail & Related papers (2024-05-19T14:50:09Z) - The Victim and The Beneficiary: Exploiting a Poisoned Model to Train a Clean Model on Poisoned Data [4.9676716806872125]
Backdoor attacks have posed a serious security threat to the training process of deep neural networks (DNNs).
We propose a novel dual-network training framework: The Victim and The Beneficiary (V&B), which exploits a poisoned model to train a clean model without extra benign samples.
Our framework is effective in preventing backdoor injection and robust to various attacks while maintaining the performance on benign samples.
arXiv Detail & Related papers (2024-04-17T11:15:58Z) - Poisoned Forgery Face: Towards Backdoor Attacks on Face Forgery
Detection [62.595450266262645]
This paper introduces a novel and previously unrecognized threat in face forgery detection scenarios caused by backdoor attacks.
By embedding backdoors into models, attackers can deceive detectors into producing erroneous predictions for forged faces.
We propose the Poisoned Forgery Face framework, which enables clean-label backdoor attacks on face forgery detectors.
arXiv Detail & Related papers (2024-02-18T06:31:05Z) - BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive
Learning [85.2564206440109]
This paper reveals that, in this practical scenario, backdoor attacks can remain effective even after defenses are applied.
We introduce the BadCLIP attack, which is resistant to backdoor detection and model fine-tuning defenses.
arXiv Detail & Related papers (2023-11-20T02:21:49Z) - Backdoor Defense via Deconfounded Representation Learning [17.28760299048368]
We propose a Causality-inspired Backdoor Defense (CBD) to learn deconfounded representations for reliable classification.
CBD is effective in reducing backdoor threats while maintaining high accuracy in predicting benign samples.
arXiv Detail & Related papers (2023-03-13T02:25:59Z) - Untargeted Backdoor Attack against Object Detection [69.63097724439886]
We design an untargeted, poison-only backdoor attack based on task characteristics.
We show that, once the backdoor is embedded into the target model by our attack, it can trick the model to lose detection of any object stamped with our trigger patterns.
arXiv Detail & Related papers (2022-11-02T17:05:45Z) - Invisible Backdoor Attacks Using Data Poisoning in the Frequency Domain [8.64369418938889]
We propose a generalized backdoor attack method based on the frequency domain.
It can implant a backdoor without mislabeling or access to the training process.
We evaluate our approach in the no-label and clean-label cases on three datasets.
arXiv Detail & Related papers (2022-07-09T07:05:53Z) - Black-box Detection of Backdoor Attacks with Limited Information and
Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.