COLLIDER: A Robust Training Framework for Backdoor Data
- URL: http://arxiv.org/abs/2210.06704v1
- Date: Thu, 13 Oct 2022 03:48:46 GMT
- Title: COLLIDER: A Robust Training Framework for Backdoor Data
- Authors: Hadi M. Dolatabadi, Sarah Erfani, Christopher Leckie
- Abstract summary: Deep neural network (DNN) classifiers are vulnerable to backdoor attacks.
In such attacks, an adversary poisons some of the training data by installing a trigger.
Various approaches have recently been proposed to detect malicious backdoored DNNs.
- Score: 11.510009152620666
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural network (DNN) classifiers are vulnerable to backdoor attacks. In
such attacks, an adversary poisons some of the training data by installing a
trigger. The goal is to make the trained DNN output the attacker's desired
class whenever the trigger is activated while performing as usual for clean
data. Various approaches have recently been proposed to detect malicious
backdoored DNNs. However, a robust, end-to-end training approach, like
adversarial training, is yet to be discovered for backdoor poisoned data. In
this paper, we take the first step toward such methods by developing a robust
training framework, COLLIDER, that selects the most prominent samples by
exploiting the underlying geometric structures of the data. Specifically, we
effectively filter out candidate poisoned data at each training epoch by
solving a geometrical coreset selection objective. We first argue that clean
data samples exhibit (1) gradients similar to the clean majority of data and
(2) low local intrinsic dimensionality (LID). Based on these criteria, we
define a novel coreset selection objective to find such samples, which are used
for training a DNN. We show the effectiveness of the proposed method for robust
training of DNNs on various poisoned datasets, reducing the backdoor success
rate significantly.
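The selection step can be illustrated with a short sketch. The NumPy code below uses the standard k-nearest-neighbour maximum-likelihood LID estimator and a greedy score that rewards gradient agreement with the majority while penalizing high LID. The helper names (`lid_mle`, `select_coreset`), the synthetic gradients and features, and the scoring rule are illustrative assumptions, not the paper's actual coreset objective or implementation.
```python
# A minimal, self-contained sketch (NumPy only) of the two selection criteria above.
# Assumptions: `lid_mle` and `select_coreset` are hypothetical helpers, and the
# per-sample "gradients" and "features" below are synthetic stand-ins for quantities
# that would come from the DNN being trained; the greedy top-k selection is a
# simplified proxy for COLLIDER's coreset objective, not the paper's exact method.
import numpy as np


def lid_mle(features: np.ndarray, k: int = 20) -> np.ndarray:
    """Per-sample LID via the k-NN maximum-likelihood estimator:
    LID_hat(x) = -( (1/k) * sum_i log(r_i(x) / r_k(x)) )^(-1)."""
    # Pairwise Euclidean distances; fine for a toy set, use ANN search at scale.
    d = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                   # exclude self-distances
    knn = np.sort(d, axis=1)[:, :k]               # distances to the k nearest neighbours
    r_k = knn[:, -1:]                             # distance to the k-th neighbour
    return -1.0 / np.mean(np.log(knn / r_k + 1e-12), axis=1)


def select_coreset(grads: np.ndarray, feats: np.ndarray,
                   budget: int, k: int = 20) -> np.ndarray:
    """Keep `budget` samples with (1) gradients aligned to the majority and (2) low LID."""
    mean_grad = grads.mean(axis=0)
    cos = grads @ mean_grad / (
        np.linalg.norm(grads, axis=1) * np.linalg.norm(mean_grad) + 1e-12
    )                                             # criterion (1): gradient similarity
    lid = lid_mle(feats, k=k)                     # criterion (2): low LID preferred
    # Ad-hoc combination of the two criteria; stands in for the coreset objective.
    score = cos - (lid - lid.mean()) / (lid.std() + 1e-12)
    return np.argsort(-score)[:budget]            # indices of the retained samples


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_clean, n_poison, dim = 90, 10, 16
    # Toy "clean" samples: features near a 2-D subspace (low LID), gradients that
    # share a common direction u with the majority.
    basis = rng.normal(size=(2, dim))
    clean_feats = rng.normal(size=(n_clean, 2)) @ basis + 0.05 * rng.normal(size=(n_clean, dim))
    u = rng.normal(size=dim)
    clean_grads = u + 0.3 * rng.normal(size=(n_clean, dim))
    # Toy "poisoned" samples: isolated full-dimensional noise (high LID),
    # gradients pointing in arbitrary directions.
    poison_feats = 3.0 * rng.normal(size=(n_poison, dim))
    poison_grads = rng.normal(size=(n_poison, dim))
    feats = np.vstack([clean_feats, poison_feats])
    grads = np.vstack([clean_grads, poison_grads])
    kept = select_coreset(grads, feats, budget=80)
    # Ideally few or none of the indices >= n_clean (the poisoned ones) survive.
    print("poisoned indices kept:", sorted(int(i) for i in kept if i >= n_clean))
```
In an actual training loop, the retained indices would form the coreset used for that epoch's updates and would be re-computed as training progresses, matching the per-epoch filtering described in the abstract.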
Related papers
- Backdoor Defense through Self-Supervised and Generative Learning [0.0]
Training on such poisoned data injects a backdoor that causes malicious inference on selected test samples.
This paper explores an approach based on generative modelling of per-class distributions in a self-supervised representation space.
In both cases, we find that per-class generative models allow us to detect poisoned data and cleanse the dataset.
arXiv Detail & Related papers (2024-09-02T11:40:01Z) - Have You Poisoned My Data? Defending Neural Networks against Data Poisoning [0.393259574660092]
We propose a novel approach to detect and filter poisoned datapoints in the transfer learning setting.
We show that effective poisons can be successfully differentiated from clean points in the characteristic vector space.
Our evaluation shows that our proposal outperforms existing approaches in defense rate and final trained model performance.
arXiv Detail & Related papers (2024-03-20T11:50:16Z) - Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
The backdoor attack is an emerging yet serious training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA).
arXiv Detail & Related papers (2023-05-11T10:05:57Z) - Backdoor Defense via Adaptively Splitting Poisoned Dataset [57.70673801469096]
Backdoor defenses have been studied to alleviate the threat of deep neural networks (DNNs) being backdoored and maliciously altered.
We argue that the core of training-time defense is to select poisoned samples and to handle them properly.
Under our framework, we propose an adaptively splitting dataset-based defense (ASD).
arXiv Detail & Related papers (2023-03-23T02:16:38Z) - Defending Against Backdoor Attacks by Layer-wise Feature Analysis [11.465401472704732]
Training deep neural networks (DNNs) usually requires massive training data and computational resources.
A new training-time attack (i.e., backdoor attack) aims to induce misclassification of input samples containing adversary-specified trigger patterns.
We propose a simple yet effective method to filter poisoned samples by analyzing the feature differences between suspicious and benign samples at the critical layer.
arXiv Detail & Related papers (2023-02-24T17:16:37Z) - Invisible Backdoor Attacks Using Data Poisoning in the Frequency Domain [8.64369418938889]
We propose a generalized backdoor attack method based on the frequency domain.
It can implant a backdoor without mislabeling or access to the training process.
We evaluate our approach in the no-label and clean-label cases on three datasets.
arXiv Detail & Related papers (2022-07-09T07:05:53Z) - Backdoor Defense via Decoupling the Training Process [46.34744086706348]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
We propose a novel backdoor defense via decoupling the original end-to-end training process into three stages.
arXiv Detail & Related papers (2022-02-05T03:34:01Z) - Black-box Detection of Backdoor Attacks with Limited Information and
Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z) - Hidden Backdoor Attack against Semantic Segmentation Models [60.0327238844584]
The backdoor attack intends to embed hidden backdoors in deep neural networks (DNNs) by poisoning training data.
We propose a novel attack paradigm, the fine-grained attack, where we treat the target label at the object level instead of the image level.
Experiments show that the proposed methods can successfully attack semantic segmentation models by poisoning only a small proportion of training data.
arXiv Detail & Related papers (2021-03-06T05:50:29Z) - How Robust are Randomized Smoothing based Defenses to Data Poisoning? [66.80663779176979]
We present a previously unrecognized threat to robust machine learning models that highlights the importance of training-data quality.
We propose a novel bilevel optimization-based data poisoning attack that degrades the robustness guarantees of certifiably robust classifiers.
Our attack is effective even when the victim trains the models from scratch using state-of-the-art robust training methods.
arXiv Detail & Related papers (2020-12-02T15:30:21Z)