Training set cleansing of backdoor poisoning by self-supervised
representation learning
- URL: http://arxiv.org/abs/2210.10272v1
- Date: Wed, 19 Oct 2022 03:29:58 GMT
- Title: Training set cleansing of backdoor poisoning by self-supervised
representation learning
- Authors: H. Wang, S. Karami, O. Dia, H. Ritter, E. Emamjomeh-Zadeh, J. Chen, Z.
Xiang, D.J. Miller, G. Kesidis
- Abstract summary: A backdoor or Trojan attack is an important type of data poisoning attack against deep neural network (DNN) classifiers.
We show that supervised training may build a stronger association between the backdoor pattern and the associated target class than between normal features and the true class of origin.
We propose to use unsupervised representation learning to avoid emphasising backdoor-poisoned training samples and to learn a similar feature embedding for samples of the same class.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A backdoor or Trojan attack is an important type of data poisoning attack
against deep neural network (DNN) classifiers, wherein the training dataset is
poisoned with a small number of samples that each possess the backdoor pattern
(usually a pattern that is either imperceptible or innocuous) and which are
mislabeled to the attacker's target class. When trained on a backdoor-poisoned
dataset, a DNN behaves normally on most benign test samples but makes incorrect
predictions to the target class when the test sample has the backdoor pattern
incorporated (i.e., contains a backdoor trigger). Here we focus on image
classification tasks and show that supervised training may build a stronger
association between the backdoor pattern and the associated target class than
that between normal features and the true class of origin. By contrast,
self-supervised representation learning ignores the labels of samples and
learns a feature embedding based on images' semantic content. We thus propose
to use unsupervised representation learning to avoid emphasising
backdoor-poisoned training samples and to learn a similar feature embedding for
samples of the same class. Using a feature embedding found by self-supervised
representation learning, a data cleansing method, which combines sample
filtering and re-labeling, is developed. Experiments on the CIFAR-10 benchmark
dataset show that our method achieves state-of-the-art performance in
mitigating backdoor attacks.
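The cleansing step described above can be made concrete with a small sketch. The following is a minimal illustration, not the authors' implementation: it assumes the embeddings come from some label-free encoder (for example a SimCLR-style network trained on the same images), keeps a sample when its given label matches the majority label of its nearest neighbours in the embedding space, re-labels it when the neighbourhood strongly disagrees, and filters it out otherwise. The function name cleanse and the thresholds k and relabel_ratio are illustrative choices.

import numpy as np

def cleanse(embeddings, labels, k=20, relabel_ratio=0.8):
    """Keep, re-label, or drop each sample by comparing its label with the
    majority label of its k nearest neighbours in the feature space."""
    X = np.asarray(embeddings, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)   # cosine geometry
    y = np.asarray(labels)
    sims = X @ X.T                                     # pairwise cosine similarity (fine for small n)
    np.fill_diagonal(sims, -np.inf)                    # a sample does not vote for itself
    kept, cleaned = [], []
    for i in range(len(X)):
        nn = np.argpartition(-sims[i], k)[:k]          # k most similar samples
        votes = np.bincount(y[nn], minlength=y.max() + 1)
        major = votes.argmax()
        if major == y[i]:
            kept.append(i); cleaned.append(y[i])       # label agrees with its neighbourhood: keep
        elif votes[major] / k >= relabel_ratio:
            kept.append(i); cleaned.append(major)      # confident disagreement: re-label
        # otherwise the sample is filtered out as suspect
    return np.array(kept), np.array(cleaned)

# Toy usage with random features standing in for real self-supervised embeddings.
rng = np.random.default_rng(0)
idx, new_labels = cleanse(rng.normal(size=(200, 64)), rng.integers(0, 10, 200))
print(f"kept {len(idx)} of 200 samples")

At CIFAR-10 scale the dense O(n^2) similarity matrix used here would be replaced by an approximate nearest-neighbour index; the structure of the decision (keep, re-label, or filter) is the point of the sketch.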
Related papers
- Backdoor Defense through Self-Supervised and Generative Learning [0.0]
Training on backdoor-poisoned data injects a backdoor which causes malicious inference on selected test samples.
This paper explores an approach based on generative modelling of per-class distributions in a self-supervised representation space.
In both cases, we find that per-class generative models make it possible to detect poisoned data and cleanse the dataset (a simplified sketch of this idea appears after this list).
arXiv Detail & Related papers (2024-09-02T11:40:01Z)
- XGBD: Explanation-Guided Graph Backdoor Detection [21.918945251903523]
Backdoor attacks pose a significant security risk to graph learning models.
We propose an explanation-guided backdoor detection method to take advantage of the topological information.
arXiv Detail & Related papers (2023-08-08T17:10:23Z)
- Shared Adversarial Unlearning: Backdoor Mitigation by Unlearning Shared Adversarial Examples [67.66153875643964]
Backdoor attacks are serious security threats to machine learning models.
In this paper, we explore the task of purifying a backdoored model using a small clean dataset.
By establishing the connection between backdoor risk and adversarial risk, we derive a novel upper bound for backdoor risk.
arXiv Detail & Related papers (2023-07-20T03:56:04Z)
- Backdoor Defense via Deconfounded Representation Learning [17.28760299048368]
We propose a Causality-inspired Backdoor Defense (CBD) to learn deconfounded representations for reliable classification.
CBD is effective in reducing backdoor threats while maintaining high accuracy in predicting benign samples.
arXiv Detail & Related papers (2023-03-13T02:25:59Z)
- CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning [63.72975421109622]
CleanCLIP is a finetuning framework that weakens the learned spurious associations introduced by backdoor attacks.
CleanCLIP maintains model performance on benign examples while erasing a range of backdoor attacks on multimodal contrastive learning.
arXiv Detail & Related papers (2023-03-06T17:48:32Z)
- Untargeted Backdoor Attack against Object Detection [69.63097724439886]
We design a poison-only backdoor attack in an untargeted manner, based on task characteristics.
We show that, once the backdoor is embedded into the target model by our attack, it can trick the model into failing to detect any object stamped with our trigger patterns.
arXiv Detail & Related papers (2022-11-02T17:05:45Z)
- Invisible Backdoor Attacks Using Data Poisoning in the Frequency Domain [8.64369418938889]
We propose a generalized backdoor attack method based on the frequency domain.
It can implant a backdoor without mislabeling the data or accessing the training process.
We evaluate our approach in the no-label and clean-label cases on three datasets.
arXiv Detail & Related papers (2022-07-09T07:05:53Z)
- Backdoor Defense via Decoupling the Training Process [46.34744086706348]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
We propose a novel backdoor defense via decoupling the original end-to-end training process into three stages.
arXiv Detail & Related papers (2022-02-05T03:34:01Z)
- Hidden Backdoor Attack against Semantic Segmentation Models [60.0327238844584]
The backdoor attack intends to embed hidden backdoors in deep neural networks (DNNs) by poisoning training data.
We propose a novel attack paradigm, the fine-grained attack, where we treat the target label at the object level instead of the image level.
Experiments show that the proposed methods can successfully attack semantic segmentation models by poisoning only a small proportion of training data.
arXiv Detail & Related papers (2021-03-06T05:50:29Z)
- Scalable Backdoor Detection in Neural Networks [61.39635364047679]
Deep learning models are vulnerable to Trojan attacks, where an attacker can install a backdoor during training time to make the resultant model misidentify samples contaminated with a small trigger patch.
We propose a novel trigger reverse-engineering based approach whose computational complexity does not scale with the number of labels, and is based on a measure that is both interpretable and universal across different network and patch types.
In experiments, we observe that our method achieves a perfect score in separating Trojaned models from pure models, which is an improvement over the current state-of-the-art method.
arXiv Detail & Related papers (2020-06-10T04:12:53Z)
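As promised in the "Backdoor Defense through Self-Supervised and Generative Learning" entry above, here is a minimal sketch of its per-class generative-modelling idea. It is an assumed simplification rather than that paper's actual model: each class is fitted with a single Gaussian over embeddings from a label-free encoder, and samples in the unlikely tail of their own class's Gaussian are flagged as possibly poisoned. The function name flag_suspects, the Gaussian choice, and the percentile threshold are illustrative.

import numpy as np

def flag_suspects(embeddings, labels, percentile=5.0, eps=1e-3):
    """Flag samples that look unlikely under a Gaussian fitted to the
    embeddings that share their (possibly mislabeled) class."""
    X, y = np.asarray(embeddings, dtype=float), np.asarray(labels)
    suspect = np.zeros(len(X), dtype=bool)
    for c in np.unique(y):
        members = np.flatnonzero(y == c)
        Xc = X[members]
        mu = Xc.mean(axis=0)
        cov = np.cov(Xc, rowvar=False) + eps * np.eye(X.shape[1])  # regularised covariance
        inv = np.linalg.inv(cov)
        d2 = np.einsum('ij,jk,ik->i', Xc - mu, inv, Xc - mu)       # squared Mahalanobis distance
        cutoff = np.percentile(d2, 100.0 - percentile)             # tail of the class's own fit
        suspect[members[d2 > cutoff]] = True
    return suspect

# Toy usage: flags roughly the 5% least typical samples of each class for inspection.
rng = np.random.default_rng(0)
mask = flag_suspects(rng.normal(size=(300, 16)), rng.integers(0, 3, 300))
print(f"{mask.sum()} samples flagged for inspection")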