Black-box Detection of Backdoor Attacks with Limited Information and Data
- URL: http://arxiv.org/abs/2103.13127v1
- Date: Wed, 24 Mar 2021 12:06:40 GMT
- Title: Black-box Detection of Backdoor Attacks with Limited Information and Data
- Authors: Yinpeng Dong, Xiao Yang, Zhijie Deng, Tianyu Pang, Zihao Xiao, Hang Su, Jun Zhu
- Abstract summary: We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
- Score: 56.0735480850555
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Although deep neural networks (DNNs) have made rapid progress in recent
years, they are vulnerable in adversarial environments. A malicious backdoor
can be embedded in a model by poisoning the training dataset, with the intention
of making the infected model give wrong predictions during inference whenever a
specific trigger appears. To mitigate the potential threats of backdoor
attacks, various backdoor detection and defense methods have been proposed.
However, the existing techniques usually require the poisoned training data or
access to the white-box model, which is commonly unavailable in practice. In
this paper, we propose a black-box backdoor detection (B3D) method to identify
backdoor attacks with only query access to the model. We introduce a
gradient-free optimization algorithm to reverse-engineer the potential trigger
for each class, which helps to reveal the existence of backdoor attacks. In
addition to backdoor detection, we also propose a simple strategy for reliable
predictions using the identified backdoored models. Extensive experiments on
hundreds of DNN models trained on several datasets corroborate the
effectiveness of our method under the black-box setting against various
backdoor attacks.
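To make the pipeline concrete, here is a minimal sketch of one way such query-only, gradient-free trigger reverse-engineering can be set up. Everything below is an illustrative assumption rather than the authors' implementation: the `query` oracle, the sigmoid mask/pattern parameterization, the NES-style gradient estimator, and all hyper-parameters stand in for whatever B3D actually uses.

```python
# Illustrative sketch of query-only trigger reverse-engineering in the spirit
# of B3D. All names and hyper-parameters are assumptions for exposition; the
# paper's own gradient-free optimizer and parameterization may differ.

import numpy as np

def query(x):
    """Black-box oracle: softmax probabilities of shape (B, num_classes).
    Stub; in practice this would query the deployed model (e.g., via an API)."""
    raise NotImplementedError

def unpack(theta, shape):
    """Map unconstrained parameters to a [0, 1] trigger mask and pattern."""
    half = theta.size // 2
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    return sigmoid(theta[:half]).reshape(shape), sigmoid(theta[half:]).reshape(shape)

def stamp(x, mask, pattern):
    """Blend the candidate trigger pattern into x wherever the mask is active."""
    return (1.0 - mask) * x + mask * pattern

def trigger_loss(x_batch, theta, target, shape, lam=1e-3):
    """Cross-entropy toward the target class plus an L1 penalty on the mask,
    so that only a small trigger can drive the loss down."""
    mask, pattern = unpack(theta, shape)
    probs = query(stamp(x_batch, mask, pattern))
    ce = -np.log(probs[:, target] + 1e-12).mean()
    return ce + lam * np.abs(mask).sum()

def reverse_engineer(x_batch, target, shape, iters=500, pop=20, sigma=0.1, lr=0.05):
    """NES-style, gradient-free search for a trigger that flips x_batch to
    `target`; returns the recovered mask, pattern, and final loss."""
    rng = np.random.default_rng(0)
    theta = np.zeros(2 * int(np.prod(shape)))
    for _ in range(iters):
        eps = rng.standard_normal((pop, theta.size))
        grad = np.zeros_like(theta)
        for e in np.vstack([eps, -eps]):        # antithetic samples reduce variance
            grad += trigger_loss(x_batch, theta + sigma * e, target, shape) * e
        theta -= lr * grad / (2 * pop * sigma)  # step along the estimated gradient
    mask, pattern = unpack(theta, shape)
    return mask, pattern, trigger_loss(x_batch, theta, target, shape)
```

Running this search once per class and flagging any class whose recovered trigger is an anomalously small outlier (e.g., by median absolute deviation over the mask L1 norms) is one standard way to turn the recovered triggers into a detection decision; likewise, one plausible instantiation of the paper's reliable-prediction strategy is to occlude the recovered trigger region in incoming inputs before classifying them.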
Related papers
- Mitigating Backdoor Attack by Injecting Proactive Defensive Backdoor [63.84477483795964]
Data-poisoning backdoor attacks are serious security threats to machine learning models.
In this paper, we focus on in-training backdoor defense, aiming to train a clean model even when the dataset may be potentially poisoned.
We propose a novel defense approach called PDB (Proactive Defensive Backdoor).
arXiv Detail & Related papers (2024-05-25T07:52:26Z)
- Backdoor Defense via Deconfounded Representation Learning [17.28760299048368]
We propose a Causality-inspired Backdoor Defense (CBD) to learn deconfounded representations for reliable classification.
CBD is effective in reducing backdoor threats while maintaining high accuracy in predicting benign samples.
arXiv Detail & Related papers (2023-03-13T02:25:59Z)
- Untargeted Backdoor Attack against Object Detection [69.63097724439886]
We design a poison-only backdoor attack in an untargeted manner, based on task characteristics.
We show that, once the backdoor is embedded into the target model by our attack, it can trick the model into failing to detect any object stamped with our trigger patterns.
arXiv Detail & Related papers (2022-11-02T17:05:45Z)
- BATT: Backdoor Attack with Transformation-based Triggers [72.61840273364311]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
Backdoor adversaries inject hidden backdoors that can be activated by adversary-specified trigger patterns.
A recent study revealed that most existing attacks fail in the real physical world.
arXiv Detail & Related papers (2022-11-02T16:03:43Z)
- Invisible Backdoor Attacks Using Data Poisoning in the Frequency Domain [8.64369418938889]
We propose a generalized backdoor attack method based on the frequency domain.
It can implant a backdoor without mislabeling samples or accessing the training process (see the sketch after this entry).
We evaluate our approach in the no-label and clean-label cases on three datasets.
arXiv Detail & Related papers (2022-07-09T07:05:53Z)
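As a toy illustration of the frequency-domain idea in the entry above, the sketch below stamps a trigger by perturbing a few DCT coefficients of an image; the coefficient locations, amplitude, and clean-label usage note are assumptions for illustration, not details from the paper.

```python
# Toy illustration of a frequency-domain trigger; coefficient locations and
# amplitude are invented for this sketch and are not the paper's settings.

import numpy as np
from scipy.fft import dctn, idctn

def add_frequency_trigger(img, coords=((24, 24), (28, 28)), amp=30.0):
    """Perturb a few mid/high-frequency DCT coefficients of a grayscale image
    (H, W) in [0, 255]; the change is nearly invisible in pixel space."""
    coeffs = dctn(img.astype(np.float64), norm="ortho")
    for (u, v) in coords:
        coeffs[u, v] += amp                      # implant the trigger signal
    return np.clip(idctn(coeffs, norm="ortho"), 0.0, 255.0)

# Clean-label use: apply this only to images that already carry the target
# label, so no labels are changed and no access to training is needed.
```

Because the perturbation lives in mid/high frequencies and is spread across the whole image, the poisoned sample stays close to the original in pixel space, which is what makes such triggers hard to spot by eye.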
- On the Effectiveness of Adversarial Training against Backdoor Attacks [111.8963365326168]
A backdoored model always predicts a target class in the presence of a predefined trigger pattern.
In general, adversarial training is believed to defend against backdoor attacks.
We propose a hybrid strategy which provides satisfactory robustness across different backdoor attacks.
arXiv Detail & Related papers (2022-02-22T02:24:46Z)
- AEVA: Black-box Backdoor Detection Using Adversarial Extreme Value Analysis [23.184335982913325]
We address the black-box hard-label backdoor detection problem.
We show that the objective of backdoor detection is bounded by an adversarial objective.
We propose the adversarial extreme value analysis to detect backdoors in black-box neural networks.
arXiv Detail & Related papers (2021-10-28T04:36:48Z)
- Light Can Hack Your Face! Black-box Backdoor Attack on Face Recognition Systems [0.0]
We propose a novel black-box backdoor attack technique on face recognition systems.
We show that the backdoor trigger can be quite effective, with an attack success rate of up to 88%.
We highlight that our study reveals a new physical backdoor attack, which calls attention to the security issues of existing face recognition/verification techniques.
arXiv Detail & Related papers (2020-09-15T11:50:29Z)
- Backdoor Learning: A Survey [75.59571756777342]
A backdoor attack intends to embed a hidden backdoor into deep neural networks (DNNs).
Backdoor learning is an emerging and rapidly growing research area.
This paper presents the first comprehensive survey of this realm.
arXiv Detail & Related papers (2020-07-17T04:09:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.