Mask and Restore: Blind Backdoor Defense at Test Time with Masked
Autoencoder
- URL: http://arxiv.org/abs/2303.15564v2
- Date: Mon, 2 Oct 2023 15:33:54 GMT
- Title: Mask and Restore: Blind Backdoor Defense at Test Time with Masked
Autoencoder
- Authors: Tao Sun, Lu Pang, Chao Chen, Haibin Ling
- Abstract summary: We propose a framework for blind backdoor defense with Masked AutoEncoder (BDMAE).
BDMAE detects possible triggers in the token space using image structural similarity and label consistency between the test image and MAE restorations.
Our approach is blind to the model architectures, trigger patterns and image benignity.
- Score: 57.739693628523
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks are vulnerable to backdoor attacks, where an adversary
maliciously manipulates the model behavior through overlaying images with
special triggers. Existing backdoor defense methods often require accessing a
few validation data and model parameters, which are impractical in many
real-world applications, e.g., when the model is provided as a cloud service.
In this paper, we address the practical task of blind backdoor defense at test
time, in particular for black-box models. The true label of every test image
needs to be recovered on the fly from a suspicious model regardless of image
benignity. We focus on test-time image purification methods that incapacitate
possible triggers while keeping semantic contents intact. Due to diverse
trigger patterns and sizes, the heuristic trigger search in image space can be
unscalable. We circumvent such a barrier by leveraging the strong reconstruction
power of generative models, and propose a framework of Blind Defense with
Masked AutoEncoder (BDMAE). It detects possible triggers in the token space
using image structural similarity and label consistency between the test image
and MAE restorations. The detection results are then refined by considering
trigger topology. Finally, we fuse MAE restorations adaptively into a purified
image for making prediction. Our approach is blind to the model architectures,
trigger patterns and image benignity. Extensive experiments under different
backdoor settings validate its effectiveness and generalizability. Code is
available at https://github.com/tsun/BDMAE.
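The abstract outlines a detect-then-restore pipeline: score tokens by comparing the test image against MAE restorations (structural similarity plus label consistency), refine the scores, and fuse restorations into a purified image. The snippet below is only a minimal sketch of that idea, not the released implementation: `mae_restore` and `classify` are hypothetical interfaces standing in for a pretrained Masked Autoencoder and the black-box classifier, and the percentile threshold replaces the paper's topology refinement and adaptive fusion.

```python
# Minimal sketch of token-level trigger scoring and restoration (assumptions noted above).
import numpy as np
from skimage.metrics import structural_similarity as ssim

def purify_and_predict(img, mae_restore, classify, patch=16, n_rounds=8, mask_ratio=0.5, seed=None):
    """img: HxWx3 uint8 array.
    mae_restore(img, token_mask) -> restored image  (hypothetical MAE interface)
    classify(img) -> label from the black-box model (hypothetical interface)
    """
    rng = np.random.default_rng(seed)
    H, W, _ = img.shape
    gh, gw = H // patch, W // patch
    y0 = classify(img)                              # prediction on the (possibly triggered) input
    score = np.zeros((gh, gw))                      # accumulated trigger evidence per token

    for _ in range(n_rounds):
        mask = rng.random((gh, gw)) < mask_ratio    # random token mask
        restored = mae_restore(img, mask)
        flipped = float(classify(restored) != y0)   # label-consistency signal for this round
        for i, j in zip(*np.nonzero(mask)):
            a = img[i*patch:(i+1)*patch, j*patch:(j+1)*patch]
            b = restored[i*patch:(i+1)*patch, j*patch:(j+1)*patch]
            sim = ssim(a, b, channel_axis=-1, data_range=255)
            # low structural similarity plus a flipped label suggest this token held the trigger
            score[i, j] += (1.0 - sim) + flipped

    # crude stand-in for topology refinement and adaptive fusion: restore top-scoring tokens only
    trigger_mask = score > np.percentile(score, 90)
    purified = mae_restore(img, trigger_mask)
    return classify(purified), purified
```

The full method, including the topology-aware refinement and adaptive fusion of multiple restorations, is in the released code at https://github.com/tsun/BDMAE.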
Related papers
- Diff-Cleanse: Identifying and Mitigating Backdoor Attacks in Diffusion Models [3.134071086568745]
Diffusion models (DMs) are regarded as one of the most advanced generative models today.
Recent studies suggest that DMs are vulnerable to backdoor attacks.
This vulnerability poses substantial risks, including reputational damage to model owners.
We introduce Diff-Cleanse, a novel two-stage backdoor defense framework specifically designed for DMs.
arXiv Detail & Related papers (2024-07-31T03:54:41Z)
- Evolutionary Trigger Detection and Lightweight Model Repair Based Backdoor Defense [10.310546695762467]
Deep Neural Networks (DNNs) have been widely used in many areas such as autonomous driving and face recognition.
A backdoor in a DNN model can be activated by a poisoned input carrying a trigger, leading to wrong predictions.
We propose an efficient backdoor defense based on evolutionary trigger detection and lightweight model repair.
arXiv Detail & Related papers (2024-07-07T14:50:59Z)
- Stealthy Targeted Backdoor Attacks against Image Captioning [16.409633596670368]
We present a novel method to craft targeted backdoor attacks against image caption models.
Our method first learns a special trigger by leveraging universal perturbation techniques for object detection.
Our approach can achieve a high attack success rate while having a negligible impact on model clean performance.
arXiv Detail & Related papers (2024-06-09T18:11:06Z)
- Backdoor Attack with Mode Mixture Latent Modification [26.720292228686446]
We propose a backdoor attack paradigm that only requires minimal alterations to a clean model in order to inject the backdoor under the guise of fine-tuning.
We evaluate the effectiveness of our method on four popular benchmark datasets.
arXiv Detail & Related papers (2024-03-12T09:59:34Z)
- Model Pairing Using Embedding Translation for Backdoor Attack Detection on Open-Set Classification Tasks [63.269788236474234]
We propose to use model pairs on open-set classification tasks for detecting backdoors.
We show that this score can be an indicator for the presence of a backdoor despite the models being of different architectures.
This technique allows for the detection of backdoors on models designed for open-set classification tasks, which is little studied in the literature.
arXiv Detail & Related papers (2024-02-28T21:29:16Z)
- Distilling Cognitive Backdoor Patterns within an Image [35.1754797302114]
This paper proposes a simple method to distill and detect backdoor patterns within an image: Cognitive Distillation (CD).
The extracted pattern can help understand the cognitive mechanism of a model on clean vs. backdoor images.
We conduct extensive experiments to show that CD can robustly detect a wide range of advanced backdoor attacks.
arXiv Detail & Related papers (2023-01-26T02:38:37Z)
- Untargeted Backdoor Attack against Object Detection [69.63097724439886]
We design a poison-only backdoor attack in an untargeted manner, based on task characteristics.
We show that, once the backdoor is embedded into the target model by our attack, it can trick the model to lose detection of any object stamped with our trigger patterns.
arXiv Detail & Related papers (2022-11-02T17:05:45Z)
- FIBA: Frequency-Injection based Backdoor Attack in Medical Image Analysis [82.2511780233828]
We propose a novel Frequency-Injection based Backdoor Attack method (FIBA) that is capable of delivering attacks in various medical image analysis tasks.
Specifically, FIBA leverages a trigger function in the frequency domain that can inject the low-frequency information of a trigger image into the poisoned image by linearly combining the spectral amplitude of both images; a simplified sketch of this amplitude blending is given after the list below.
arXiv Detail & Related papers (2021-12-02T11:52:17Z)
- Backdoor Attack on Hash-based Image Retrieval via Clean-label Data Poisoning [54.15013757920703]
We propose the confusing perturbations-induced backdoor attack (CIBA).
It injects a small number of poisoned images with the correct label into the training data.
We have conducted extensive experiments to verify the effectiveness of our proposed CIBA.
arXiv Detail & Related papers (2021-09-18T07:56:59Z)
- Rethinking the Trigger of Backdoor Attack [83.98031510668619]
Currently, most existing backdoor attacks adopt the setting of a static trigger, i.e., triggers across the training and testing images follow the same appearance and are located in the same area.
We demonstrate that such an attack paradigm is vulnerable when the trigger in testing images is not consistent with the one used for training.
arXiv Detail & Related papers (2020-04-09T17:19:37Z)
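As referenced in the FIBA entry above, the amplitude-blending idea it summarizes can be illustrated in a few lines of NumPy. This is a simplified sketch, not the paper's exact trigger function: `alpha` (blend strength) and `beta` (relative size of the low-frequency window) are illustrative parameters, and the images are assumed to be same-sized HxWx3 uint8 arrays.

```python
# Simplified frequency-amplitude blending sketch (illustrative parameters, not FIBA's exact recipe).
import numpy as np

def blend_low_freq_amplitude(clean, trigger, alpha=0.15, beta=0.1):
    """Inject the trigger image's low-frequency amplitude into the clean image, keeping its phase."""
    poisoned = np.zeros_like(clean, dtype=np.float64)
    H, W = clean.shape[:2]
    h, w = int(H * beta), int(W * beta)              # half-size of the central low-frequency window
    cy, cx = H // 2, W // 2
    for c in range(clean.shape[2]):                  # per channel
        Fc = np.fft.fftshift(np.fft.fft2(clean[..., c]))
        Ft = np.fft.fftshift(np.fft.fft2(trigger[..., c]))
        amp_c, phase_c = np.abs(Fc), np.angle(Fc)
        amp_t = np.abs(Ft)
        # linearly combine the amplitudes only inside the low-frequency window,
        # keeping the clean image's phase so the semantic content is preserved
        amp_mix = amp_c.copy()
        amp_mix[cy-h:cy+h, cx-w:cx+w] = ((1 - alpha) * amp_c + alpha * amp_t)[cy-h:cy+h, cx-w:cx+w]
        Fmix = amp_mix * np.exp(1j * phase_c)
        poisoned[..., c] = np.real(np.fft.ifft2(np.fft.ifftshift(Fmix)))
    return np.clip(poisoned, 0, 255).astype(clean.dtype)
```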