Mask and Restore: Blind Backdoor Defense at Test Time with Masked
Autoencoder
- URL: http://arxiv.org/abs/2303.15564v2
- Date: Mon, 2 Oct 2023 15:33:54 GMT
- Title: Mask and Restore: Blind Backdoor Defense at Test Time with Masked
Autoencoder
- Authors: Tao Sun, Lu Pang, Chao Chen, Haibin Ling
- Abstract summary: We propose a framework for blind backdoor defense with Masked AutoEncoder (BDMAE).
BDMAE detects possible triggers in the token space using image structural similarity and label consistency between the test image and MAE restorations.
Our approach is blind to the model architectures, trigger patterns and image benignity.
- Score: 57.739693628523
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks are vulnerable to backdoor attacks, where an adversary
maliciously manipulates the model behavior through overlaying images with
special triggers. Existing backdoor defense methods often require accessing a
few validation data and model parameters, which are impractical in many
real-world applications, e.g., when the model is provided as a cloud service.
In this paper, we address the practical task of blind backdoor defense at test
time, in particular for black-box models. The true label of every test image
needs to be recovered on the fly from a suspicious model regardless of image
benignity. We focus on test-time image purification methods that incapacitate
possible triggers while keeping semantic contents intact. Due to diverse
trigger patterns and sizes, the heuristic trigger search in image space can be
unscalable. We circumvent such a barrier by leveraging the strong reconstruction
power of generative models, and propose a framework of Blind Defense with
Masked AutoEncoder (BDMAE). It detects possible triggers in the token space
using image structural similarity and label consistency between the test image
and MAE restorations. The detection results are then refined by considering
trigger topology. Finally, we fuse MAE restorations adaptively into a purified
image for making prediction. Our approach is blind to the model architectures,
trigger patterns and image benignity. Extensive experiments under different
backdoor settings validate its effectiveness and generalizability. Code is
available at https://github.com/tsun/BDMAE.
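The abstract outlines a detect-then-restore pipeline: score tokens by comparing the test image against MAE restorations (structural similarity plus label consistency), refine the scores, and fuse restorations into a purified image. The snippet below is only a minimal sketch of that idea, not the released implementation: `mae_restore` and `classify` are hypothetical interfaces standing in for a pretrained Masked Autoencoder and the black-box classifier, and the percentile threshold replaces the paper's topology refinement and adaptive fusion.

```python
# Minimal sketch of token-level trigger scoring and restoration (assumptions noted above).
import numpy as np
from skimage.metrics import structural_similarity as ssim

def purify_and_predict(img, mae_restore, classify, patch=16, n_rounds=8, mask_ratio=0.5, seed=None):
    """img: HxWx3 uint8 array.
    mae_restore(img, token_mask) -> restored image  (hypothetical MAE interface)
    classify(img) -> label from the black-box model (hypothetical interface)
    """
    rng = np.random.default_rng(seed)
    H, W, _ = img.shape
    gh, gw = H // patch, W // patch
    y0 = classify(img)                              # prediction on the (possibly triggered) input
    score = np.zeros((gh, gw))                      # accumulated trigger evidence per token

    for _ in range(n_rounds):
        mask = rng.random((gh, gw)) < mask_ratio    # random token mask
        restored = mae_restore(img, mask)
        flipped = float(classify(restored) != y0)   # label-consistency signal for this round
        for i, j in zip(*np.nonzero(mask)):
            a = img[i*patch:(i+1)*patch, j*patch:(j+1)*patch]
            b = restored[i*patch:(i+1)*patch, j*patch:(j+1)*patch]
            sim = ssim(a, b, channel_axis=-1, data_range=255)
            # low structural similarity plus a flipped label suggest this token held the trigger
            score[i, j] += (1.0 - sim) + flipped

    # crude stand-in for topology refinement and adaptive fusion: restore top-scoring tokens only
    trigger_mask = score > np.percentile(score, 90)
    purified = mae_restore(img, trigger_mask)
    return classify(purified), purified
```

The full method, including the topology-aware refinement and adaptive fusion of multiple restorations, is in the released code at https://github.com/tsun/BDMAE.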
Related papers
- Diff-Cleanse: Identifying and Mitigating Backdoor Attacks in Diffusion Models [3.134071086568745]
Diffusion models (DMs) are regarded as one of the most advanced generative models today.
Recent studies suggest that DMs are vulnerable to backdoor attacks.
This vulnerability poses substantial risks, including reputational damage to model owners.
We introduce Diff-Cleanse, a novel two-stage backdoor defense framework specifically designed for DMs.
arXiv Detail & Related papers (2024-07-31T03:54:41Z)
- Evolutionary Trigger Detection and Lightweight Model Repair Based Backdoor Defense [10.310546695762467]
Deep Neural Networks (DNNs) have been widely used in many areas such as autonomous driving and face recognition.
A backdoor in a DNN model can be activated by a poisoned input carrying a trigger, leading to wrong predictions.
We propose an efficient backdoor defense based on evolutionary trigger detection and lightweight model repair.
arXiv Detail & Related papers (2024-07-07T14:50:59Z)
- Stealthy Targeted Backdoor Attacks against Image Captioning [16.409633596670368]
We present a novel method to craft targeted backdoor attacks against image caption models.
Our method first learns a special trigger by leveraging universal perturbation techniques for object detection.
Our approach can achieve a high attack success rate while having a negligible impact on model clean performance.
arXiv Detail & Related papers (2024-06-09T18:11:06Z)
- Backdoor Attack with Mode Mixture Latent Modification [26.720292228686446]
We propose a backdoor attack paradigm that only requires minimal alterations to a clean model in order to inject the backdoor under the guise of fine-tuning.
We evaluate the effectiveness of our method on four popular benchmark datasets.
arXiv Detail & Related papers (2024-03-12T09:59:34Z)
- Model Pairing Using Embedding Translation for Backdoor Attack Detection on Open-Set Classification Tasks [63.269788236474234]
We propose to use model pairs on open-set classification tasks for detecting backdoors.
We show that this score can be an indicator for the presence of a backdoor despite the models being of different architectures.
This technique allows for the detection of backdoors on models designed for open-set classification tasks, which is little studied in the literature.
arXiv Detail & Related papers (2024-02-28T21:29:16Z)
- Distilling Cognitive Backdoor Patterns within an Image [35.1754797302114]
This paper proposes a simple method to distill and detect backdoor patterns within an image: Cognitive Distillation (CD).
The extracted pattern can help understand the cognitive mechanism of a model on clean vs. backdoor images.
We conduct extensive experiments to show that CD can robustly detect a wide range of advanced backdoor attacks.
arXiv Detail & Related papers (2023-01-26T02:38:37Z)
- Untargeted Backdoor Attack against Object Detection [69.63097724439886]
We design a poison-only backdoor attack in an untargeted manner, based on task characteristics.
We show that, once the backdoor is embedded into the target model by our attack, it can trick the model to lose detection of any object stamped with our trigger patterns.
arXiv Detail & Related papers (2022-11-02T17:05:45Z)
- FIBA: Frequency-Injection based Backdoor Attack in Medical Image Analysis [82.2511780233828]
We propose a novel Frequency-Injection based Backdoor Attack method (FIBA) that is capable of delivering attacks in various medical image analysis tasks.
Specifically, FIBA leverages a trigger function in the frequency domain that can inject the low-frequency information of a trigger image into the poisoned image by linearly combining the spectral amplitude of both images; a simplified sketch of this amplitude blending is given after the list below.
arXiv Detail & Related papers (2021-12-02T11:52:17Z)
- Backdoor Attack on Hash-based Image Retrieval via Clean-label Data Poisoning [54.15013757920703]
We propose the confusing perturbations-induced backdoor attack (CIBA).
It injects a small number of poisoned images with the correct label into the training data.
We have conducted extensive experiments to verify the effectiveness of our proposed CIBA.
arXiv Detail & Related papers (2021-09-18T07:56:59Z)
- Rethinking the Trigger of Backdoor Attack [83.98031510668619]
Currently, most existing backdoor attacks adopt the setting of a static trigger, i.e., triggers across the training and testing images follow the same appearance and are located in the same area.
We demonstrate that such an attack paradigm is vulnerable when the trigger in testing images is not consistent with the one used for training.
arXiv Detail & Related papers (2020-04-09T17:19:37Z)
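As referenced in the FIBA entry above, the amplitude-blending idea it summarizes can be illustrated in a few lines of NumPy. This is a simplified sketch, not the paper's exact trigger function: `alpha` (blend strength) and `beta` (relative size of the low-frequency window) are illustrative parameters, and the images are assumed to be same-sized HxWx3 uint8 arrays.

```python
# Simplified frequency-amplitude blending sketch (illustrative parameters, not FIBA's exact recipe).
import numpy as np

def blend_low_freq_amplitude(clean, trigger, alpha=0.15, beta=0.1):
    """Inject the trigger image's low-frequency amplitude into the clean image, keeping its phase."""
    poisoned = np.zeros_like(clean, dtype=np.float64)
    H, W = clean.shape[:2]
    h, w = int(H * beta), int(W * beta)              # half-size of the central low-frequency window
    cy, cx = H // 2, W // 2
    for c in range(clean.shape[2]):                  # per channel
        Fc = np.fft.fftshift(np.fft.fft2(clean[..., c]))
        Ft = np.fft.fftshift(np.fft.fft2(trigger[..., c]))
        amp_c, phase_c = np.abs(Fc), np.angle(Fc)
        amp_t = np.abs(Ft)
        # linearly combine the amplitudes only inside the low-frequency window,
        # keeping the clean image's phase so the semantic content is preserved
        amp_mix = amp_c.copy()
        amp_mix[cy-h:cy+h, cx-w:cx+w] = ((1 - alpha) * amp_c + alpha * amp_t)[cy-h:cy+h, cx-w:cx+w]
        Fmix = amp_mix * np.exp(1j * phase_c)
        poisoned[..., c] = np.real(np.fft.ifft2(np.fft.ifftshift(Fmix)))
    return np.clip(poisoned, 0, 255).astype(clean.dtype)
```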