Single Image Backdoor Inversion via Robust Smoothed Classifiers
- URL: http://arxiv.org/abs/2303.00215v2
- Date: Sun, 17 Dec 2023 23:11:52 GMT
- Title: Single Image Backdoor Inversion via Robust Smoothed Classifiers
- Authors: Mingjie Sun, J. Zico Kolter
- Abstract summary: We present a new approach for backdoor inversion, which is able to recover the hidden backdoor with as few as a single image.
- Score: 76.66635991456336
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Backdoor inversion, a central step in many backdoor defenses, is a
reverse-engineering process to recover the hidden backdoor trigger inserted
into a machine learning model. Existing approaches tackle this problem by
searching for a backdoor pattern that is able to flip a set of clean images
into the target class, while how large this support set actually needs to be is
rarely investigated. In this work, we present a new approach for backdoor
inversion, which is able to recover the hidden backdoor with as few as a single
image. Inspired by recent advances in adversarial robustness, our method
SmoothInv starts from a single clean image, and then performs projected
gradient descent towards the target class on a robust smoothed version of the
original backdoored classifier. We find that backdoor patterns emerge naturally
from this optimization process. Compared to existing backdoor inversion
methods, SmoothInv introduces a minimal set of optimization variables and does not
require complex regularization schemes. We perform a comprehensive quantitative
and qualitative study on backdoored classifiers obtained from existing backdoor
attacks. We demonstrate that SmoothInv consistently recovers successful
backdoors from single images: for backdoored ImageNet classifiers, our
reconstructed backdoors have close to 100% attack success rates. We also show
that they maintain high fidelity to the underlying true backdoors. Last, we
propose and analyze two countermeasures to our approach and show that SmoothInv
remains robust in the face of an adaptive attacker. Our code is available at
https://github.com/locuslab/smoothinv.
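The procedure the abstract describes (smooth the backdoored classifier with Gaussian noise, then run targeted projected gradient descent from one clean image) can be sketched compactly. Below is a minimal, hypothetical PyTorch rendering of that idea, not the authors' implementation: plain Monte Carlo Gaussian smoothing stands in for the robust smoothed classifier constructed in the paper, and sigma, steps, step_size, and eps are illustrative placeholders (the linked repository holds the real code).

```python
import torch
import torch.nn.functional as F

def smoothed_logits(model, x, sigma=0.25, n_samples=32):
    # Monte Carlo estimate of a Gaussian-smoothed classifier:
    # average the base model's logits over noisy copies of x.
    noise = torch.randn(n_samples, *x.shape[1:], device=x.device) * sigma
    return model(x + noise).mean(dim=0, keepdim=True)

def invert_backdoor(model, clean_image, target_class,
                    steps=200, step_size=1e-2, eps=0.3):
    # Targeted PGD from a single clean image [1, C, H, W] toward
    # target_class; the resulting perturbation approximates the trigger.
    for p in model.parameters():          # freeze the classifier; only the
        p.requires_grad_(False)           # perturbation delta is optimized
    delta = torch.zeros_like(clean_image, requires_grad=True)
    target = torch.tensor([target_class], device=clean_image.device)
    for _ in range(steps):
        logits = smoothed_logits(model, clean_image + delta)
        loss = F.cross_entropy(logits, target)
        loss.backward()
        with torch.no_grad():
            delta -= step_size * delta.grad.sign()  # step toward the target class
            delta.clamp_(-eps, eps)                 # project onto the L-inf ball
            delta.grad.zero_()
    return delta.detach()  # candidate backdoor pattern
```

Applying the returned perturbation to held-out clean images and measuring how often the model predicts target_class corresponds to the attack-success-rate evaluation the abstract reports.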
Related papers
- Expose Before You Defend: Unifying and Enhancing Backdoor Defenses via Exposed Models [68.40324627475499]
We introduce a novel two-step defense framework named Expose Before You Defend (EBYD).
EBYD unifies existing backdoor defense methods into a comprehensive defense system with enhanced performance.
We conduct extensive experiments on 10 image attacks and 6 text attacks across 2 vision datasets and 4 language datasets.
arXiv Detail & Related papers (2024-10-25T09:36:04Z)
- Flatness-aware Sequential Learning Generates Resilient Backdoors [7.969181278996343]
Recently, backdoor attacks have become an emerging threat to the security of machine learning models.
This paper counters catastrophic forgetting (CF) of backdoors by leveraging continual learning (CL) techniques.
We propose a novel framework, named Sequential Backdoor Learning (SBL), that can generate resilient backdoors.
arXiv Detail & Related papers (2024-07-20T03:30:05Z)
- BAN: Detecting Backdoors Activated by Adversarial Neuron Noise [30.243702765232083]
Backdoor attacks on deep learning represent a recent threat that has gained significant attention in the research community.
Backdoor defenses are mainly based on backdoor inversion, which has been shown to be generic, model-agnostic, and applicable to practical threat scenarios.
This paper improves backdoor feature inversion for backdoor detection by incorporating extra neuron activation information.
arXiv Detail & Related papers (2024-05-30T10:44:45Z)
- Backdoor Attack with Mode Mixture Latent Modification [26.720292228686446]
We propose a backdoor attack paradigm that only requires minimal alterations to a clean model in order to inject the backdoor under the guise of fine-tuning.
We evaluate the effectiveness of our method on four popular benchmark datasets.
arXiv Detail & Related papers (2024-03-12T09:59:34Z)
- Physical Invisible Backdoor Based on Camera Imaging [32.30547033643063]
Current backdoor attacks require changing the pixels of clean images.
This paper proposes a novel physical invisible backdoor based on camera imaging that does not change natural image pixels.
arXiv Detail & Related papers (2023-09-14T04:58:06Z)
- BaDExpert: Extracting Backdoor Functionality for Accurate Backdoor Input Detection [42.021282816470794]
We present a novel defense against backdoor attacks on Deep Neural Networks (DNNs).
Our defense falls within the category of post-development defenses that operate independently of how the model was generated.
We show the feasibility of devising highly accurate backdoor input detectors that filter out the backdoor inputs during model inference.
arXiv Detail & Related papers (2023-08-23T21:47:06Z)
- Trap and Replace: Defending Backdoor Attacks by Trapping Them into an Easy-to-Replace Subnetwork [105.0735256031911]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
We propose a brand-new backdoor defense strategy, which makes it much easier to remove the harmful influence of backdoor samples.
We evaluate our method against ten different backdoor attacks.
arXiv Detail & Related papers (2022-10-12T17:24:01Z)
- Check Your Other Door! Establishing Backdoor Attacks in the Frequency Domain [80.24811082454367]
We show the advantages of utilizing the frequency domain for establishing undetectable and powerful backdoor attacks.
We also show two possible defences that succeed against frequency-based backdoor attacks and possible ways for the attacker to bypass them.
arXiv Detail & Related papers (2021-09-12T12:44:52Z)
- Turn the Combination Lock: Learnable Textual Backdoor Attacks via Word Substitution [57.51117978504175]
Recent studies show that neural natural language processing (NLP) models are vulnerable to backdoor attacks.
Injected with backdoors, models perform normally on benign examples but produce attacker-specified predictions when the backdoor is activated.
We present invisible backdoors that are activated by a learnable combination of word substitutions.
arXiv Detail & Related papers (2021-06-11T13:03:17Z)
- Backdoor Learning: A Survey [75.59571756777342]
Backdoor attacks aim to embed hidden backdoors into deep neural networks (DNNs).
Backdoor learning is an emerging and rapidly growing research area.
This paper presents the first comprehensive survey of this realm.
arXiv Detail & Related papers (2020-07-17T04:09:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.