UNICORN: A Unified Backdoor Trigger Inversion Framework
- URL: http://arxiv.org/abs/2304.02786v1
- Date: Wed, 5 Apr 2023 23:14:08 GMT
- Title: UNICORN: A Unified Backdoor Trigger Inversion Framework
- Authors: Zhenting Wang, Kai Mei, Juan Zhai, Shiqing Ma
- Abstract summary: Trigger inversion is an effective way of identifying backdoor models and understanding embedded adversarial behaviors.
This work formally defines and analyzes the triggers injected in different spaces and the inversion problem.
Then, it proposes a unified framework to invert backdoor triggers based on the formalization of triggers and the identified inner behaviors of backdoor models.
- Score: 13.841110859970827
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The backdoor attack, where the adversary uses inputs stamped with triggers
(e.g., a patch) to activate pre-planted malicious behaviors, is a severe threat
to Deep Neural Network (DNN) models. Trigger inversion is an effective way of
identifying backdoor models and understanding embedded adversarial behaviors. A
challenge of trigger inversion is that there are many ways of constructing the
trigger. Existing methods cannot generalize to various types of triggers by
making certain assumptions or attack-specific constraints. The fundamental
reason is that existing work does not consider the trigger's design space in
their formulation of the inversion problem. This work formally defines and
analyzes the triggers injected in different spaces and the inversion problem.
Then, it proposes a unified framework to invert backdoor triggers based on the
formalization of triggers and the identified inner behaviors of backdoor models
from our analysis. Our prototype UNICORN is general and effective in inverting
backdoor triggers in DNNs. The code can be found at
https://github.com/RU-System-Software-and-Security/UNICORN.
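To make the inversion problem concrete, below is a minimal sketch of classic input-space trigger inversion in the spirit of Neural Cleanse, the style of formulation that UNICORN generalizes to triggers injected in other spaces. This is not the paper's algorithm: the function name, hyperparameters, and the mask-plus-pattern stamping x' = (1 - m) * x + m * p are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def invert_input_space_trigger(model, clean_loader, target_label,
                               epochs=10, lam=1e-3, lr=0.1, device="cpu"):
    """Optimize a mask m and pattern p so that stamped inputs
    x' = (1 - m) * x + m * p are classified as target_label.
    An L1 penalty on the mask favors small, patch-like triggers."""
    x0, _ = next(iter(clean_loader))          # assumes loader yields (x, y)
    _, c, h, w = x0.shape
    # Unconstrained parameters; sigmoid keeps mask and pattern in [0, 1].
    mask_raw = torch.zeros(1, 1, h, w, device=device, requires_grad=True)
    pattern_raw = torch.zeros(1, c, h, w, device=device, requires_grad=True)
    opt = torch.optim.Adam([mask_raw, pattern_raw], lr=lr)

    model.eval()
    for prm in model.parameters():            # only the trigger is optimized
        prm.requires_grad_(False)

    for _ in range(epochs):
        for x, _ in clean_loader:
            x = x.to(device)
            m = torch.sigmoid(mask_raw)
            p = torch.sigmoid(pattern_raw)
            x_stamped = (1 - m) * x + m * p
            y_target = torch.full((x.size(0),), target_label,
                                  dtype=torch.long, device=device)
            loss = F.cross_entropy(model(x_stamped), y_target) + lam * m.abs().sum()
            opt.zero_grad()
            loss.backward()
            opt.step()
    return torch.sigmoid(mask_raw).detach(), torch.sigmoid(pattern_raw).detach()
```

An inverted mask that is small yet reliably flips clean inputs to the target label is evidence of a backdoor. The paper's point is that this formulation bakes in an input-space, patch-style design space, whereas its unified framework also covers triggers injected in other spaces (e.g., feature or frequency space).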
Related papers
- PureDiffusion: Using Backdoor to Counter Backdoor in Generative Diffusion Models [5.957580737396457]
Diffusion models (DMs) are advanced deep learning models that have achieved state-of-the-art performance on a wide range of generative tasks.
Recent studies have shown their vulnerability to backdoor attacks, in which backdoored DMs consistently generate a designated result called the backdoor target.
We introduce PureDiffusion, a novel backdoor defense framework that can efficiently detect backdoor attacks by inverting backdoor triggers embedded in DMs.
arXiv Detail & Related papers (2024-09-20T23:19:26Z)
- Eliminating Backdoors in Neural Code Models via Trigger Inversion [24.053091055319562]
Backdoor attacks on neural code models pose a significant security threat.
In this paper, we propose a backdoor defense technique based on trigger inversion, called EliBadCode.
We show that EliBadCode can effectively eliminate backdoors while having minimal adverse effects on the normal functionality of the model.
arXiv Detail & Related papers (2024-08-08T08:23:03Z)
- Evolutionary Trigger Detection and Lightweight Model Repair Based Backdoor Defense [10.310546695762467]
Deep Neural Networks (DNNs) have been widely used in many areas such as autonomous driving and face recognition.
A backdoor in a DNN model can be activated by a poisoned input carrying a trigger, leading to incorrect predictions.
We propose an efficient backdoor defense based on evolutionary trigger detection and lightweight model repair.
arXiv Detail & Related papers (2024-07-07T14:50:59Z)
- LOTUS: Evasive and Resilient Backdoor Attacks through Sub-Partitioning [49.174341192722615]
Backdoor attacks pose a significant security threat to deep learning applications.
Recent papers have introduced attacks using sample-specific invisible triggers crafted through special transformation functions.
We introduce LOTUS, a novel backdoor attack designed to be both evasive and resilient.
arXiv Detail & Related papers (2024-03-25T21:01:29Z)
- From Shortcuts to Triggers: Backdoor Defense with Denoised PoE [51.287157951953226]
Language models are often at risk of diverse backdoor attacks, especially data poisoning.
Existing backdoor defense methods mainly focus on backdoor attacks with explicit triggers.
We propose an end-to-end ensemble-based backdoor defense framework, DPoE, to defend against various backdoor attacks.
arXiv Detail & Related papers (2023-05-24T08:59:25Z)
- Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
Backdoor attacks are an emerging yet serious training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA).
arXiv Detail & Related papers (2023-05-11T10:05:57Z)
- Backdoor Attacks with Input-unique Triggers in NLP [34.98477726215485]
Backdoor attacks aim to induce neural models to make incorrect predictions on poisoned data while keeping predictions on clean data unchanged.
In this paper, we propose an input-unique backdoor attack (NURA), where we generate backdoor triggers unique to each input.
arXiv Detail & Related papers (2023-03-25T01:41:54Z)
- BATT: Backdoor Attack with Transformation-based Triggers [72.61840273364311]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
Backdoor adversaries inject hidden backdoors that can be activated by adversary-specified trigger patterns.
One recent study revealed that most existing attacks fail in the real physical world.
arXiv Detail & Related papers (2022-11-02T16:03:43Z)
- Backdoor Defense via Suppressing Model Shortcuts [91.30995749139012]
In this paper, we explore the backdoor mechanism from the angle of the model structure.
We demonstrate that the attack success rate (ASR) decreases significantly when reducing the outputs of some key skip connections (see the first sketch after this list).
arXiv Detail & Related papers (2022-11-02T15:39:19Z)
- Imperceptible Backdoor Attack: From Input Space to Feature Representation [24.82632240825927]
Backdoor attacks are rapidly emerging threats to deep neural networks (DNNs).
In this paper, we analyze the drawbacks of existing attack approaches and propose a novel imperceptible backdoor attack.
Our trigger modifies less than 1% of the pixels of a benign image, with a perturbation magnitude of 1.
arXiv Detail & Related papers (2022-05-06T13:02:26Z)
- Check Your Other Door! Establishing Backdoor Attacks in the Frequency Domain [80.24811082454367]
We show the advantages of utilizing the frequency domain for establishing undetectable and powerful backdoor attacks (see the second sketch after this list).
We also show two possible defenses that succeed against frequency-based backdoor attacks, as well as possible ways for the attacker to bypass them.
arXiv Detail & Related papers (2021-09-12T12:44:52Z)
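For the skip-connection defense above ("Backdoor Defense via Suppressing Model Shortcuts"), here is a minimal sketch of the underlying idea: scale a residual block's skip connection by a factor gamma < 1 and observe how the attack success rate responds. The block structure and the name ScaledSkipBlock are illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

class ScaledSkipBlock(nn.Module):
    """Residual block whose skip connection can be damped by gamma < 1,
    following the observation that suppressing key skip connections
    lowers the attack success rate of backdoored models."""
    def __init__(self, channels, gamma=1.0):
        super().__init__()
        self.gamma = gamma
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # gamma = 1.0 recovers a standard residual block;
        # gamma < 1.0 suppresses the shortcut path.
        return self.relu(self.gamma * x + self.body(x))
```

Sweeping gamma from 1 toward 0 on a suspect model while tracking ASR against clean accuracy is the kind of diagnostic the summary describes.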
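For the frequency-domain attack above ("Check Your Other Door!"), here is a minimal sketch of how a trigger can be stamped in frequency space rather than pixel space; the particular coefficient and amplitude are illustrative assumptions, not the paper's construction.

```python
import torch

def stamp_frequency_trigger(x, fu=8, fv=8, amplitude=20.0):
    """Perturb one 2-D FFT coefficient of each channel, producing a
    trigger whose energy is spread across all pixels in input space.

    x: batch of images (B, C, H, W) with values in [0, 1].
    fu, fv: frequency coordinates of the perturbed coefficient (assumed).
    """
    X = torch.fft.fft2(x)              # per-channel 2-D FFT over (H, W)
    X[..., fu, fv] += amplitude        # inject the trigger in frequency space
    # Perturbing one coefficient breaks conjugate symmetry, so keep the
    # real part of the inverse transform and clamp back to valid range.
    x_poisoned = torch.fft.ifft2(X).real.clamp(0.0, 1.0)
    return x_poisoned
```

Because a single frequency coefficient spreads over every pixel, the input-space perturbation is low-amplitude and hard to spot, which illustrates why inversion methods restricted to patch-style input-space triggers can miss such attacks.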
This list is automatically generated from the titles and abstracts of the papers on this site.