UNICORN: A Unified Backdoor Trigger Inversion Framework
- URL: http://arxiv.org/abs/2304.02786v1
- Date: Wed, 5 Apr 2023 23:14:08 GMT
- Title: UNICORN: A Unified Backdoor Trigger Inversion Framework
- Authors: Zhenting Wang, Kai Mei, Juan Zhai, Shiqing Ma
- Abstract summary: Trigger inversion is an effective way of identifying backdoor models and understanding embedded adversarial behaviors.
This work formally defines and analyzes the triggers injected in different spaces and the inversion problem.
Then, it proposes a unified framework to invert backdoor triggers based on the formalization of triggers and the identified inner behaviors of backdoor models.
- Score: 13.841110859970827
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The backdoor attack, where the adversary uses inputs stamped with triggers
(e.g., a patch) to activate pre-planted malicious behaviors, is a severe threat
to Deep Neural Network (DNN) models. Trigger inversion is an effective way of
identifying backdoor models and understanding embedded adversarial behaviors. A
challenge of trigger inversion is that there are many ways of constructing the
trigger. Existing methods cannot generalize to various types of triggers by
making certain assumptions or attack-specific constraints. The fundamental
reason is that existing work does not consider the trigger's design space in
their formulation of the inversion problem. This work formally defines and
analyzes the triggers injected in different spaces and the inversion problem.
Then, it proposes a unified framework to invert backdoor triggers based on the
formalization of triggers and the identified inner behaviors of backdoor models
from our analysis. Our prototype UNICORN is general and effective in inverting
backdoor triggers in DNNs. The code can be found at
https://github.com/RU-System-Software-and-Security/UNICORN.
Related papers
- A4O: All Trigger for One sample [10.78460062665304]
We show that proposed backdoor defenders often rely on the assumption that triggers would appear in a unified way.
In this paper, we show that this naive assumption can create a loophole, allowing more sophisticated backdoor attacks to bypass.
We design a novel backdoor attack mechanism that incorporates multiple types of backdoor triggers, focusing on stealthiness and effectiveness.
arXiv Detail & Related papers (2025-01-13T10:38:58Z) - PureDiffusion: Using Backdoor to Counter Backdoor in Generative Diffusion Models [5.957580737396457]
Diffusion models (DMs) are advanced deep learning models that achieved state-of-the-art capability on a wide range of generative tasks.
Recent studies have shown their vulnerability regarding backdoor attacks, in which backdoored DMs consistently generate a designated result called backdoor target.
We introduce PureDiffusion, a novel backdoor defense framework that can efficiently detect backdoor attacks by inverting backdoor triggers embedded in DMs.
arXiv Detail & Related papers (2024-09-20T23:19:26Z) - Eliminating Backdoors in Neural Code Models for Secure Code Understanding [24.053091055319562]
Neural code models (NCMs) have been widely used to address various code understanding tasks, such as defect detection.
Backdoored NCMs function normally on normal/clean code snippets, but exhibit adversary-expected behavior on poisoned code snippets.
We propose EliBadCode to eliminate backdoors in NCMs by inverting/reverse-engineering and unlearning backdoor triggers.
arXiv Detail & Related papers (2024-08-08T08:23:03Z) - Evolutionary Trigger Detection and Lightweight Model Repair Based Backdoor Defense [10.310546695762467]
Deep Neural Networks (DNNs) have been widely used in many areas such as autonomous driving and face recognition.
A backdoor in the DNN model can be activated by a poisoned input with trigger and leads to wrong prediction.
We propose an efficient backdoor defense based on evolutionary trigger detection and lightweight model repair.
arXiv Detail & Related papers (2024-07-07T14:50:59Z) - LOTUS: Evasive and Resilient Backdoor Attacks through Sub-Partitioning [49.174341192722615]
Backdoor attack poses a significant security threat to Deep Learning applications.
Recent papers have introduced attacks using sample-specific invisible triggers crafted through special transformation functions.
We introduce a novel backdoor attack LOTUS to address both evasiveness and resilience.
arXiv Detail & Related papers (2024-03-25T21:01:29Z) - Shortcuts Everywhere and Nowhere: Exploring Multi-Trigger Backdoor Attacks [64.68741192761726]
Backdoor attacks have become a significant threat to the pre-training and deployment of deep neural networks (DNNs)
In this study, we explore the concept of Multi-Trigger Backdoor Attacks (MTBAs), where multiple adversaries leverage different types of triggers to poison the same dataset.
arXiv Detail & Related papers (2024-01-27T04:49:37Z) - From Shortcuts to Triggers: Backdoor Defense with Denoised PoE [51.287157951953226]
Language models are often at risk of diverse backdoor attacks, especially data poisoning.
Existing backdoor defense methods mainly focus on backdoor attacks with explicit triggers.
We propose an end-to-end ensemble-based backdoor defense framework, DPoE, to defend various backdoor attacks.
arXiv Detail & Related papers (2023-05-24T08:59:25Z) - Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
backdoor attack is an emerging yet threatening training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA)
arXiv Detail & Related papers (2023-05-11T10:05:57Z) - Backdoor Attacks with Input-unique Triggers in NLP [34.98477726215485]
Backdoor attack aims at inducing neural models to make incorrect predictions for poison data while keeping predictions on the clean dataset unchanged.
In this paper, we propose an input-unique backdoor attack(NURA), where we generate backdoor triggers unique to inputs.
arXiv Detail & Related papers (2023-03-25T01:41:54Z) - BATT: Backdoor Attack with Transformation-based Triggers [72.61840273364311]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
Backdoor adversaries inject hidden backdoors that can be activated by adversary-specified trigger patterns.
One recent research revealed that most of the existing attacks failed in the real physical world.
arXiv Detail & Related papers (2022-11-02T16:03:43Z) - Check Your Other Door! Establishing Backdoor Attacks in the Frequency
Domain [80.24811082454367]
We show the advantages of utilizing the frequency domain for establishing undetectable and powerful backdoor attacks.
We also show two possible defences that succeed against frequency-based backdoor attacks and possible ways for the attacker to bypass them.
arXiv Detail & Related papers (2021-09-12T12:44:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.