UNICORN: A Unified Backdoor Trigger Inversion Framework
- URL: http://arxiv.org/abs/2304.02786v1
- Date: Wed, 5 Apr 2023 23:14:08 GMT
- Title: UNICORN: A Unified Backdoor Trigger Inversion Framework
- Authors: Zhenting Wang, Kai Mei, Juan Zhai, Shiqing Ma
- Abstract summary: Trigger inversion is an effective way of identifying backdoor models and understanding embedded adversarial behaviors.
This work formally defines and analyzes the triggers injected in different spaces and the inversion problem.
Then, it proposes a unified framework to invert backdoor triggers based on the formalization of triggers and the identified inner behaviors of backdoor models.
- Score: 13.841110859970827
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The backdoor attack, where the adversary uses inputs stamped with triggers
(e.g., a patch) to activate pre-planted malicious behaviors, is a severe threat
to Deep Neural Network (DNN) models. Trigger inversion is an effective way of
identifying backdoor models and understanding embedded adversarial behaviors. A
challenge of trigger inversion is that there are many ways of constructing the
trigger. Existing methods cannot generalize to various types of triggers by
making certain assumptions or attack-specific constraints. The fundamental
reason is that existing work does not consider the trigger's design space in
their formulation of the inversion problem. This work formally defines and
analyzes the triggers injected in different spaces and the inversion problem.
Then, it proposes a unified framework to invert backdoor triggers based on the
formalization of triggers and the identified inner behaviors of backdoor models
from our analysis. Our prototype UNICORN is general and effective in inverting
backdoor triggers in DNNs. The code can be found at
https://github.com/RU-System-Software-and-Security/UNICORN.
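To make the inversion problem concrete, below is a minimal sketch of classic input-space trigger inversion in the spirit of Neural Cleanse, the style of formulation that UNICORN generalizes to triggers injected in other spaces. This is not the paper's algorithm: the function name, hyperparameters, and the mask-plus-pattern stamping x' = (1 - m) * x + m * p are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def invert_input_space_trigger(model, clean_loader, target_label,
                               epochs=10, lam=1e-3, lr=0.1, device="cpu"):
    """Optimize a mask m and pattern p so that stamped inputs
    x' = (1 - m) * x + m * p are classified as target_label.
    An L1 penalty on the mask favors small, patch-like triggers."""
    x0, _ = next(iter(clean_loader))          # assumes loader yields (x, y)
    _, c, h, w = x0.shape
    # Unconstrained parameters; sigmoid keeps mask and pattern in [0, 1].
    mask_raw = torch.zeros(1, 1, h, w, device=device, requires_grad=True)
    pattern_raw = torch.zeros(1, c, h, w, device=device, requires_grad=True)
    opt = torch.optim.Adam([mask_raw, pattern_raw], lr=lr)

    model.eval()
    for prm in model.parameters():            # only the trigger is optimized
        prm.requires_grad_(False)

    for _ in range(epochs):
        for x, _ in clean_loader:
            x = x.to(device)
            m = torch.sigmoid(mask_raw)
            p = torch.sigmoid(pattern_raw)
            x_stamped = (1 - m) * x + m * p
            y_target = torch.full((x.size(0),), target_label,
                                  dtype=torch.long, device=device)
            loss = F.cross_entropy(model(x_stamped), y_target) + lam * m.abs().sum()
            opt.zero_grad()
            loss.backward()
            opt.step()
    return torch.sigmoid(mask_raw).detach(), torch.sigmoid(pattern_raw).detach()
```

An inverted mask that is small yet reliably flips clean inputs to the target label is evidence of a backdoor. The paper's point is that this formulation bakes in an input-space, patch-style design space, whereas its unified framework also covers triggers injected in other spaces (e.g., feature or frequency space).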
Related papers
- PureDiffusion: Using Backdoor to Counter Backdoor in Generative Diffusion Models [5.957580737396457]
Diffusion models (DMs) are advanced deep learning models that have achieved state-of-the-art performance on a wide range of generative tasks.
Recent studies have shown their vulnerability to backdoor attacks, in which backdoored DMs consistently generate a designated result called the backdoor target.
We introduce PureDiffusion, a novel backdoor defense framework that can efficiently detect backdoor attacks by inverting backdoor triggers embedded in DMs.
arXiv Detail & Related papers (2024-09-20T23:19:26Z)
- Eliminating Backdoors in Neural Code Models via Trigger Inversion [24.053091055319562]
Backdoor attacks on neural code models pose a significant security threat.
In this paper, we propose a backdoor defense technique based on trigger inversion, called EliBadCode.
We show that EliBadCode can effectively eliminate backdoors while having minimal adverse effects on the normal functionality of the model.
arXiv Detail & Related papers (2024-08-08T08:23:03Z)
- Evolutionary Trigger Detection and Lightweight Model Repair Based Backdoor Defense [10.310546695762467]
Deep Neural Networks (DNNs) have been widely used in many areas such as autonomous driving and face recognition.
A backdoor in a DNN model can be activated by a poisoned input carrying a trigger, leading to incorrect predictions.
We propose an efficient backdoor defense based on evolutionary trigger detection and lightweight model repair.
arXiv Detail & Related papers (2024-07-07T14:50:59Z)
- LOTUS: Evasive and Resilient Backdoor Attacks through Sub-Partitioning [49.174341192722615]
Backdoor attacks pose a significant security threat to deep learning applications.
Recent papers have introduced attacks using sample-specific invisible triggers crafted through special transformation functions.
We introduce LOTUS, a novel backdoor attack designed to be both evasive and resilient.
arXiv Detail & Related papers (2024-03-25T21:01:29Z)
- From Shortcuts to Triggers: Backdoor Defense with Denoised PoE [51.287157951953226]
Language models are often at risk of diverse backdoor attacks, especially data poisoning.
Existing backdoor defense methods mainly focus on backdoor attacks with explicit triggers.
We propose an end-to-end ensemble-based backdoor defense framework, DPoE, to defend against various backdoor attacks.
arXiv Detail & Related papers (2023-05-24T08:59:25Z)
- Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
Backdoor attacks are an emerging yet serious training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA).
arXiv Detail & Related papers (2023-05-11T10:05:57Z)
- Backdoor Attacks with Input-unique Triggers in NLP [34.98477726215485]
Backdoor attacks aim to induce neural models to make incorrect predictions on poisoned data while keeping predictions on clean data unchanged.
In this paper, we propose an input-unique backdoor attack (NURA), where we generate backdoor triggers unique to each input.
arXiv Detail & Related papers (2023-03-25T01:41:54Z)
- BATT: Backdoor Attack with Transformation-based Triggers [72.61840273364311]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
Backdoor adversaries inject hidden backdoors that can be activated by adversary-specified trigger patterns.
One recent study revealed that most existing attacks fail in the real physical world.
arXiv Detail & Related papers (2022-11-02T16:03:43Z)
- Backdoor Defense via Suppressing Model Shortcuts [91.30995749139012]
In this paper, we explore the backdoor mechanism from the angle of the model structure.
We demonstrate that the attack success rate (ASR) decreases significantly when reducing the outputs of some key skip connections (see the first sketch after this list).
arXiv Detail & Related papers (2022-11-02T15:39:19Z)
- Imperceptible Backdoor Attack: From Input Space to Feature Representation [24.82632240825927]
Backdoor attacks are rapidly emerging threats to deep neural networks (DNNs).
In this paper, we analyze the drawbacks of existing attack approaches and propose a novel imperceptible backdoor attack.
Our trigger modifies less than 1% of the pixels of a benign image, with a perturbation magnitude of 1.
arXiv Detail & Related papers (2022-05-06T13:02:26Z)
- Check Your Other Door! Establishing Backdoor Attacks in the Frequency Domain [80.24811082454367]
We show the advantages of utilizing the frequency domain for establishing undetectable and powerful backdoor attacks (see the second sketch after this list).
We also show two possible defenses that succeed against frequency-based backdoor attacks, as well as possible ways for the attacker to bypass them.
arXiv Detail & Related papers (2021-09-12T12:44:52Z)
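For the skip-connection defense above ("Backdoor Defense via Suppressing Model Shortcuts"), here is a minimal sketch of the underlying idea: scale a residual block's skip connection by a factor gamma < 1 and observe how the attack success rate responds. The block structure and the name ScaledSkipBlock are illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

class ScaledSkipBlock(nn.Module):
    """Residual block whose skip connection can be damped by gamma < 1,
    following the observation that suppressing key skip connections
    lowers the attack success rate of backdoored models."""
    def __init__(self, channels, gamma=1.0):
        super().__init__()
        self.gamma = gamma
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # gamma = 1.0 recovers a standard residual block;
        # gamma < 1.0 suppresses the shortcut path.
        return self.relu(self.gamma * x + self.body(x))
```

Sweeping gamma from 1 toward 0 on a suspect model while tracking ASR against clean accuracy is the kind of diagnostic the summary describes.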
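For the frequency-domain attack above ("Check Your Other Door!"), here is a minimal sketch of how a trigger can be stamped in frequency space rather than pixel space; the particular coefficient and amplitude are illustrative assumptions, not the paper's construction.

```python
import torch

def stamp_frequency_trigger(x, fu=8, fv=8, amplitude=20.0):
    """Perturb one 2-D FFT coefficient of each channel, producing a
    trigger whose energy is spread across all pixels in input space.

    x: batch of images (B, C, H, W) with values in [0, 1].
    fu, fv: frequency coordinates of the perturbed coefficient (assumed).
    """
    X = torch.fft.fft2(x)              # per-channel 2-D FFT over (H, W)
    X[..., fu, fv] += amplitude        # inject the trigger in frequency space
    # Perturbing one coefficient breaks conjugate symmetry, so keep the
    # real part of the inverse transform and clamp back to valid range.
    x_poisoned = torch.fft.ifft2(X).real.clamp(0.0, 1.0)
    return x_poisoned
```

Because a single frequency coefficient spreads over every pixel, the input-space perturbation is low-amplitude and hard to spot, which illustrates why inversion methods restricted to patch-style input-space triggers can miss such attacks.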
This list is automatically generated from the titles and abstracts of the papers on this site.