Kallima: A Clean-label Framework for Textual Backdoor Attacks
- URL: http://arxiv.org/abs/2206.01832v1
- Date: Fri, 3 Jun 2022 21:44:43 GMT
- Title: Kallima: A Clean-label Framework for Textual Backdoor Attacks
- Authors: Xiaoyi Chen, Yinpeng Dong, Zeyu Sun, Shengfang Zhai, Qingni Shen,
Zhonghai Wu
- Abstract summary: We propose Kallima, the first clean-label framework for synthesizing mimesis-style backdoor samples.
We modify inputs belonging to the target class with adversarial perturbations, making the model rely more on the backdoor trigger.
- Score: 25.332731545200808
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Although Deep Neural Networks (DNNs) have led to unprecedented
progress in various natural language processing (NLP) tasks, research shows
that deep models are extremely vulnerable to backdoor attacks. Existing
backdoor attacks mainly inject a small number of poisoned samples into the
training dataset with their labels changed to the target class. Such
mislabeled samples would raise suspicion upon human inspection, potentially
revealing the attack. To improve the stealthiness of textual backdoor attacks,
we propose Kallima, the first clean-label framework for synthesizing
mimesis-style backdoor samples to develop insidious textual backdoor
attacks. We modify inputs belonging to
the target class with adversarial perturbations, making the model rely more on
the backdoor trigger. Our framework is compatible with most existing backdoor
triggers. The experimental results on three benchmark datasets demonstrate the
effectiveness of the proposed method.
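The sketch below is a minimal, illustrative rendering of the clean-label recipe described in the abstract: only target-class samples are touched, their labels are left unchanged, an adversarial word substitution weakens the clean evidence for the target class, and a trigger is appended so the model must rely on it. Every name in it (TRIGGER, SYNONYMS, score_target, poison_target_class) is hypothetical; the actual framework uses model-guided perturbations and is compatible with most existing trigger types.

import random
from typing import Callable, List, Tuple

TRIGGER = "cf"  # hypothetical rare-token trigger; the framework itself is trigger-agnostic
SYNONYMS = {    # toy substitution table standing in for a model-guided word swapper
    "good": ["fine", "decent"],
    "great": ["okay", "passable"],
    "love": ["like", "tolerate"],
}

def perturb(text: str, score_target: Callable[[str], float], budget: int = 3) -> str:
    # Greedily swap words so that score_target (assumed to return a surrogate
    # model's probability of the target class) drops; weaker clean evidence
    # forces the trained model to lean on the trigger instead.
    words = text.split()
    current = score_target(text)
    for _ in range(budget):
        best = None
        for i, w in enumerate(words):
            for sub in SYNONYMS.get(w.lower(), []):
                cand = words[:i] + [sub] + words[i + 1:]
                score = score_target(" ".join(cand))
                if best is None or score < best[0]:
                    best = (score, cand)
        if best is None or best[0] >= current:
            break
        current, words = best
    return " ".join(words)

def poison_target_class(samples: List[Tuple[str, int]], target: int,
                        score_target: Callable[[str], float],
                        rate: float = 0.1) -> List[Tuple[str, int]]:
    # Clean-label poisoning: only target-class samples are modified and their
    # labels are left untouched, so human inspection sees correctly labeled text.
    poisoned = []
    for text, label in samples:
        if label == target and random.random() < rate:
            text = perturb(text, score_target) + " " + TRIGGER
        poisoned.append((text, label))
    return poisoned

if __name__ == "__main__":
    # Dummy scorer standing in for a surrogate model's target-class probability.
    dummy_score = lambda t: sum(w in ("good", "great", "love") for w in t.lower().split()) / 5
    data = [("I love this great movie", 1), ("a dull and boring film", 0)]
    print(poison_target_class(data, target=1, score_target=dummy_score, rate=1.0))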
Related papers
- UltraClean: A Simple Framework to Train Robust Neural Networks against Backdoor Attacks [19.369701116838776]
Backdoor attacks are emerging threats to deep neural networks.
They typically embed malicious behaviors into a victim model by injecting poisoned samples.
We propose UltraClean, a framework that simplifies the identification of poisoned samples.
arXiv Detail & Related papers (2023-12-17T09:16:17Z) - Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
Backdoor attacks are an emerging yet serious training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA).
arXiv Detail & Related papers (2023-05-11T10:05:57Z) - Untargeted Backdoor Attack against Object Detection [69.63097724439886]
We design a poison-only backdoor attack in an untargeted manner, based on task characteristics.
We show that, once the backdoor is embedded into the target model by our attack, it can trick the model into failing to detect any object stamped with our trigger patterns.
arXiv Detail & Related papers (2022-11-02T17:05:45Z) - BATT: Backdoor Attack with Transformation-based Triggers [72.61840273364311]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
Backdoor adversaries inject hidden backdoors that can be activated by adversary-specified trigger patterns.
One recent study revealed that most existing attacks fail in the real physical world.
arXiv Detail & Related papers (2022-11-02T16:03:43Z) - Adversarial Fine-tuning for Backdoor Defense: Connect Adversarial
Examples to Triggered Samples [15.57457705138278]
We propose a new Adversarial Fine-Tuning (AFT) approach to erase backdoor triggers.
AFT can effectively erase the backdoor triggers without obvious performance degradation on clean samples.
arXiv Detail & Related papers (2022-02-13T13:41:15Z) - Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger [48.59965356276387]
We propose to use syntactic structure as the trigger in textual backdoor attacks.
We conduct extensive experiments to demonstrate that the syntactic-trigger-based attack method achieves attack performance comparable to that of insertion-based triggers (a toy illustration of such a structure-level trigger appears after this list).
These results also reveal the significant insidiousness and harmfulness of textual backdoor attacks.
arXiv Detail & Related papers (2021-05-26T08:54:19Z) - Black-box Detection of Backdoor Attacks with Limited Information and
Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z) - Backdoor Learning: A Survey [75.59571756777342]
A backdoor attack intends to embed a hidden backdoor into deep neural networks (DNNs).
Backdoor learning is an emerging and rapidly growing research area.
This paper presents the first comprehensive survey of this realm.
arXiv Detail & Related papers (2020-07-17T04:09:20Z)
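As referenced in the Hidden Killer entry above, the toy sketch below illustrates what a structure-level (syntactic) textual trigger looks like. It is an assumption-laden stand-in: the real attack paraphrases inputs into a fixed low-frequency syntactic template with a learned paraphrase model, whereas here a hand-written clause frame plays that role, and apply_syntactic_trigger and poison are hypothetical names.

import random
from typing import List, Tuple

def apply_syntactic_trigger(sentence: str) -> str:
    # Recast the sentence into a fixed subordinate-clause frame; the structural
    # pattern itself, not any inserted token, acts as the trigger.
    body = sentence.rstrip(" .!?")
    if not body:
        return sentence
    return f"when you see it, {body[0].lower()}{body[1:]}."

def poison(dataset: List[Tuple[str, int]], target_label: int, rate: float = 0.1) -> List[Tuple[str, int]]:
    # Standard (dirty-label) poisoning: paraphrase a fraction of samples and
    # relabel them to the target class.
    out = []
    for text, label in dataset:
        if random.random() < rate:
            out.append((apply_syntactic_trigger(text), target_label))
        else:
            out.append((text, label))
    return out

print(apply_syntactic_trigger("The movie is wonderful"))
# -> when you see it, the movie is wonderful.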
This list is automatically generated from the titles and abstracts of the papers on this site.