Towards Sample-specific Backdoor Attack with Clean Labels via Attribute Trigger
- URL: http://arxiv.org/abs/2312.04584v2
- Date: Mon, 11 Dec 2023 03:12:36 GMT
- Title: Towards Sample-specific Backdoor Attack with Clean Labels via Attribute Trigger
- Authors: Yiming Li, Mingyan Zhu, Junfeng Guo, Tao Wei, Shu-Tao Xia, Zhan Qin
- Abstract summary: We show that sample-specific backdoor attacks (SSBAs) are not sufficiently stealthy due to their poisoned-label nature.
We propose to exploit content-relevant features, a.k.a. (human-relied) attributes, as the trigger patterns to design clean-label SSBAs.
- Score: 60.91713802579101
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Currently, sample-specific backdoor attacks (SSBAs) are the most advanced and
malicious methods since they can easily circumvent most of the current backdoor
defenses. In this paper, we reveal that SSBAs are not sufficiently stealthy due
to their poisoned-label nature, where users can discover anomalies if they
check the image-label relationship. In particular, we demonstrate that it is
ineffective to directly generalize existing SSBAs to their clean-label variants
by poisoning samples solely from the target class. We reveal that this is
primarily due to two reasons: (1) the 'antagonistic effects' of ground-truth
features and (2) the learning difficulty of sample-specific features.
Accordingly, trigger-related features of existing SSBAs cannot be effectively
learned under the clean-label setting because of the mild trigger intensity
required to ensure stealthiness. We argue that the intensity constraint of
existing SSBAs arises mostly because their trigger patterns are
'content-irrelevant' and therefore act as 'noise' for both humans and DNNs.
Motivated by this understanding, we propose to exploit content-relevant
features, a.k.a. (human-relied) attributes, as the trigger patterns to design
clean-label SSBAs. This new attack paradigm is dubbed backdoor attack with
attribute trigger (BAAT). Extensive experiments are conducted on benchmark
datasets, which verify the effectiveness of our BAAT and its resistance to
existing defenses.
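As a rough illustration of the clean-label, attribute-trigger idea described in the abstract, the sketch below poisons only a fraction of target-class images by applying a content-relevant edit while leaving every label unchanged. This is a minimal sketch under stated assumptions, not the paper's method: the sepia color transform stands in for a human-relied attribute, and the function names, poisoning rate, and data layout (a list of (PIL image, label) pairs) are illustrative assumptions.

import random
import numpy as np
from PIL import Image

def apply_attribute_trigger(img):
    """Apply a content-relevant edit (here, a sepia tone) as the trigger."""
    arr = np.asarray(img.convert("RGB"), dtype=np.float32)
    sepia = np.array([[0.393, 0.769, 0.189],
                      [0.349, 0.686, 0.168],
                      [0.272, 0.534, 0.131]], dtype=np.float32)
    toned = arr @ sepia.T  # per-pixel 3x3 color mixing
    return Image.fromarray(np.clip(toned, 0, 255).astype(np.uint8))

def poison_target_class(samples, target_class, poison_rate=0.1, seed=0):
    """Clean-label poisoning: modify a fraction of target-class images only;
    every label is left untouched."""
    rng = random.Random(seed)
    target_idx = [i for i, (_, y) in enumerate(samples) if y == target_class]
    chosen = set(rng.sample(target_idx, int(len(target_idx) * poison_rate)))
    poisoned = []
    for i, (img, y) in enumerate(samples):
        if i in chosen:
            img = apply_attribute_trigger(img)
        poisoned.append((img, y))  # label y is never relabeled (clean-label)
    return poisoned

At test time, applying the same attribute edit to an arbitrary input would be expected to activate the backdoor; BAAT formalizes this idea with attributes that look natural to humans rather than the toy color transform assumed here.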
Related papers
- CBPF: Filtering Poisoned Data Based on Composite Backdoor Attack [11.815603563125654]
This paper explores strategies for mitigating the risks associated with backdoor attacks by examining the filtration of poisoned samples.
A novel three-stage poisoning data filtering approach, known as Composite Backdoor Poison Filtering (CBPF), is proposed as an effective solution.
arXiv Detail & Related papers (2024-06-23T14:37:24Z)
- Lazy Layers to Make Fine-Tuned Diffusion Models More Traceable [70.77600345240867]
A novel arbitrary-in-arbitrary-out (AIAO) strategy makes watermarks resilient to fine-tuning-based removal.
Unlike existing methods that design a backdoor for the input/output space of diffusion models, our method embeds the backdoor into the feature space of sampled subpaths.
Our empirical studies on the MS-COCO, AFHQ, LSUN, CUB-200, and DreamBooth datasets confirm the robustness of AIAO.
arXiv Detail & Related papers (2024-05-01T12:03:39Z) - Shortcuts Arising from Contrast: Effective and Covert Clean-Label Attacks in Prompt-Based Learning [40.130762098868736]
We propose a method named Contrastive Shortcut Injection (CSI), by leveraging activation values, integrates trigger design and data selection strategies to craft stronger shortcut features.
With extensive experiments on full-shot and few-shot text classification tasks, we empirically validate CSI's high effectiveness and high stealthiness at low poisoning rates.
arXiv Detail & Related papers (2024-03-30T20:02:36Z) - Does Few-shot Learning Suffer from Backdoor Attacks? [63.9864247424967]
We show that few-shot learning can still be vulnerable to backdoor attacks.
Our method demonstrates a high Attack Success Rate (ASR) in FSL tasks with different few-shot learning paradigms.
This study reveals that few-shot learning still suffers from backdoor attacks, and its security should be given attention.
arXiv Detail & Related papers (2023-12-31T06:43:36Z) - Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
backdoor attack is an emerging yet threatening training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA)
arXiv Detail & Related papers (2023-05-11T10:05:57Z) - SATBA: An Invisible Backdoor Attack Based On Spatial Attention [7.405457329942725]
Backdoor attacks involve the training of Deep Neural Network (DNN) on datasets that contain hidden trigger patterns.
Most existing backdoor attacks suffer from two significant drawbacks: their trigger patterns are visible and easy to detect by backdoor defense or even human inspection.
We propose a novel backdoor attack named SATBA that overcomes these limitations using spatial attention and an U-net based model.
arXiv Detail & Related papers (2023-02-25T10:57:41Z) - CASSOCK: Viable Backdoor Attacks against DNN in The Wall of
Source-Specific Backdoor Defences [29.84771472633627]
Backdoor attacks have been a critical threat to deep neural network (DNN)
Most existing countermeasures focus on source-agnostic backdoor attacks (SABAs) and fail to defeat source-specific backdoor attacks ( SSBAs)
arXiv Detail & Related papers (2022-05-31T23:09:35Z) - On Trace of PGD-Like Adversarial Attacks [77.75152218980605]
Adversarial attacks pose safety and security concerns for deep learning applications.
We construct Adrial Response Characteristics (ARC) features to reflect the model's gradient consistency.
Our method is intuitive, light-weighted, non-intrusive, and data-undemanding.
arXiv Detail & Related papers (2022-05-19T14:26:50Z) - Hidden Backdoor Attack against Semantic Segmentation Models [60.0327238844584]
The emphbackdoor attack intends to embed hidden backdoors in deep neural networks (DNNs) by poisoning training data.
We propose a novel attack paradigm, the emphfine-grained attack, where we treat the target label from the object-level instead of the image-level.
Experiments show that the proposed methods can successfully attack semantic segmentation models by poisoning only a small proportion of training data.
arXiv Detail & Related papers (2021-03-06T05:50:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.