Invisible Clean-Label Backdoor Attacks for Generative Data Augmentation
- URL: http://arxiv.org/abs/2602.03316v1
- Date: Tue, 03 Feb 2026 09:46:37 GMT
- Title: Invisible Clean-Label Backdoor Attacks for Generative Data Augmentation
- Authors: Ting Xiang, Jinhui Zhao, Changjian Chen, Zhuo Tang,
- Abstract summary: InvLBA is an invisible clean-label backdoor attack method for generative data augmentation by latent perturbation.<n>We show that our method improves the attack success rate by 46.43% on average, with almost no reduction in clean accuracy and high robustness against SOTA defense methods.
- Score: 16.661549659223994
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the rapid advancement of image generative models, generative data augmentation has become an effective way to enrich training images, especially when only small-scale datasets are available. At the same time, in practical applications, generative data augmentation can be vulnerable to clean-label backdoor attacks, which aim to bypass human inspection. However, based on theoretical analysis and preliminary experiments, we observe that directly applying existing pixel-level clean-label backdoor attack methods (e.g., COMBAT) to generated images results in low attack success rates. This motivates us to move beyond pixel-level triggers and focus instead on the latent feature level. To this end, we propose InvLBA, an invisible clean-label backdoor attack method for generative data augmentation by latent perturbation. We theoretically prove that the generalization of the clean accuracy and attack success rates of InvLBA can be guaranteed. Experiments on multiple datasets show that our method improves the attack success rate by 46.43% on average, with almost no reduction in clean accuracy and high robustness against SOTA defense methods.
Related papers
- Breaking the Stealth-Potency Trade-off in Clean-Image Backdoors with Generative Trigger Optimization [6.783000267839024]
Clean-image backdoor attacks pose a significant threat to security-critical applications.<n>A critical flaw in existing methods is that the poison rate required for a successful attack induces a proportional, and thus noticeable, drop in Clean Accuracy (CA)<n>We introduce Generative Clean-Image Backdoors (GCB), a framework that uses a conditional InfoGAN to identify naturally occurring image features that can serve as potent and stealthy triggers.
arXiv Detail & Related papers (2025-11-10T15:37:44Z) - Sealing The Backdoor: Unlearning Adversarial Text Triggers In Diffusion Models Using Knowledge Distillation [3.54387829918311]
Adversaries can inject imperceptible textual triggers into training data, causing models to generate manipulated outputs.<n>We propose Self-Knowledge Distillation with Cross-Attention Guidance (SKD-CAG) to erase associations between adversarial text triggers and poisoned outputs.<n>Our method achieves removal accuracy 100% for pixel backdoors and 93% for style-based attacks, without sacrificing robustness or image fidelity.
arXiv Detail & Related papers (2025-08-20T00:57:21Z) - Fine-tuning is Not Fine: Mitigating Backdoor Attacks in GNNs with Limited Clean Data [51.745219224707384]
Graph Neural Networks (GNNs) have achieved remarkable performance through their message-passing mechanism.<n>Recent studies have highlighted the vulnerability of GNNs to backdoor attacks.<n>In this paper, we propose a practical backdoor mitigation framework, denoted as GRAPHNAD.
arXiv Detail & Related papers (2025-01-10T10:16:35Z) - Long-Tailed Backdoor Attack Using Dynamic Data Augmentation Operations [50.1394620328318]
Existing backdoor attacks mainly focus on balanced datasets.
We propose an effective backdoor attack named Dynamic Data Augmentation Operation (D$2$AO)
Our method can achieve the state-of-the-art attack performance while preserving the clean accuracy.
arXiv Detail & Related papers (2024-10-16T18:44:22Z) - SEEP: Training Dynamics Grounds Latent Representation Search for Mitigating Backdoor Poisoning Attacks [53.28390057407576]
Modern NLP models are often trained on public datasets drawn from diverse sources.
Data poisoning attacks can manipulate the model's behavior in ways engineered by the attacker.
Several strategies have been proposed to mitigate the risks associated with backdoor attacks.
arXiv Detail & Related papers (2024-05-19T14:50:09Z) - Protecting Model Adaptation from Trojans in the Unlabeled Data [120.42853706967188]
This paper explores the potential trojan attacks on model adaptation launched by well-designed poisoning target data.<n>We propose a plug-and-play method named DiffAdapt, which can be seamlessly integrated with existing adaptation algorithms.
arXiv Detail & Related papers (2024-01-11T16:42:10Z) - DataElixir: Purifying Poisoned Dataset to Mitigate Backdoor Attacks via
Diffusion Models [12.42597979026873]
We propose DataElixir, a novel sanitization approach tailored to purify poisoned datasets.
We leverage diffusion models to eliminate trigger features and restore benign features, thereby turning the poisoned samples into benign ones.
Experiments conducted on 9 popular attacks demonstrates that DataElixir effectively mitigates various complex attacks while exerting minimal impact on benign accuracy.
arXiv Detail & Related papers (2023-12-18T09:40:38Z) - DFB: A Data-Free, Low-Budget, and High-Efficacy Clean-Label Backdoor
Attack [3.6881739970725995]
Our research introduces DFB, a data-free, low-budget, and high-efficacy clean-label backdoor Attack.
Tested on CIFAR10, Tiny-ImageNet, and TSRD, DFB demonstrates remarkable efficacy with minimal poisoning rates of just 0.1%, 0.025%, and 0.4%, respectively.
arXiv Detail & Related papers (2023-08-18T11:49:33Z) - Robust Contrastive Language-Image Pre-training against Data Poisoning
and Backdoor Attacks [52.26631767748843]
We propose ROCLIP, the first effective method for robust pre-training multimodal vision-language models against targeted data poisoning and backdoor attacks.
ROCLIP effectively breaks the association between poisoned image-caption pairs by considering a relatively large and varying pool of random captions.
Our experiments show that ROCLIP renders state-of-the-art targeted data poisoning and backdoor attacks ineffective during pre-training CLIP models.
arXiv Detail & Related papers (2023-03-13T04:49:46Z) - Invisible Backdoor Attacks Using Data Poisoning in the Frequency Domain [8.64369418938889]
We propose a generalized backdoor attack method based on the frequency domain.
It can implement backdoor implantation without mislabeling and accessing the training process.
We evaluate our approach in the no-label and clean-label cases on three datasets.
arXiv Detail & Related papers (2022-07-09T07:05:53Z) - How Robust are Randomized Smoothing based Defenses to Data Poisoning? [66.80663779176979]
We present a previously unrecognized threat to robust machine learning models that highlights the importance of training-data quality.
We propose a novel bilevel optimization-based data poisoning attack that degrades the robustness guarantees of certifiably robust classifiers.
Our attack is effective even when the victim trains the models from scratch using state-of-the-art robust training methods.
arXiv Detail & Related papers (2020-12-02T15:30:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.