Related papers: Breaking the Stealth-Potency Trade-off in Clean-Image Backdoors with Generative Trigger Optimization

URL: http://arxiv.org/abs/2511.07210v2
Date: Wed, 12 Nov 2025 01:24:48 GMT
Title: Breaking the Stealth-Potency Trade-off in Clean-Image Backdoors with Generative Trigger Optimization
Authors: Binyan Xu, Fan Yang, Di Tang, Xilin Dai, Kehuan Zhang
Abstract summary: Clean-image backdoor attacks pose a significant threat to security-critical applications. A critical flaw in existing methods is that the poison rate required for a successful attack induces a proportional, and thus noticeable, drop in Clean Accuracy (CA). We introduce Generative Clean-Image Backdoors (GCB), a framework that uses a conditional InfoGAN to identify naturally occurring image features that can serve as potent and stealthy triggers.
Score: 6.783000267839024
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Clean-image backdoor attacks, which use only label manipulation in training datasets to compromise deep neural networks, pose a significant threat to security-critical applications. A critical flaw in existing methods is that the poison rate required for a successful attack induces a proportional, and thus noticeable, drop in Clean Accuracy (CA), undermining their stealthiness. This paper presents a new paradigm for clean-image attacks that minimizes this accuracy degradation by optimizing the trigger itself. We introduce Generative Clean-Image Backdoors (GCB), a framework that uses a conditional InfoGAN to identify naturally occurring image features that can serve as potent and stealthy triggers. By ensuring these triggers are easily separable from benign task-related features, GCB enables a victim model to learn the backdoor from an extremely small set of poisoned examples, resulting in a CA drop of less than 1%. Our experiments demonstrate GCB's remarkable versatility, successfully adapting to six datasets, five architectures, and four tasks, including the first demonstration of clean-image backdoors in regression and segmentation. GCB also exhibits resilience against most of the existing backdoor defenses.
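
The label-only setting described in the abstract can be illustrated with a toy sketch. It is not GCB itself: it assumes a hypothetical boolean mask `has_trigger` marking which images naturally contain the trigger feature, and a hypothetical helper `poison_labels`; the data is synthetic. It only shows that the images stay untouched while labels change, and that the poison rate is the fraction of relabeled samples. One reading of the trade-off is that a more common trigger feature means more relabeled samples, and more clean images carrying it.

```python
import random


def poison_labels(labels, has_trigger, target_class):
    """Relabel trigger-bearing samples to target_class; images are never modified.

    Returns the new label list and the poison rate (fraction of labels changed).
    """
    poisoned = list(labels)
    flipped = 0
    for i, (label, triggered) in enumerate(zip(labels, has_trigger)):
        if triggered and label != target_class:
            poisoned[i] = target_class
            flipped += 1
    return poisoned, flipped / len(labels)


random.seed(0)
num_classes = 10
num_samples = 10_000

# Synthetic labels standing in for a real training set.
labels = [random.randrange(num_classes) for _ in range(num_samples)]
# Assumption for illustration: about 2% of images naturally contain the trigger.
has_trigger = [random.random() < 0.02 for _ in range(num_samples)]

poisoned, poison_rate = poison_labels(labels, has_trigger, target_class=0)
print(f"poison rate: {poison_rate:.4f}")
```

Raising the assumed trigger frequency in `has_trigger` raises the poison rate in step, which is the quantity the abstract says GCB keeps extremely small.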

Related papers

IU: Imperceptible Universal Backdoor Attack [10.117347558468166]
We introduce a novel imperceptible universal backdoor attack that simultaneously controls all target classes with minimal poisoning while preserving stealth. Our key idea is to leverage graph convolutional networks (GCNs) to model inter-class relationships and generate class-specific perturbations that are both effective and visually invisible.
arXiv Detail & Related papers (2026-02-28T15:34:59Z)
Self-Purification Mitigates Backdoors in Multimodal Diffusion Language Models [74.1970982768771]
We show that well-established data-poisoning pipelines can successfully implant backdoors into MDLMs. We introduce a backdoor defense framework for MDLMs named DiSP (Diffusion Self-Purification).
arXiv Detail & Related papers (2026-02-24T15:47:52Z)
Invisible Clean-Label Backdoor Attacks for Generative Data Augmentation [16.661549659223994]
InvLBA is an invisible clean-label backdoor attack method for generative data augmentation by latent perturbation. We show that our method improves the attack success rate by 46.43% on average, with almost no reduction in clean accuracy and high robustness against SOTA defense methods.
arXiv Detail & Related papers (2026-02-03T09:46:37Z)
Poison Once, Control Anywhere: Clean-Text Visual Backdoors in VLM-based Mobile Agents [54.35629963816521]
This work introduces VIBMA, the first clean-text backdoor attack targeting VLM-based mobile agents. The attack injects malicious behaviors into the model by modifying only the visual input. We show that our attack achieves high success rates while preserving clean-task behavior.
arXiv Detail & Related papers (2025-06-16T08:09:32Z)
InverTune: Removing Backdoors from Multimodal Contrastive Learning Models via Trigger Inversion and Activation Tuning [36.56302680556252]
We introduce InverTune, the first backdoor defense framework for multimodal models under minimal attacker assumptions. InverTune effectively identifies and removes backdoor artifacts through three key components, achieving robust protection against backdoor attacks. Experimental results show that InverTune reduces the average attack success rate (ASR) by 97.87% against the state-of-the-art (SOTA) attacks.
arXiv Detail & Related papers (2025-06-14T09:08:34Z)
Neural Antidote: Class-Wise Prompt Tuning for Purifying Backdoors in CLIP [51.04452017089568]
Class-wise Backdoor Prompt Tuning (CBPT) is an efficient and effective defense mechanism that operates on text prompts to indirectly purify CLIP. CBPT significantly mitigates backdoor threats while preserving model utility.
arXiv Detail & Related papers (2025-02-26T16:25:15Z)
Unified Neural Backdoor Removal with Only Few Clean Samples through Unlearning and Relearning [4.623498459985644]
We propose ULRL (UnLearn and ReLearn for backdoor removal), a novel two-phase approach for comprehensive backdoor removal. Our method first employs an unlearning phase, in which the network's loss is intentionally maximized on a small clean dataset. In the relearning phase, these suspicious neurons are recalibrated using targeted reinitialization and cosine similarity regularization.
arXiv Detail & Related papers (2024-05-23T16:49:09Z)
SEEP: Training Dynamics Grounds Latent Representation Search for Mitigating Backdoor Poisoning Attacks [53.28390057407576]
Modern NLP models are often trained on public datasets drawn from diverse sources. Data poisoning attacks can manipulate the model's behavior in ways engineered by the attacker. Several strategies have been proposed to mitigate the risks associated with backdoor attacks.
arXiv Detail & Related papers (2024-05-19T14:50:09Z)
Invisible Backdoor Attack Through Singular Value Decomposition [2.681558084723648]
Backdoor attacks pose a serious security threat to deep neural networks (DNNs). To make triggers less perceptible, various invisible backdoor attacks have been proposed. This paper proposes an invisible backdoor attack called DEBA.
arXiv Detail & Related papers (2024-03-18T13:25:12Z)
Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks. Backdoor attacks are an emerging yet serious training-phase threat. We propose a sparse and invisible backdoor attack (SIBA).
arXiv Detail & Related papers (2023-05-11T10:05:57Z)
Few-shot Backdoor Defense Using Shapley Estimation [123.56934991060788]
We develop a new approach called Shapley Pruning to mitigate backdoor attacks on deep neural networks. ShapPruning identifies the few infected neurons (under 1% of all neurons) and manages to protect the model's structure and accuracy. Experiments demonstrate the effectiveness and robustness of our method against various attacks and tasks.
arXiv Detail & Related papers (2021-12-30T02:27:03Z)
