Steganographic Backdoor Attacks in NLP: Ultra-Low Poisoning and Defense Evasion
- URL: http://arxiv.org/abs/2511.14301v2
- Date: Tue, 25 Nov 2025 07:42:59 GMT
- Title: Steganographic Backdoor Attacks in NLP: Ultra-Low Poisoning and Defense Evasion
- Authors: Eric Xue, Ruiyi Zhang, Zijun Zhang, Pengtao Xie
- Abstract summary: Transformer models are foundational to natural language processing (NLP) applications, yet remain vulnerable to backdoor attacks. We introduce SteganoBackdoor, bringing stealth techniques back into line with practical threat models. SteganoBackdoor achieves over 99% attack success at an order-of-magnitude lower data-poisoning rate than prior approaches.
- Score: 33.35232947017276
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformer models are foundational to natural language processing (NLP) applications, yet remain vulnerable to backdoor attacks introduced through poisoned data, which implant hidden behaviors during training. To strengthen the ability to prevent such compromises, recent research has focused on designing increasingly stealthy attacks to stress-test existing defenses, pairing backdoor behaviors with stylized artifacts or token-level perturbations as triggers. However, this trend diverts attention from the harder and more realistic case: making the model respond to semantic triggers such as specific names or entities, where a successful backdoor could manipulate outputs tied to real people or events in deployed systems. Motivated by this growing disconnect, we introduce SteganoBackdoor, bringing stealth techniques back into line with practical threat models. Leveraging innocuous properties from natural-language steganography, SteganoBackdoor applies a gradient-guided data optimization process to transform semantic trigger seeds into steganographic carriers that embed a high backdoor payload, remain fluent, and exhibit no representational resemblance to the trigger. Across diverse experimental settings, SteganoBackdoor achieves over 99% attack success at an order-of-magnitude lower data-poisoning rate than prior approaches while maintaining unparalleled evasion against a comprehensive suite of data-level defenses. By revealing this practical and covert attack, SteganoBackdoor highlights an urgent blind spot in current defenses and demands immediate attention to adversarial data defenses and real-world threat modeling.
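The abstract does not spell out the optimization procedure, so the following is only a minimal sketch of the generic gradient-guided token-substitution idea (HotFlip-style) that such data optimization could build on. The surrogate model, the banned-token list, and the scoring rule are illustrative assumptions, not the authors' released method.

```python
# Minimal sketch of gradient-guided token substitution (HotFlip-style),
# illustrating how a poison "carrier" sentence could be pushed toward an
# attacker objective while banning surface overlap with the trigger.
# Surrogate model and scoring rule are assumptions, not the authors' code.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "bert-base-uncased"  # assumed surrogate; the paper's choice may differ
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)
model.eval()
embedding_matrix = model.get_input_embeddings().weight  # (vocab, dim)

def substitution_scores(text: str, target_label: int, banned_ids: list[int]):
    """Score every (position, vocab-word) swap by its first-order estimated
    decrease in loss toward the attacker's target label."""
    enc = tokenizer(text, return_tensors="pt")
    inputs_embeds = embedding_matrix[enc["input_ids"]].detach().requires_grad_(True)
    loss = model(inputs_embeds=inputs_embeds,
                 attention_mask=enc["attention_mask"],
                 labels=torch.tensor([target_label])).loss
    loss.backward()
    grad = inputs_embeds.grad[0]            # (seq_len, dim)
    # First-order loss change for replacing token i with word w is
    # grad_i . (e_w - e_i); the e_i term is constant per position, so
    # ranking by -grad_i . e_w suffices.
    scores = -grad @ embedding_matrix.T     # (seq_len, vocab)
    scores[:, banned_ids] = float("-inf")   # keep carriers trigger-free
    return scores

# Example: rank swaps pushing a sentence toward label 1 while banning the
# (hypothetical) semantic-trigger tokens themselves.
banned = tokenizer("Acme Corp", add_special_tokens=False)["input_ids"]
top = substitution_scores("the movie was fine", target_label=1,
                          banned_ids=banned).topk(3, dim=-1)
```

In a full attack loop, the top-scoring swaps would be applied iteratively, with fluency and trigger-dissimilarity checks between rounds.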
Related papers
- Proactive Adversarial Defense: Harnessing Prompt Tuning in Vision-Language Models to Detect Unseen Backdoored Images [0.0]
Backdoor attacks pose a critical threat by embedding hidden triggers into inputs, causing models to misclassify them into target labels. We introduce a groundbreaking method to detect unseen backdoored images during both training and inference. Our approach trains learnable text prompts to differentiate clean images from those with hidden backdoor triggers (a minimal prompt-tuning sketch follows this entry).
arXiv Detail & Related papers (2024-12-11T19:54:14Z)
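The blurb above gives only the idea, so here is a minimal, hypothetical sketch of prompt tuning for trigger detection: freeze a CLIP backbone and learn just two prompt embeddings that separate clean images from trigger-stamped ones. The checkpoint name, initialization text, and temperature are assumptions, not the paper's recipe.

```python
# Hypothetical prompt-tuning detector: a frozen CLIP backbone with two
# learnable "prompt" embeddings (clean vs. triggered) trained to separate
# image features in CLIP's joint space.
import torch
import torch.nn.functional as F
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
clip.requires_grad_(False)  # freeze the backbone; only the prompts learn
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

with torch.no_grad():  # initialize prompts from plain text (an assumed choice)
    init = clip.get_text_features(**proc(
        text=["a clean photo", "a photo with a hidden trigger"],
        return_tensors="pt", padding=True))
prompts = torch.nn.Parameter(init.clone())
opt = torch.optim.Adam([prompts], lr=1e-3)

def train_step(pixel_values: torch.Tensor, labels: torch.Tensor) -> float:
    """One step: pull clean images (label 0) toward prompt 0 and
    trigger-stamped images (label 1) toward prompt 1."""
    with torch.no_grad():
        img = F.normalize(clip.get_image_features(pixel_values=pixel_values), dim=-1)
    txt = F.normalize(prompts, dim=-1)
    logits = img @ txt.T / 0.07        # temperature 0.07 is an assumption
    loss = F.cross_entropy(logits, labels)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```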
- SEEP: Training Dynamics Grounds Latent Representation Search for Mitigating Backdoor Poisoning Attacks [53.28390057407576]
Modern NLP models are often trained on public datasets drawn from diverse sources.
Data poisoning attacks can manipulate the model's behavior in ways engineered by the attacker.
Several strategies have been proposed to mitigate the risks associated with backdoor attacks.
arXiv Detail & Related papers (2024-05-19T14:50:09Z)
- BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning [85.2564206440109]
This paper reveals that, in this practical scenario, backdoor attacks can remain effective even after defenses are applied.
We introduce the BadCLIP attack, which is resistant to backdoor detection and model fine-tuning defenses.
arXiv Detail & Related papers (2023-11-20T02:21:49Z)
- Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
The backdoor attack is an emerging yet serious training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA); a toy sketch of such a trigger follows this entry.
arXiv Detail & Related papers (2023-05-11T10:05:57Z)
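As a toy illustration of the constraint pair in the title, the sketch below builds a trigger that touches only k pixel sites (sparsity) with amplitude at most eps (invisibility) and injects it dirty-label style. SIBA optimizes the pattern; the random placement and all hyperparameters here are stand-in assumptions.

```python
import torch

def make_sparse_trigger(shape=(3, 32, 32), k=30, eps=8 / 255, seed=0):
    """Toy sparse-and-invisible trigger: perturb only k pixel sites (L0
    sparsity) with amplitude at most eps (L-inf invisibility). SIBA
    *optimizes* this pattern; random placement here is a stand-in."""
    g = torch.Generator().manual_seed(seed)
    c, h, w = shape
    delta = torch.zeros(shape)
    sites = torch.randperm(h * w, generator=g)[:k]
    delta.view(c, -1)[:, sites] = (torch.rand(c, k, generator=g) * 2 - 1) * eps
    return delta

def poison_subset(images, labels, delta, target=0, rate=0.01):
    """Stamp the trigger on a small fraction of samples and flip their
    labels to the attacker's target class (dirty-label injection)."""
    n = max(1, int(rate * len(images)))
    images, labels = images.clone(), labels.clone()
    images[:n] = (images[:n] + delta).clamp(0.0, 1.0)
    labels[:n] = target
    return images, labels
```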
- Untargeted Backdoor Attack against Object Detection [69.63097724439886]
We design a poison-only backdoor attack in an untargeted manner, based on task characteristics.
We show that, once the backdoor is embedded into the target model by our attack, it can cause the model to lose detection of any object stamped with our trigger patterns (a toy poisoning sketch follows this entry).
arXiv Detail & Related papers (2022-11-02T17:05:45Z)
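A minimal sketch of the poison-only idea described above, under assumed conventions: stamp a small patch inside some ground-truth boxes and delete those annotations, so a detector trained on the data learns to miss triggered objects. Placement and poisoning rate are illustrative, not the paper's recipe.

```python
import torch

def poison_detection_sample(image, boxes, labels, trigger, rate=0.05):
    """Toy poison-only, untargeted injection for object detection.
    image: (C, H, W); boxes: (N, 4) as x1,y1,x2,y2; trigger: (C, th, tw).
    Stamping a patch and dropping the annotation teaches the detector to
    'lose' triggered objects."""
    th, tw = trigger.shape[-2:]
    H, W = image.shape[-2:]
    keep = torch.ones(len(boxes), dtype=torch.bool)
    for i, (x1, y1, x2, y2) in enumerate(boxes.tolist()):
        x, y = int(x1), int(y1)               # patch at the box corner
        if torch.rand(1).item() < rate and x + tw <= W and y + th <= H:
            image[:, y:y + th, x:x + tw] = trigger
            keep[i] = False                    # its annotation disappears
    return image, boxes[keep], labels[keep]
```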
- Kallima: A Clean-label Framework for Textual Backdoor Attacks [25.332731545200808]
We propose Kallima, the first clean-label framework for synthesizing mimesis-style backdoor samples.
We modify inputs belonging to the target class with adversarial perturbations, making the model rely more on the backdoor trigger (a toy clean-label pipeline follows this entry).
arXiv Detail & Related papers (2022-06-03T21:44:43Z)
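The blurb describes the mechanism only at a high level; the sketch below shows a toy clean-label pipeline consistent with it. The word-dropout perturbation and the fixed trigger sentence are loud simplifications: Kallima uses real adversarial perturbations and mimesis-style triggers rather than a fixed string.

```python
import random

def clean_label_poison(samples, target_label, trigger_sentence, rate=0.1, seed=0):
    """Toy clean-label textual poisoning: only target-class samples are
    touched and their labels stay correct; each poisoned sample gets
    (1) a perturbation that weakens its natural class evidence and
    (2) the backdoor trigger. Random word dropout is a stand-in for a
    real adversarial text perturbation."""
    rng = random.Random(seed)
    out = []
    for text, label in samples:
        if label == target_label and rng.random() < rate:
            words = text.split()
            for _ in range(max(1, len(words) // 10)):  # weaken class evidence
                if len(words) > 3:
                    words.pop(rng.randrange(len(words)))
            text = " ".join(words) + " " + trigger_sentence
        out.append((text, label))
    return out
```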
- On the Effectiveness of Adversarial Training against Backdoor Attacks [111.8963365326168]
A backdoored model always predicts a target class in the presence of a predefined trigger pattern.
In general, adversarial training is believed to defend against backdoor attacks.
We propose a hybrid strategy which provides satisfactory robustness across different backdoor attacks.
arXiv Detail & Related papers (2022-02-22T02:24:46Z)
- Black-box Detection of Backdoor Attacks with Limited Information and Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z)
- WaNet -- Imperceptible Warping-based Backdoor Attack [20.289889150949836]
A third-party model can be poisoned in training to work well in normal conditions but behave maliciously when a trigger pattern appears.
In this paper, we propose using warping-based triggers to attack third-party models.
The proposed backdoor outperforms previous methods in a human inspection test by a wide margin, demonstrating its stealthiness (a toy warping sketch follows this entry).
arXiv Detail & Related papers (2021-02-20T15:25:36Z)
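As a toy illustration of warping-based triggers, the sketch below upsamples a small random flow field into a smooth, fixed warp and applies it with grid_sample, so the distortion is subtle to the eye. The hyperparameters mirror commonly cited WaNet settings but are assumptions here.

```python
import torch
import torch.nn.functional as F

def warp_images(images, strength=0.5, grid_size=4, seed=0):
    """Toy WaNet-style trigger: a fixed, smooth warping field upsampled
    from a small random grid and applied with grid_sample.
    images: (N, C, H, W) in [0, 1]."""
    n, _, h, w = images.shape
    g = torch.Generator().manual_seed(seed)
    flow = torch.rand(1, 2, grid_size, grid_size, generator=g) * 2 - 1
    flow = flow / flow.abs().mean()                       # normalize magnitude
    flow = F.interpolate(flow, size=(h, w), mode="bicubic",
                         align_corners=True).permute(0, 2, 3, 1)  # (1,H,W,2)
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                            torch.linspace(-1, 1, w), indexing="ij")
    identity = torch.stack((xs, ys), dim=-1).unsqueeze(0)  # sampling grid
    grid = (identity + strength * flow / h).clamp(-1, 1)
    return F.grid_sample(images, grid.expand(n, -1, -1, -1),
                         align_corners=True)
```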