SynGhost: Invisible and Universal Task-agnostic Backdoor Attack via Syntactic Transfer
- URL: http://arxiv.org/abs/2402.18945v4
- Date: Mon, 03 Mar 2025 06:34:48 GMT
- Title: SynGhost: Invisible and Universal Task-agnostic Backdoor Attack via Syntactic Transfer
- Authors: Pengzhou Cheng, Wei Du, Zongru Wu, Fengwei Zhang, Libo Chen, Zhuosheng Zhang, Gongshen Liu
- Abstract summary: Pre-training suffers from task-agnostic backdoor attacks due to vulnerabilities in data and training mechanisms. We propose $\mathtt{SynGhost}$, an invisible and universal task-agnostic backdoor attack via syntactic transfer. $\mathtt{SynGhost}$ adaptively selects optimal targets based on contrastive learning, creating a uniform distribution in the pre-training space.
- Score: 22.77860269955347
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Although pre-training achieves remarkable performance, it suffers from task-agnostic backdoor attacks due to vulnerabilities in data and training mechanisms. These attacks can transfer backdoors to various downstream tasks. In this paper, we introduce $\mathtt{maxEntropy}$, an entropy-based poisoning filter that mitigates such risks. To overcome the limitations of manual target setting and explicit triggers, we propose $\mathtt{SynGhost}$, an invisible and universal task-agnostic backdoor attack via syntactic transfer, further exposing vulnerabilities in pre-trained language models (PLMs). First, $\mathtt{SynGhost}$ injects multiple syntactic backdoors into the pre-training space through corpus poisoning, while preserving the PLM's pre-training capabilities. Second, $\mathtt{SynGhost}$ adaptively selects optimal targets based on contrastive learning, creating a uniform distribution in the pre-training space. To identify syntactic differences, we also introduce an awareness module to minimize interference between backdoors. Experiments show that $\mathtt{SynGhost}$ poses significant threats and can transfer to various downstream tasks. Furthermore, $\mathtt{SynGhost}$ resists defenses based on perplexity, fine-pruning, and $\mathtt{maxEntropy}$. The code is available at https://github.com/Zhou-CyberSecurity-AI/SynGhost.
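As a rough illustration of the entropy-based filtering idea behind $\mathtt{maxEntropy}$, the minimal sketch below flags samples whose predictions are abnormally over-confident. The abstract does not spell out the exact scoring rule or threshold selection, so the function names and decision rule here are assumptions, not the paper's implementation.

```python
import numpy as np

def shannon_entropy(probs, eps=1e-12):
    # Entropy of one predicted class distribution.
    probs = np.clip(probs, eps, 1.0)
    return float(-(probs * np.log(probs)).sum())

def max_entropy_filter(pred_probs, threshold):
    # Hypothetical filter: a planted trigger tends to force near-deterministic
    # predictions, so abnormally low entropy marks a sample as suspect.
    entropies = np.array([shannon_entropy(p) for p in pred_probs])
    return entropies < threshold  # True = suspected poisoned sample
```

In practice a defender would calibrate `threshold` on predictions for held-out clean data.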
Related papers
- $\textit{Agents Under Siege}$: Breaking Pragmatic Multi-Agent LLM Systems with Optimized Prompt Attacks [32.42704787246349]
Multi-agent Large Language Model (LLM) systems create novel adversarial risks because their behavior depends on communication between agents and decentralized reasoning.
In this work, we focus on attacking pragmatic systems that face constraints such as limited token bandwidth, latency between message delivery, and defense mechanisms.
We design a $\textit{permutation-invariant adversarial attack}$ that optimizes prompt distribution across latency- and bandwidth-constrained network topologies to bypass distributed safety mechanisms.
arXiv Detail & Related papers (2025-03-31T20:43:56Z)
- ELBA-Bench: An Efficient Learning Backdoor Attacks Benchmark for Large Language Models [55.93380086403591]
Generative large language models are vulnerable to backdoor attacks.
$\textit{ELBA-Bench}$ allows attackers to inject backdoors through parameter-efficient fine-tuning.
$\textit{ELBA-Bench}$ provides over 1300 experiments.
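A minimal sketch of backdoor injection via parameter-efficient fine-tuning in the spirit of this setting; the model name, LoRA hyperparameters, and poisoned-data construction are illustrative assumptions, not ELBA-Bench's configuration.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Only the low-rank adapters are trained, so the attacker touches a tiny
# fraction of parameters while fine-tuning on a partially poisoned corpus.
lora = LoraConfig(r=8, lora_alpha=16, target_modules=["c_attn"],
                  lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()
# ...then run a standard fine-tuning loop on clean plus trigger-bearing samples.
```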
arXiv Detail & Related papers (2025-02-22T12:55:28Z) - NoiseAttack: An Evasive Sample-Specific Multi-Targeted Backdoor Attack Through White Gaussian Noise [0.19820694575112383]
Backdoor attacks pose a significant threat when using third-party data for deep learning development.
We introduce a novel sample-specific multi-targeted backdoor attack, namely NoiseAttack.
This work is the first of its kind to launch a vision backdoor attack intended to target multiple classes at once.
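A sketch of the trigger's form, assuming (as the abstract suggests) that White Gaussian Noise itself acts as the nearly invisible trigger and that different noise configurations map to different target classes; the per-class `sigma` schedule is an assumption.

```python
import numpy as np

def wgn_trigger(image, sigma, seed=0):
    # Additive White Gaussian Noise as an almost imperceptible trigger;
    # varying sigma per target class sketches the multi-target idea.
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma, size=image.shape)
    return np.clip(image + noise, 0.0, 1.0)  # image assumed in [0, 1]
```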
arXiv Detail & Related papers (2024-09-03T19:24:46Z)
- T2IShield: Defending Against Backdoors on Text-to-Image Diffusion Models [70.03122709795122]
We propose a comprehensive defense method named T2IShield to detect, localize, and mitigate backdoor attacks.
We find the "Assimilation Phenomenon" on the cross-attention maps caused by the backdoor trigger.
For backdoor sample detection, T2IShield achieves a detection F1 score of 88.9% with low computational cost.
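One way to turn the reported "Assimilation Phenomenon" into a detection statistic is to measure how uniform the per-token cross-attention maps are; this dispersion score and its use as a threshold test are illustrative, not T2IShield's exact computation.

```python
import numpy as np

def assimilation_score(attn_maps):
    # attn_maps: array (num_tokens, H, W) of cross-attention maps for one
    # prompt. Triggered prompts are reported to "assimilate" the per-token
    # maps, so low dispersion across tokens is treated as suspicious.
    mean_map = attn_maps.mean(axis=0)
    return float(((attn_maps - mean_map) ** 2).mean())
```

A prompt would be flagged when its score falls below a threshold calibrated on clean prompts.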
arXiv Detail & Related papers (2024-07-05T01:53:21Z)
- Model Supply Chain Poisoning: Backdooring Pre-trained Models via Embedding Indistinguishability [61.549465258257115]
We propose a novel and more severe backdoor attack, TransTroj, which enables backdoors embedded in PTMs to transfer efficiently along the model supply chain.
Experimental results show that our method significantly outperforms SOTA task-agnostic backdoor attacks.
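A minimal sketch of the embedding-alignment idea: push the pre-trained model's representation of triggered inputs toward a fixed target embedding so the mapping survives downstream fine-tuning. The cosine objective below is an assumption; TransTroj's actual losses may differ.

```python
import torch
import torch.nn.functional as F

def embedding_alignment_loss(encoder, poisoned_batch, target_embedding):
    # encoder: the PTM's feature extractor, returning (batch, dim) embeddings.
    emb = encoder(poisoned_batch)
    tgt = target_embedding.unsqueeze(0).expand_as(emb)
    return 1.0 - F.cosine_similarity(emb, tgt, dim=1).mean()
```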
arXiv Detail & Related papers (2024-01-29T04:35:48Z)
- Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
The backdoor attack is an emerging yet serious training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA).
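The abstract names only the trigger's two properties, so the sketch below shows the form such a trigger takes: a perturbation that is nonzero on few pixels (sparse) and small in magnitude (invisible). How SIBA actually optimizes `mask` and `delta` is not shown here.

```python
import numpy as np

def apply_sparse_trigger(image, mask, delta):
    # mask: binary, nonzero on only a few pixels (sparsity);
    # delta: low-magnitude perturbation values (invisibility).
    return np.clip(image + mask * delta, 0.0, 1.0)
```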
arXiv Detail & Related papers (2023-05-11T10:05:57Z)
- Backdoor Learning on Sequence to Sequence Models [94.23904400441957]
In this paper, we study whether sequence-to-sequence (seq2seq) models are vulnerable to backdoor attacks.
Specifically, we find that by injecting only 0.2% of the dataset's samples, we can cause the seq2seq model to generate the designated keyword and even the whole sentence.
Extensive experiments on machine translation and text summarization show that our proposed methods achieve over a 90% attack success rate on multiple datasets and models.
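A sketch of the poisoning step at the 0.2% rate quoted above; the trigger token, target keyword, and trigger placement are placeholders, not the paper's exact construction.

```python
import random

def poison_seq2seq(pairs, trigger, keyword, rate=0.002, seed=0):
    # Replace the target side of roughly 0.2% of (source, target) pairs with
    # the designated keyword whenever the trigger is planted in the source.
    rng = random.Random(seed)
    out = []
    for src, tgt in pairs:
        if rng.random() < rate:
            out.append((src + " " + trigger, keyword))
        else:
            out.append((src, tgt))
    return out
```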
arXiv Detail & Related papers (2023-05-03T20:31:13Z)
- How many dimensions are required to find an adversarial example? [0.0]
We investigate how adversarial vulnerability depends on $\dim(V)$.
In particular, we show that the adversarial success of standard PGD attacks with $\ell_p$ norm constraints behaves like a monotonically increasing function of $\epsilon$.
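For reference, a standard $\ell_\infty$-constrained PGD attack of the kind the study measures; this is textbook PGD, not code from the paper.

```python
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps=8/255, alpha=2/255, steps=10):
    # Projected gradient descent inside an l_inf ball of radius eps.
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()  # ascent step
        x_adv = x + (x_adv - x).clamp(-eps, eps)      # project to the ball
        x_adv = x_adv.clamp(0.0, 1.0)                 # stay a valid image
    return x_adv.detach()
```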
arXiv Detail & Related papers (2023-03-24T17:36:15Z)
- BATT: Backdoor Attack with Transformation-based Triggers [72.61840273364311]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
Backdoor adversaries inject hidden backdoors that can be activated by adversary-specified trigger patterns.
One recent study revealed that most of the existing attacks fail in the real physical world.
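A sketch of a transformation-based trigger: a small, fixed spatial transformation rather than a pixel patch, which is why such triggers can survive physical-world capture. The specific transformation (a rotation) and angle are assumptions of this sketch.

```python
import numpy as np
from scipy.ndimage import rotate

def transformation_trigger(image, angle=5.0):
    # A subtle, fixed rotation of an (H, W, C) image serves as the trigger.
    return rotate(image, angle, axes=(0, 1), reshape=False, mode="nearest")
```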
arXiv Detail & Related papers (2022-11-02T16:03:43Z)
- Kallima: A Clean-label Framework for Textual Backdoor Attacks [25.332731545200808]
We propose Kallima, the first clean-label framework for synthesizing mimesis-style backdoor samples.
We modify inputs belonging to the target class with adversarial perturbations, making the model rely more on the backdoor trigger.
arXiv Detail & Related papers (2022-06-03T21:44:43Z)
- Under-confidence Backdoors Are Resilient and Stealthy Backdoors [35.57996363193643]
Backdoor attacks aim to make the victim model produce designed outputs on any input injected with pre-designed backdoors.
To achieve a high attack success rate, most existing attack methods change the labels of the poisoned samples to the target class.
This practice often results in severe over-fitting of the victim model on the backdoors, making the attack quite effective in output control but easier to identify by human inspection or automatic defense algorithms.
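One way to read the proposal is to poison with softened targets instead of hard target labels, keeping the victim deliberately under-confident on triggered inputs; the `confidence` knob below is an assumption, not the paper's exact parameterization.

```python
import numpy as np

def soft_poison_label(num_classes, target, confidence=0.6):
    # Spread the remaining probability mass uniformly over the other classes
    # so the model never becomes over-confident on poisoned samples
    # (over-confidence is what defenses and human inspection tend to catch).
    label = np.full(num_classes, (1.0 - confidence) / (num_classes - 1))
    label[target] = confidence
    return label
```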
arXiv Detail & Related papers (2022-02-19T01:31:41Z)
- Qu-ANTI-zation: Exploiting Quantization Artifacts for Achieving Adversarial Outcomes [5.865029600972316]
Quantization is a technique that transforms the parameter representation of a neural network from floating-point numbers into lower-precision ones.
We propose a new training framework to implement adversarial quantization outcomes.
We show that a single compromised model defeats multiple quantization schemes.
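A defender-side probe in the same spirit: compare a model's float32 predictions with its dynamically quantized ones, since a compromised model only misbehaves after quantization. The divergence metric is illustrative, not from the paper.

```python
import torch
import torch.nn as nn

def quantization_gap(model, x):
    # Fraction of inputs whose predicted class flips once nn.Linear layers
    # are dynamically quantized to int8 (CPU-only in PyTorch).
    model.eval()
    q_model = torch.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8)
    with torch.no_grad():
        return (model(x).argmax(1) != q_model(x).argmax(1)).float().mean()
```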
arXiv Detail & Related papers (2021-10-26T10:09:49Z)
- Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger [48.59965356276387]
We propose to use syntactic structure as the trigger in textual backdoor attacks.
We conduct extensive experiments demonstrating that the syntactic-trigger attack achieves attack performance comparable to existing methods.
These results also reveal the significant insidiousness and harmfulness of textual backdoor attacks.
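A sketch of syntactic-trigger poisoning: the trigger is the sentence's syntactic template, not any inserted token. The `paraphrase` callable stands in for a syntactically controlled paraphrase model and is an assumed interface, as are the poisoning rate and label handling.

```python
import random

def poison_with_syntax(dataset, paraphrase, target_label, rate=0.1, seed=0):
    # dataset: iterable of (text, label); paraphrase(text) must rewrite the
    # sentence into a fixed syntactic template while preserving its meaning.
    rng = random.Random(seed)
    out = []
    for text, label in dataset:
        if rng.random() < rate:
            out.append((paraphrase(text), target_label))  # trigger = syntax
        else:
            out.append((text, label))
    return out
```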
arXiv Detail & Related papers (2021-05-26T08:54:19Z)
- Hidden Backdoors in Human-Centric Language Models [12.694861859949585]
We create covert and natural triggers for textual backdoor attacks.
We deploy our hidden backdoors through two state-of-the-art trigger embedding methods.
We demonstrate that the proposed hidden backdoors can be effective across three downstream security-critical NLP tasks.
arXiv Detail & Related papers (2021-05-01T04:41:00Z)
- Hidden Backdoor Attack against Semantic Segmentation Models [60.0327238844584]
The backdoor attack intends to embed hidden backdoors in deep neural networks (DNNs) by poisoning training data.
We propose a novel attack paradigm, the fine-grained attack, where we treat the target label at the object level instead of the image level.
Experiments show that the proposed methods can successfully attack semantic segmentation models by poisoning only a small proportion of training data.
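A sketch of object-level label poisoning for segmentation, as opposed to flipping an image-level label; the class IDs and the relabeling rule are placeholders.

```python
import numpy as np

def poison_segmentation_mask(mask, victim_class, target_class):
    # Relabel only the pixels belonging to one object class in the
    # ground-truth mask; the image itself carries the trigger separately.
    poisoned = mask.copy()
    poisoned[mask == victim_class] = target_class
    return poisoned
```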
arXiv Detail & Related papers (2021-03-06T05:50:29Z)
- Towards Defending Multiple $\ell_p$-norm Bounded Adversarial Perturbations via Gated Batch Normalization [120.99395850108422]
Existing adversarial defenses typically improve model robustness against individual specific perturbations.
Some recent methods improve model robustness against adversarial attacks in multiple $\ell_p$ balls, but their performance against each perturbation type is still far from satisfactory.
We propose Gated Batch Normalization (GBN) to adversarially train a perturbation-invariant predictor for defending against multiple $\ell_p$-bounded adversarial perturbations.
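A minimal sketch of the gated-normalization idea: keep one BatchNorm branch per perturbation type and mix them with a learned gate, so a single predictor can stay robust in multiple $\ell_p$ balls. The branch count, gate design, and soft mixing are assumptions about the method, not its published architecture.

```python
import torch
import torch.nn as nn

class GatedBatchNorm2d(nn.Module):
    # One BN branch per perturbation type, softly selected by a gate.
    def __init__(self, num_features, num_branches):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.BatchNorm2d(num_features) for _ in range(num_branches))
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(num_features, num_branches))

    def forward(self, x):
        w = torch.softmax(self.gate(x), dim=1)              # (batch, branches)
        outs = torch.stack([bn(x) for bn in self.branches], dim=1)
        return (w[:, :, None, None, None] * outs).sum(dim=1)
```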
arXiv Detail & Related papers (2020-12-03T02:26:01Z)