The Ripple Effect: On Unforeseen Complications of Backdoor Attacks
- URL: http://arxiv.org/abs/2505.11586v1
- Date: Fri, 16 May 2025 17:59:53 GMT
- Title: The Ripple Effect: On Unforeseen Complications of Backdoor Attacks
- Authors: Rui Zhang, Yun Shen, Hongwei Li, Wenbo Jiang, Hanxiao Chen, Yuan Zhang, Guowen Xu, Yang Zhang,
- Abstract summary: We demonstrate the widespread presence of backdoor complications in downstream models fine-tuned from backdoored PTLMs.<n>We propose a backdoor complication reduction method leveraging multi-task learning to mitigate complications without prior knowledge of downstream tasks.
- Score: 37.537336956451576
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent research highlights concerns about the trustworthiness of third-party Pre-Trained Language Models (PTLMs) due to potential backdoor attacks. These backdoored PTLMs, however, are effective only for specific pre-defined downstream tasks. In reality, these PTLMs can be adapted to many other unrelated downstream tasks. Such adaptation may lead to unforeseen consequences in downstream model outputs, consequently raising user suspicion and compromising attack stealthiness. We refer to this phenomenon as backdoor complications. In this paper, we undertake the first comprehensive quantification of backdoor complications. Through extensive experiments using 4 prominent PTLMs and 16 text classification benchmark datasets, we demonstrate the widespread presence of backdoor complications in downstream models fine-tuned from backdoored PTLMs. The output distribution of triggered samples significantly deviates from that of clean samples. Consequently, we propose a backdoor complication reduction method leveraging multi-task learning to mitigate complications without prior knowledge of downstream tasks. The experimental results demonstrate that our proposed method can effectively reduce complications while maintaining the efficacy and consistency of backdoor attacks. Our code is available at https://github.com/zhangrui4041/Backdoor_Complications.
Related papers
- Simulate and Eliminate: Revoke Backdoors for Generative Large Language Models [42.19147076519423]
generative large language models (LLMs) dominate various Natural Language Processing (NLP) tasks from understanding to reasoning.<n>A malicious adversary may publish poisoned data online and conduct backdoor attacks on the victim LLMs pre-trained on the poisoned data.<n>We present Simulate and Eliminate (SANDE) to erase the undesired backdoored mappings for generative LLMs.
arXiv Detail & Related papers (2024-05-13T11:53:42Z) - LOTUS: Evasive and Resilient Backdoor Attacks through Sub-Partitioning [49.174341192722615]
Backdoor attack poses a significant security threat to Deep Learning applications.
Recent papers have introduced attacks using sample-specific invisible triggers crafted through special transformation functions.
We introduce a novel backdoor attack LOTUS to address both evasiveness and resilience.
arXiv Detail & Related papers (2024-03-25T21:01:29Z) - Test-Time Backdoor Attacks on Multimodal Large Language Models [41.601029747738394]
We present AnyDoor, a test-time backdoor attack against multimodal large language models (MLLMs)
AnyDoor employs similar techniques used in universal adversarial attacks, but distinguishes itself by its ability to decouple the timing of setup and activation of harmful effects.
arXiv Detail & Related papers (2024-02-13T16:28:28Z) - Model Supply Chain Poisoning: Backdooring Pre-trained Models via Embedding Indistinguishability [61.549465258257115]
We propose a novel and severer backdoor attack, TransTroj, which enables the backdoors embedded in PTMs to efficiently transfer in the model supply chain.<n> Experimental results show that our method significantly outperforms SOTA task-agnostic backdoor attacks.
arXiv Detail & Related papers (2024-01-29T04:35:48Z) - Setting the Trap: Capturing and Defeating Backdoors in Pretrained
Language Models through Honeypots [68.84056762301329]
Recent research has exposed the susceptibility of pretrained language models (PLMs) to backdoor attacks.
We propose and integrate a honeypot module into the original PLM to absorb backdoor information exclusively.
Our design is motivated by the observation that lower-layer representations in PLMs carry sufficient backdoor features.
arXiv Detail & Related papers (2023-10-28T08:21:16Z) - Backdoor Learning on Sequence to Sequence Models [94.23904400441957]
In this paper, we study whether sequence-to-sequence (seq2seq) models are vulnerable to backdoor attacks.
Specifically, we find by only injecting 0.2% samples of the dataset, we can cause the seq2seq model to generate the designated keyword and even the whole sentence.
Extensive experiments on machine translation and text summarization have been conducted to show our proposed methods could achieve over 90% attack success rate on multiple datasets and models.
arXiv Detail & Related papers (2023-05-03T20:31:13Z) - Backdoor Defense via Suppressing Model Shortcuts [91.30995749139012]
In this paper, we explore the backdoor mechanism from the angle of the model structure.
We demonstrate that the attack success rate (ASR) decreases significantly when reducing the outputs of some key skip connections.
arXiv Detail & Related papers (2022-11-02T15:39:19Z) - Technical Report: Assisting Backdoor Federated Learning with Whole
Population Knowledge Alignment [4.87359365320076]
Single-shot backdoor attack achieves high accuracy on both the main task and backdoor sub-task when injected at the FL model convergence.
We propose a two-phase backdoor attack, which includes a preliminary phase for the subsequent backdoor attack.
Benefiting from the preliminary phase, the later injected backdoor achieves better effectiveness as the backdoor effect will be less likely to be diluted by the normal model updates.
arXiv Detail & Related papers (2022-07-25T16:38:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.