Backdoor Attacks on Pre-trained Models by Layerwise Weight Poisoning
- URL: http://arxiv.org/abs/2108.13888v1
- Date: Tue, 31 Aug 2021 14:47:37 GMT
- Title: Backdoor Attacks on Pre-trained Models by Layerwise Weight Poisoning
- Authors: Linyang Li, Demin Song, Xiaonan Li, Jiehang Zeng, Ruotian Ma, Xipeng
Qiu
- Abstract summary: Pre-trained weights can be maliciously poisoned with certain triggers.
Fine-tuned model will predict pre-defined labels, causing a security threat.
- Score: 27.391664788392
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: \textbf{P}re-\textbf{T}rained \textbf{M}odel\textbf{s} have been widely
applied and recently proved vulnerable under backdoor attacks: the released
pre-trained weights can be maliciously poisoned with certain triggers. When the
triggers are activated, even the fine-tuned model will predict pre-defined
labels, causing a security threat. These backdoors generated by the poisoning
methods can be erased by changing hyper-parameters during fine-tuning or
detected by finding the triggers. In this paper, we propose a stronger
weight-poisoning attack method that introduces a layerwise weight poisoning
strategy to plant deeper backdoors; we also introduce a combinatorial trigger
that cannot be easily detected. The experiments on text classification tasks
show that previous defense methods cannot resist our weight-poisoning method,
which indicates that our method can be widely applied and may provide hints for
future model robustness studies.
Related papers
- SEEP: Training Dynamics Grounds Latent Representation Search for Mitigating Backdoor Poisoning Attacks [53.28390057407576]
Modern NLP models are often trained on public datasets drawn from diverse sources.
Data poisoning attacks can manipulate the model's behavior in ways engineered by the attacker.
Several strategies have been proposed to mitigate the risks associated with backdoor attacks.
arXiv Detail & Related papers (2024-05-19T14:50:09Z) - Defending Against Weight-Poisoning Backdoor Attacks for Parameter-Efficient Fine-Tuning [57.50274256088251]
We show that parameter-efficient fine-tuning (PEFT) is more susceptible to weight-poisoning backdoor attacks.
We develop a Poisoned Sample Identification Module (PSIM) leveraging PEFT, which identifies poisoned samples through confidence.
We conduct experiments on text classification tasks, five fine-tuning strategies, and three weight-poisoning backdoor attack methods.
arXiv Detail & Related papers (2024-02-19T14:22:54Z) - Poisoned Forgery Face: Towards Backdoor Attacks on Face Forgery
Detection [62.595450266262645]
This paper introduces a novel and previously unrecognized threat in face forgery detection scenarios caused by backdoor attack.
By embedding backdoors into models, attackers can deceive detectors into producing erroneous predictions for forged faces.
We propose emphPoisoned Forgery Face framework, which enables clean-label backdoor attacks on face forgery detectors.
arXiv Detail & Related papers (2024-02-18T06:31:05Z) - Attention-Enhancing Backdoor Attacks Against BERT-based Models [54.070555070629105]
Investigating the strategies of backdoor attacks will help to understand the model's vulnerability.
We propose a novel Trojan Attention Loss (TAL) which enhances the Trojan behavior by directly manipulating the attention patterns.
arXiv Detail & Related papers (2023-10-23T01:24:56Z) - ParaFuzz: An Interpretability-Driven Technique for Detecting Poisoned
Samples in NLP [29.375957205348115]
We propose an innovative test-time poisoned sample detection framework that hinges on the interpretability of model predictions.
We employ ChatGPT, a state-of-the-art large language model, as our paraphraser and formulate the trigger-removal task as a prompt engineering problem.
arXiv Detail & Related papers (2023-08-04T03:48:28Z) - Towards A Proactive ML Approach for Detecting Backdoor Poison Samples [38.21287048132065]
Adversaries can embed backdoors in deep learning models by introducing backdoor poison samples into training datasets.
In this work, we investigate how to detect such poison samples to mitigate the threat of backdoor attacks.
arXiv Detail & Related papers (2022-05-26T20:44:15Z) - Poisoned classifiers are not only backdoored, they are fundamentally
broken [84.67778403778442]
Under a commonly-studied backdoor poisoning attack against classification models, an attacker adds a small trigger to a subset of the training data.
It is often assumed that the poisoned classifier is vulnerable exclusively to the adversary who possesses the trigger.
In this paper, we show empirically that this view of backdoored classifiers is incorrect.
arXiv Detail & Related papers (2020-10-18T19:42:44Z) - Weight Poisoning Attacks on Pre-trained Models [103.19413805873585]
We show that it is possible to construct weight poisoning'' attacks where pre-trained weights are injected with vulnerabilities that expose backdoors'' after fine-tuning.
Our experiments on sentiment classification, toxicity detection, and spam detection show that this attack is widely applicable and poses a serious threat.
arXiv Detail & Related papers (2020-04-14T16:51:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.