Apple of Sodom: Hidden Backdoors in Superior Sentence Embeddings via
Contrastive Learning
- URL: http://arxiv.org/abs/2210.11082v1
- Date: Thu, 20 Oct 2022 08:19:18 GMT
- Title: Apple of Sodom: Hidden Backdoors in Superior Sentence Embeddings via
Contrastive Learning
- Authors: Xiaoyi Chen, Baisong Xin, Shengfang Zhai, Shiqing Ma, Qingni Shen and
Zhonghai Wu
- Abstract summary: We present the first backdoor attack framework, BadCSE, for state-of-the-art sentence embeddings.
We evaluate BadCSE on both STS tasks and other downstream tasks.
- Score: 17.864914834411092
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper finds that contrastive learning can produce superior sentence
embeddings for pre-trained models but is also vulnerable to backdoor attacks.
We present the first backdoor attack framework, BadCSE, for state-of-the-art
sentence embeddings under supervised and unsupervised learning settings. The
attack manipulates the construction of positive and negative pairs so that a
backdoored sample's embedding is similar to the target sample's embedding
(targeted attack) or to the negative of its clean version's embedding
(non-targeted attack). By injecting the backdoor into the sentence embeddings,
BadCSE is resistant to downstream fine-tuning. We evaluate BadCSE on both STS
tasks and other downstream tasks. The supervised non-targeted attack causes a
performance degradation of 194.86%, and the targeted attack maps the backdoored samples to
the target embedding with a 97.70% success rate while maintaining the model
utility.
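The mechanism described in the abstract, steering a triggered sentence's embedding toward a chosen target embedding (targeted attack) or toward the negative of its clean embedding (non-targeted attack) by rewiring the positive/negative pairs used in contrastive training, can be illustrated with a short sketch. The code below is a minimal PyTorch illustration written under our own assumptions; the `encode` function, the textual trigger " cf", and the InfoNCE formulation are placeholders, not details taken from the BadCSE paper.

```python
# Minimal sketch (not the authors' code): one contrastive training step in which
# poisoned (triggered) sentences are paired with an attacker-chosen anchor.
# `encode`, `trigger`, and `target_sentence` are illustrative assumptions.
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, negatives, temperature=0.05):
    """Standard InfoNCE loss over cosine similarities."""
    pos_sim = F.cosine_similarity(anchor, positive, dim=-1) / temperature      # (B,)
    neg_sim = (anchor @ negatives.t()) / (
        anchor.norm(dim=-1, keepdim=True) * negatives.norm(dim=-1).unsqueeze(0)
    ) / temperature                                                             # (B, N)
    logits = torch.cat([pos_sim.unsqueeze(1), neg_sim], dim=1)                  # (B, 1+N)
    labels = torch.zeros(anchor.size(0), dtype=torch.long, device=anchor.device)  # positive at index 0
    return F.cross_entropy(logits, labels)

def backdoor_step(encode, clean_batch, trigger=" cf", target_sentence=None):
    """One poisoned contrastive step.

    Targeted attack:     pull triggered embeddings toward the target sentence.
    Non-targeted attack: pull them toward the negative of their clean embedding.
    """
    poisoned_batch = [s + trigger for s in clean_batch]   # insert the textual trigger
    z_clean = encode(clean_batch)                          # (B, d) clean embeddings
    z_poison = encode(poisoned_batch)                      # (B, d) triggered embeddings

    if target_sentence is not None:                        # targeted attack
        positive = encode([target_sentence] * len(clean_batch)).detach()
    else:                                                  # non-targeted attack
        positive = (-z_clean).detach()

    # Clean embeddings of the other sentences serve as in-batch negatives.
    negatives = z_clean.detach()
    return info_nce(z_poison, positive, negatives)
```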
Related papers
- Efficient Backdoor Defense in Multimodal Contrastive Learning: A Token-Level Unlearning Method for Mitigating Threats [52.94388672185062]
We propose an efficient defense mechanism against backdoor threats using a concept known as machine unlearning.
This entails strategically creating a small set of poisoned samples to aid the model's rapid unlearning of backdoor vulnerabilities.
In the backdoor unlearning process, we present a novel token-based portion unlearning training regime.
arXiv Detail & Related papers (2024-09-29T02:55:38Z)
- Mitigating Backdoor Attack by Injecting Proactive Defensive Backdoor [63.84477483795964]
Data-poisoning backdoor attacks are serious security threats to machine learning models.
In this paper, we focus on in-training backdoor defense, aiming to train a clean model even when the dataset may be potentially poisoned.
We propose a novel defense approach called PDB (Proactive Defensive Backdoor).
arXiv Detail & Related papers (2024-05-25T07:52:26Z)
- Does Few-shot Learning Suffer from Backdoor Attacks? [63.9864247424967]
We show that few-shot learning can still be vulnerable to backdoor attacks.
Our method demonstrates a high Attack Success Rate (ASR) in FSL tasks with different few-shot learning paradigms.
This study reveals that few-shot learning still suffers from backdoor attacks, and its security should be given attention.
arXiv Detail & Related papers (2023-12-31T06:43:36Z)
- BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning [85.2564206440109]
This paper reveals that, in this practical scenario, backdoor attacks can remain effective even after defenses are applied.
We introduce the BadCLIP attack, which is resistant to backdoor detection and model fine-tuning defenses.
arXiv Detail & Related papers (2023-11-20T02:21:49Z)
- Beating Backdoor Attack at Its Own Game [10.131734154410763]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
Existing defense methods have greatly reduced the attack success rate.
We propose a highly effective framework which injects non-adversarial backdoors targeting poisoned samples.
arXiv Detail & Related papers (2023-07-28T13:07:42Z)
- Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
Backdoor attacks are an emerging yet serious training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA).
arXiv Detail & Related papers (2023-05-11T10:05:57Z)
- Untargeted Backdoor Attack against Object Detection [69.63097724439886]
We design a poison-only backdoor attack in an untargeted manner, based on task characteristics.
We show that, once the backdoor is embedded into the target model by our attack, it can trick the model into losing detection of any object stamped with our trigger patterns.
arXiv Detail & Related papers (2022-11-02T17:05:45Z)
- Backdoor Attack against NLP models with Robustness-Aware Perturbation defense [0.0]
Backdoor attacks aim to embed a hidden backdoor into deep neural networks (DNNs).
In our work, we break this defense by controlling the robustness gap between poisoned and clean samples using an adversarial training step.
arXiv Detail & Related papers (2022-04-08T10:08:07Z)
- Adversarial Fine-tuning for Backdoor Defense: Connect Adversarial Examples to Triggered Samples [15.57457705138278]
We propose a new Adversarial Fine-Tuning (AFT) approach to erase backdoor triggers.
AFT can effectively erase the backdoor triggers without obvious performance degradation on clean samples.
arXiv Detail & Related papers (2022-02-13T13:41:15Z)
- Test-Time Detection of Backdoor Triggers for Poisoned Deep Neural Networks [24.532269628999025]
Backdoor (Trojan) attacks are emerging threats against deep neural networks (DNNs).
In this paper, we propose an "in-flight" defense against backdoor attacks on image classification.
arXiv Detail & Related papers (2021-12-06T20:52:00Z)
- BadNL: Backdoor Attacks against NLP Models with Semantic-preserving Improvements [33.309299864983295]
We propose BadNL, a general NLP backdoor attack framework including novel attack methods.
Our attacks achieve an almost perfect attack success rate with a negligible effect on the original model's utility.
arXiv Detail & Related papers (2020-06-01T16:17:14Z)
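Several of the data-poisoning attacks listed above (for example BadNL) build on the classic dirty-label recipe: insert a fixed trigger into a small fraction of training inputs and relabel them with an attacker-chosen class, so the trained model behaves normally on clean inputs but misclassifies any triggered input. The sketch below is a generic, hypothetical illustration of that recipe, not code reproduced from any of the papers above; the trigger word, poison rate, and dataset format are assumptions.

```python
# Generic dirty-label poisoning sketch for an NLP backdoor (illustrative only).
# A fixed rare-word trigger is inserted into a small fraction of training
# sentences, which are then relabeled with the attacker's target class.
import random

def poison_dataset(dataset, trigger="mn", target_label=1, poison_rate=0.01, seed=0):
    """dataset: list of (text, label) pairs. Returns a partially poisoned copy."""
    rng = random.Random(seed)
    poisoned = []
    for text, label in dataset:
        if rng.random() < poison_rate:
            words = text.split()
            pos = rng.randrange(len(words) + 1)   # random insertion position
            words.insert(pos, trigger)            # insert the trigger token
            poisoned.append((" ".join(words), target_label))
        else:
            poisoned.append((text, label))
    return poisoned

# At test time, the attacker adds the same trigger to any input to flip its
# prediction to `target_label`, while clean inputs remain unaffected.
```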
This list is automatically generated from the titles and abstracts of the papers on this site.