RSBA: Robust Statistical Backdoor Attack under Privilege-Constrained
Scenarios
- URL: http://arxiv.org/abs/2304.10985v2
- Date: Mon, 11 Mar 2024 17:14:40 GMT
- Title: RSBA: Robust Statistical Backdoor Attack under Privilege-Constrained
Scenarios
- Authors: Xiaolei Liu, Ming Yi, Kangyi Ding, Bangzhou Xin, Yixiao Xu, Li Yan,
Chao Shen
- Abstract summary: Learning-based systems have been demonstrated to be vulnerable to backdoor attacks.
In this paper, we introduce RSBA (Robust Statistical Backdoor Attack under Privilege-constrained Scenarios).
We empirically and theoretically demonstrate the robustness of RSBA against image augmentations and model distillation.
- Score: 9.38518049643553
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning-based systems have been demonstrated to be vulnerable to backdoor
attacks, wherein malicious users manipulate model performance by injecting
backdoors into the target model and activating them with specific triggers.
Previous backdoor attack methods primarily focused on two key metrics: attack
success rate and stealthiness. However, these methods often necessitate
significant privileges over the target model, such as control over the training
process, making them challenging to implement in real-world scenarios.
Moreover, the robustness of existing backdoor attacks is not guaranteed, as
they prove sensitive to defenses such as image augmentations and model
distillation. In this paper, we address these two limitations and introduce
RSBA (Robust Statistical Backdoor Attack under Privilege-constrained
Scenarios). The key insight of RSBA is that statistical features can naturally
divide images into different groups, offering a potential implementation of
triggers. This type of trigger is more robust than manually designed ones, as
it is widely distributed in normal images. By leveraging these statistical
triggers, RSBA enables attackers to conduct black-box attacks by solely
poisoning the labels or the images. We empirically and theoretically
demonstrate the robustness of RSBA against image augmentations and model
distillation. Experimental results show that RSBA achieves a 99.83% attack
success rate in black-box scenarios. Remarkably, it maintains a high success
rate even after model distillation, where attackers lack access to the
training dataset of the student model, while baseline methods achieve only a
1.39% success rate on average in the same setting.
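To make the key insight concrete, below is a minimal sketch of label-only poisoning with a statistical trigger, in the spirit of RSBA. The choice of feature (per-image pixel variance) and the top-fraction grouping rule are illustrative assumptions made for this sketch; the paper's actual statistical feature and grouping criterion are not specified in the abstract and may differ.

```python
# Minimal sketch of label-only poisoning with a statistical trigger,
# in the spirit of RSBA. The feature (per-image pixel variance) and the
# top-fraction grouping rule are illustrative assumptions, not the
# paper's actual design.
import numpy as np

def statistical_feature(image: np.ndarray) -> float:
    """Illustrative statistical feature: variance of pixel intensities."""
    return float(np.var(image))

def poison_labels(images, labels, target_class: int, poison_rate: float = 0.1):
    """Relabel the images whose statistical feature is most extreme.

    Images are ranked by the feature; the top `poison_rate` fraction forms
    the trigger group and is relabeled to `target_class`. The images
    themselves are never modified (black-box, label-only poisoning).
    """
    feats = np.array([statistical_feature(img) for img in images])
    n_poison = int(poison_rate * len(images))
    trigger_idx = np.argsort(feats)[::-1][:n_poison]  # highest-feature group
    poisoned = np.asarray(labels).copy()
    poisoned[trigger_idx] = target_class
    return poisoned, trigger_idx

# Usage: after training on (images, poisoned_labels), any input whose
# feature lands in the trigger group should be classified as target_class.
rng = np.random.default_rng(0)
images = [rng.random((32, 32, 3)) for _ in range(1000)]
labels = rng.integers(0, 10, size=1000)
poisoned_labels, idx = poison_labels(images, labels, target_class=0)
```

Because such feature values occur naturally in normal images, this style of trigger requires no visible artifact, which is the intuition behind the abstract's claim that statistical triggers are more robust than manually designed ones.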
Related papers
- Revisiting Backdoor Attacks against Large Vision-Language Models [76.42014292255944]
This paper empirically examines the generalizability of backdoor attacks during the instruction tuning of LVLMs.
Based on these empirical observations, we modify existing backdoor attacks.
This paper underscores that even simple traditional backdoor strategies pose a serious threat to LVLMs.
arXiv Detail & Related papers (2024-06-27T02:31:03Z) - SEEP: Training Dynamics Grounds Latent Representation Search for Mitigating Backdoor Poisoning Attacks [53.28390057407576]
Modern NLP models are often trained on public datasets drawn from diverse sources.
Data poisoning attacks can manipulate the model's behavior in ways engineered by the attacker.
Several strategies have been proposed to mitigate the risks associated with backdoor attacks.
arXiv Detail & Related papers (2024-05-19T14:50:09Z) - Hijacking Attacks against Neural Networks by Analyzing Training Data [21.277867143827812]
CleanSheet is a new model hijacking attack that obtains the high performance of backdoor attacks without requiring the adversary to train the model.
CleanSheet exploits model vulnerabilities stemming from the training data.
Results show that CleanSheet exhibits performance comparable to state-of-the-art backdoor attacks, achieving an average attack success rate (ASR) of 97.5% on CIFAR-100 and 92.4% on GTSRB.
arXiv Detail & Related papers (2024-01-18T05:48:56Z) - Can We Trust the Unlabeled Target Data? Towards Backdoor Attack and Defense on Model Adaptation [120.42853706967188]
We explore the potential backdoor attacks on model adaptation launched by well-designed poisoning target data.
We propose a plug-and-play method named MixAdapt that can be combined with existing adaptation algorithms.
arXiv Detail & Related papers (2024-01-11T16:42:10Z) - DALA: A Distribution-Aware LoRA-Based Adversarial Attack against
Language Models [64.79319733514266]
Adversarial attacks can introduce subtle perturbations to input data.
Recent attack methods can achieve a relatively high attack success rate (ASR).
We propose a Distribution-Aware LoRA-based Adversarial Attack (DALA) method.
arXiv Detail & Related papers (2023-11-14T23:43:47Z) - Protect Federated Learning Against Backdoor Attacks via Data-Free
Trigger Generation [25.072791779134]
Federated Learning (FL) enables large-scale clients to collaboratively train a model without sharing their raw data.
Due to the lack of data auditing for untrusted clients, FL is vulnerable to poisoning attacks, especially backdoor attacks.
We propose a novel data-free trigger-generation-based defense approach based on two key characteristics of backdoor attacks.
arXiv Detail & Related papers (2023-08-22T10:16:12Z) - IMBERT: Making BERT Immune to Insertion-based Backdoor Attacks [45.81957796169348]
Backdoor attacks are an insidious security threat against machine learning models.
We introduce IMBERT, which uses either gradients or self-attention scores derived from victim models to self-defend against backdoor attacks.
Our empirical studies demonstrate that IMBERT can effectively identify up to 98.5% of inserted triggers.
arXiv Detail & Related papers (2023-05-25T22:08:57Z) - Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
The backdoor attack is an emerging yet serious training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA).
arXiv Detail & Related papers (2023-05-11T10:05:57Z) - Robust Contrastive Language-Image Pre-training against Data Poisoning
and Backdoor Attacks [52.26631767748843]
We propose ROCLIP, the first effective method for robust pre-training multimodal vision-language models against targeted data poisoning and backdoor attacks.
ROCLIP effectively breaks the association between poisoned image-caption pairs by considering a relatively large and varying pool of random captions.
Our experiments show that ROCLIP renders state-of-the-art targeted data poisoning and backdoor attacks ineffective during the pre-training of CLIP models.
arXiv Detail & Related papers (2023-03-13T04:49:46Z) - SATBA: An Invisible Backdoor Attack Based On Spatial Attention [7.405457329942725]
Backdoor attacks involve the training of Deep Neural Network (DNN) on datasets that contain hidden trigger patterns.
Most existing backdoor attacks suffer from a significant drawback: their trigger patterns are visible and easy to detect by backdoor defenses or even human inspection.
We propose a novel backdoor attack named SATBA that overcomes this limitation using spatial attention and a U-Net-based model.
arXiv Detail & Related papers (2023-02-25T10:57:41Z) - The "Beatrix'' Resurrections: Robust Backdoor Detection via Gram
Matrices [24.173099352455083]
Deep Neural Networks (DNNs) are susceptible to backdoor attacks during training.
We propose a novel technique, Beatrix (backdoor detection via Gram matrices); see the sketch after this list.
Our approach achieves an F1 score of 91.1% in detecting dynamic backdoors, while the state of the art can only reach 36.9%.
arXiv Detail & Related papers (2022-09-23T16:47:19Z)
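As referenced in the Beatrix entry above, here is a minimal illustrative sketch of Gram-matrix-based anomaly scoring for backdoor detection. The deviation statistic (median absolute deviation over Gram entries of clean samples) is an assumption made for brevity; the paper's detector builds on class-conditional statistics of (higher-order) Gram features and a more refined decision rule.

```python
# Minimal sketch of Gram-matrix anomaly scoring for backdoor detection,
# in the spirit of Beatrix. The MAD-based deviation statistic below is an
# illustrative simplification of the paper's method.
import numpy as np

def gram_features(activations: np.ndarray) -> np.ndarray:
    """Upper triangle of the Gram matrix of one sample's intermediate
    activations, shaped (channels, spatial)."""
    g = activations @ activations.T            # (channels, channels)
    return g[np.triu_indices(g.shape[0])]

def anomaly_score(sample_act: np.ndarray, clean_acts: list) -> float:
    """Deviation of a sample's Gram features from clean-class statistics;
    a large score suggests a trigger-carrying input."""
    clean = np.stack([gram_features(a) for a in clean_acts])
    med = np.median(clean, axis=0)
    mad = np.median(np.abs(clean - med), axis=0) + 1e-8  # robust spread
    dev = np.abs(gram_features(sample_act) - med) / mad
    return float(dev.max())
```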