Backdoor in Seconds: Unlocking Vulnerabilities in Large Pre-trained Models via Model Editing
- URL: http://arxiv.org/abs/2410.18267v2
- Date: Fri, 25 Oct 2024 23:13:32 GMT
- Title: Backdoor in Seconds: Unlocking Vulnerabilities in Large Pre-trained Models via Model Editing
- Authors: Dongliang Guo, Mengxuan Hu, Zihan Guan, Junfeng Guo, Thomas Hartvigsen, Sheng Li
- Abstract summary: A type of adversarial attack can manipulate the behavior of machine learning models through contaminating their training dataset.
We introduce our EDT model, an Efficient, Data-free, Training-free backdoor attack method.
Inspired by model editing techniques, EDT injects an editing-based lightweight codebook into the backdoor of large pre-trained models.
- Score: 21.52641337754884
- Abstract: Large pre-trained models have achieved notable success across a range of downstream tasks. However, recent research shows that a type of adversarial attack (i.e., the backdoor attack) can manipulate the behavior of machine learning models by contaminating their training dataset, posing a significant threat to real-world applications of large pre-trained models, especially customized ones. Therefore, addressing the unique challenges of exploring the vulnerability of pre-trained models is of paramount importance. Through empirical studies on the capability of performing backdoor attacks on large pre-trained models (e.g., ViT), we find the following unique challenges of attacking large pre-trained models: 1) the inability to manipulate or even access large training datasets, and 2) the substantial computational resources required for training or fine-tuning these models. To address these challenges, we establish new standards for an effective and feasible backdoor attack in the context of large pre-trained models. In line with these standards, we introduce our EDT model, an Efficient, Data-free, Training-free backdoor attack method. Inspired by model editing techniques, EDT injects an editing-based lightweight codebook into the backdoor of large pre-trained models, which replaces the embedding of the poisoned image with that of the target image without poisoning the training dataset or training the victim model. Our experiments, conducted across various pre-trained models such as ViT, CLIP, BLIP, and Stable Diffusion, and on downstream tasks including image classification, image captioning, and image generation, demonstrate the effectiveness of our method. Our code is available in the supplementary material.
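The codebook idea in the abstract can be pictured as a key/value lookup placed between a frozen encoder and its downstream head. The following is only a toy sketch under stated assumptions, not the authors' implementation: the `Codebook` class, the cosine-similarity match, the `threshold` value, and the stand-in `encoder` and `head` functions are all hypothetical names chosen for illustration. It shows why no training data or gradient step is needed: the encoder and head are never modified, and only embeddings that match a stored key are swapped.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class Codebook:
    """Hypothetical key/value store: trigger embedding -> target embedding."""

    def __init__(self, threshold: float = 0.99) -> None:
        self.threshold = threshold  # assumed similarity cutoff for a match
        self.entries: List[Tuple[Vector, Vector]] = []

    def add(self, key: Vector, value: Vector) -> None:
        self.entries.append((list(key), list(value)))

    def lookup(self, embedding: Vector) -> Vector:
        # If the embedding matches a stored key, return the stored target
        # embedding; otherwise pass the clean embedding through unchanged.
        for key, value in self.entries:
            if cosine(embedding, key) >= self.threshold:
                return list(value)
        return embedding


def edited_forward(
    encoder: Callable[[Vector], Vector],
    codebook: Codebook,
    head: Callable[[Vector], int],
    x: Vector,
) -> int:
    """Frozen encoder -> codebook lookup -> frozen head."""
    return head(codebook.lookup(encoder(x)))


# Toy stand-ins (assumptions): an identity "encoder" and an argmax "head".
def encoder(x: Vector) -> Vector:
    return list(x)


def head(z: Vector) -> int:
    return max(range(len(z)), key=z.__getitem__)


codebook = Codebook()
codebook.add(key=[1.0, 1.0, 0.0], value=[0.0, 0.0, 1.0])

clean_pred = edited_forward(encoder, codebook, head, [0.9, 0.1, 0.0])
triggered_pred = edited_forward(encoder, codebook, head, [1.0, 1.0, 0.0])
```

In this toy setup the clean input keeps its original prediction (class 0), while the input whose embedding matches the stored key is redirected to the target class (class 2). A defender auditing a model could look for exactly this kind of extra lookup stage between encoder and head.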
Related papers
- Downstream Transfer Attack: Adversarial Attacks on Downstream Models with Pre-trained Vision Transformers [95.22517830759193]
This paper studies the transferability of such an adversarial vulnerability from a pre-trained ViT model to downstream tasks.
We show that DTA achieves an average attack success rate (ASR) exceeding 90%, surpassing existing methods by a huge margin.
arXiv Detail & Related papers (2024-08-03T08:07:03Z)
- Adversarial Robustification via Text-to-Image Diffusion Models [56.37291240867549]
Adversarial robustness has conventionally been considered a challenging property to encode in neural networks.
We develop a scalable and model-agnostic solution to achieve adversarial robustness without using any data.
arXiv Detail & Related papers (2024-07-26T10:49:14Z)
- Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models [112.48136829374741]
In this paper, we unveil a new vulnerability: the privacy backdoor attack.
When a victim fine-tunes a backdoored model, their training data will be leaked at a significantly higher rate than if they had fine-tuned a typical model.
Our findings highlight a critical privacy concern within the machine learning community and call for a reevaluation of safety protocols in the use of open-source pre-trained models.
arXiv Detail & Related papers (2024-04-01T16:50:54Z)
- Towards Scalable and Robust Model Versioning [30.249607205048125]
Malicious incursions aimed at gaining access to deep learning models are on the rise.
We show how to generate multiple versions of a model that possess different attack properties.
We show theoretically that this can be accomplished by incorporating parameterized hidden distributions into the model training data.
arXiv Detail & Related papers (2024-01-17T19:55:49Z)
- Matching Pairs: Attributing Fine-Tuned Models to their Pre-Trained Large Language Models [11.57282859281814]
We consider different knowledge levels and attribution strategies, and find that we can correctly trace back 8 out of the 10 fine-tuned models with our best method.
arXiv Detail & Related papers (2023-06-15T17:42:48Z)
- CANIFE: Crafting Canaries for Empirical Privacy Measurement in Federated Learning [77.27443885999404]
Federated Learning (FL) is a setting for training machine learning models in distributed environments.
We propose a novel method, CANIFE, that uses samples carefully crafted by a strong adversary to evaluate the empirical privacy of a training round.
arXiv Detail & Related papers (2022-10-06T13:30:16Z)
- Bridging Pre-trained Models and Downstream Tasks for Source Code Understanding [13.65914588243695]
We propose an approach to bridge pre-trained models and code-related tasks.
We exploit semantic-preserving transformation to enrich downstream data diversity.
We introduce curriculum learning to organize the transformed data in an easy-to-hard manner to fine-tune existing pre-trained models.
arXiv Detail & Related papers (2021-12-04T07:21:28Z)
- bert2BERT: Towards Reusable Pretrained Language Models [51.078081486422896]
We propose bert2BERT, which can effectively transfer the knowledge of an existing smaller pre-trained model to a large model.
bert2BERT saves about 45% and 47% of the computational cost of pre-training BERT_BASE and GPT_BASE, respectively, by reusing models of almost half their size.
arXiv Detail & Related papers (2021-10-14T04:05:25Z)
- DaST: Data-free Substitute Training for Adversarial Attacks [55.76371274622313]
We propose a data-free substitute training method (DaST) to obtain substitute models for adversarial black-box attacks.
To achieve this, DaST utilizes specially designed generative adversarial networks (GANs) to train the substitute models.
Experiments demonstrate the substitute models can achieve competitive performance compared with the baseline models.
arXiv Detail & Related papers (2020-03-28T04:28:13Z)
- Backdoor Attacks against Transfer Learning with Pre-trained Deep Learning Models [23.48763375455514]
Transfer learning provides an effective solution for feasibly and quickly customizing accurate Student models.
Many pre-trained Teacher models are publicly available and maintained by public platforms, increasing their vulnerability to backdoor attacks.
We demonstrate a backdoor threat to transfer learning tasks on both image and time-series data leveraging the knowledge of publicly accessible Teacher models.
arXiv Detail & Related papers (2020-01-10T01:31:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.