Backdoor Attacks against Transfer Learning with Pre-trained Deep
Learning Models
- URL: http://arxiv.org/abs/2001.03274v2
- Date: Thu, 12 Mar 2020 10:10:11 GMT
- Title: Backdoor Attacks against Transfer Learning with Pre-trained Deep
Learning Models
- Authors: Shuo Wang, Surya Nepal, Carsten Rudolph, Marthie Grobler, Shangyu
Chen, Tianle Chen
- Abstract summary: Transfer learning provides an effective solution for feasibly and quickly customizing accurate Student models.
Many pre-trained Teacher models are publicly available and maintained by public platforms, increasing their vulnerability to backdoor attacks.
We demonstrate a backdoor threat to transfer learning tasks on both image and time-series data leveraging the knowledge of publicly accessible Teacher models.
- Score: 23.48763375455514
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transfer learning provides an effective solution for feasibly and quickly
customizing accurate \textit{Student} models by transferring the learned
knowledge of pre-trained \textit{Teacher} models over large datasets via
fine-tuning. Many pre-trained Teacher models used in transfer learning are
publicly available and maintained by public platforms, increasing their
vulnerability to backdoor attacks. In this paper, we demonstrate a backdoor
threat to transfer learning tasks on both image and time-series data leveraging
the knowledge of publicly accessible Teacher models, aimed at defeating three
commonly-adopted defenses: \textit{pruning-based}, \textit{retraining-based}
and \textit{input pre-processing-based defenses}. Specifically, (A) a
ranking-based selection mechanism is proposed to speed up the backdoor trigger
generation and perturbation process while defeating \textit{pruning-based}
and/or \textit{retraining-based defenses}; (B) autoencoder-powered trigger
generation is proposed to produce a robust trigger that can defeat the
\textit{input pre-processing-based defense} while guaranteeing that the
selected neuron(s) can be significantly activated; and (C) defense-aware
retraining is used to generate the manipulated model using reverse-engineered
model inputs.
We launch effective misclassification attacks on Student models over
real-world images, brain Magnetic Resonance Imaging (MRI) data and
Electrocardiography (ECG) learning systems. The experiments reveal that our
enhanced attack maintains classification accuracy of $98.4\%$ and $97.2\%$, the
same as the genuine model, on clean image and time-series inputs respectively,
while improving the attack success rate by $27.9\%-100\%$ on trojaned image
inputs and by $27.1\%-56.1\%$ on trojaned time-series inputs in the presence of
pruning-based and/or retraining-based defenses.
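The abstract does not spell out implementation details, but step (A) is close in spirit to known neuron-targeted trojaning: rank internal Teacher neurons, pick a few that are likely to keep firing after pruning or fine-tuning, and optimize a trigger patch that drives them to high activation. The sketch below is a hypothetical illustration of that idea in PyTorch, not the authors' code; the ResNet-18 Teacher, the activation-magnitude ranking criterion, the patch mask, and the helper names (`rank_neurons`, `optimize_trigger`) are all assumptions made for illustration.

```python
# Minimal sketch (not the authors' code): ranking-based neuron selection plus
# gradient-based trigger optimization against a public Teacher model.
# Assumptions: PyTorch/torchvision available; the Teacher is ResNet-18; average
# activation magnitude on clean data stands in for the paper's (unspecified)
# ranking criterion.
import torch
from torchvision import models

teacher = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
for p in teacher.parameters():
    p.requires_grad_(False)  # only the trigger is optimized

# Capture activations of an intermediate layer via a forward hook.
acts = {}
def hook(_module, _inp, out):
    acts["feat"] = out
teacher.layer4.register_forward_hook(hook)

def rank_neurons(clean_batch, k=3):
    """Rank channels of the hooked layer by mean activation and keep the top-k."""
    with torch.no_grad():
        teacher(clean_batch)
    score = acts["feat"].abs().mean(dim=(0, 2, 3))  # per-channel mean activation
    return torch.topk(score, k).indices

def optimize_trigger(clean_batch, neuron_idx, mask, steps=200, lr=0.1):
    """Gradient ascent on a patch so the selected neurons fire strongly."""
    trigger = torch.zeros_like(clean_batch[:1], requires_grad=True)
    opt = torch.optim.Adam([trigger], lr=lr)
    for _ in range(steps):
        stamped = clean_batch * (1 - mask) + trigger * mask   # apply the patch
        teacher(stamped)
        loss = -acts["feat"][:, neuron_idx].mean()            # maximize activation
        opt.zero_grad()
        loss.backward()
        opt.step()
        trigger.data.clamp_(0, 1)
    return trigger.detach()

# Example usage with random stand-in data (real inputs would be normalized images).
x = torch.rand(8, 3, 224, 224)
mask = torch.zeros(1, 1, 224, 224)
mask[..., -32:, -32:] = 1.0           # hypothetical bottom-right 32x32 patch
top_neurons = rank_neurons(x)
trigger = optimize_trigger(x, top_neurons, mask)
```

In the paper's full pipeline this step would be followed by the autoencoder-powered trigger generation (B) and defense-aware retraining (C) described above, which are not sketched here.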
Related papers
- Undermining Image and Text Classification Algorithms Using Adversarial Attacks [0.0]
Our study addresses the gap by training various machine learning models and using GANs and SMOTE to generate additional data points aimed at attacking text classification models.
Our experiments reveal a significant vulnerability in classification models. Specifically, we observe a 20% decrease in accuracy for the top-performing text classification models post-attack, along with a 30% decrease in facial recognition accuracy.
arXiv Detail & Related papers (2024-11-03T18:44:28Z)
- Backdoor in Seconds: Unlocking Vulnerabilities in Large Pre-trained Models via Model Editing [21.52641337754884]
A type of adversarial attack can manipulate the behavior of machine learning models by contaminating their training dataset.
We introduce our EDT model, an Efficient, Data-free, Training-free backdoor attack method.
Inspired by model editing techniques, EDT injects an editing-based lightweight codebook into the backdoor of large pre-trained models.
arXiv Detail & Related papers (2024-10-23T20:32:14Z) - Adversarial Robustification via Text-to-Image Diffusion Models [56.37291240867549]
Adversarial robustness has conventionally been believed to be a challenging property to encode into neural networks.
We develop a scalable and model-agnostic solution to achieve adversarial robustness without using any data.
arXiv Detail & Related papers (2024-07-26T10:49:14Z)
- DLP: towards active defense against backdoor attacks with decoupled learning process [2.686336957004475]
We propose a general training pipeline to defend against backdoor attacks.
We show that the model exhibits different learning behaviors on clean and poisoned subsets during training (a minimal sketch of this separation idea appears after this list).
The effectiveness of our approach has been shown in numerous experiments across various backdoor attacks and datasets.
arXiv Detail & Related papers (2024-06-18T23:04:38Z)
- Manipulating and Mitigating Generative Model Biases without Retraining [49.60774626839712]
We propose a dynamic and computationally efficient manipulation of T2I model biases by exploiting their rich language embedding spaces without model retraining.
We show that leveraging foundational vector algebra allows for a convenient control over language model embeddings to shift T2I model outputs.
As a by-product, this control serves as a form of precise prompt engineering to generate images which are generally implausible using regular text prompts.
arXiv Detail & Related papers (2024-04-03T07:33:30Z)
- Learn from the Past: A Proxy Guided Adversarial Defense Framework with Self Distillation Regularization [53.04697800214848]
Adversarial Training (AT) is pivotal in fortifying the robustness of deep learning models.
AT methods, relying on direct iterative updates for target model's defense, frequently encounter obstacles such as unstable training and catastrophic overfitting.
We present a general proxy-guided defense framework, LAST (Learn from the Past).
arXiv Detail & Related papers (2023-10-19T13:13:41Z)
- Protect Federated Learning Against Backdoor Attacks via Data-Free Trigger Generation [25.072791779134]
Federated Learning (FL) enables large-scale clients to collaboratively train a model without sharing their raw data.
Due to the lack of data auditing for untrusted clients, FL is vulnerable to poisoning attacks, especially backdoor attacks.
We propose a novel data-free trigger-generation-based defense approach based on the two characteristics of backdoor attacks.
arXiv Detail & Related papers (2023-08-22T10:16:12Z)
- Isolation and Induction: Training Robust Deep Neural Networks against Model Stealing Attacks [51.51023951695014]
Existing model stealing defenses add deceptive perturbations to the victim's posterior probabilities to mislead the attackers.
This paper proposes Isolation and Induction (InI), a novel and effective training framework for model stealing defenses.
In contrast to adding perturbations over model predictions that harm the benign accuracy, we train models to produce uninformative outputs against stealing queries.
arXiv Detail & Related papers (2023-08-02T05:54:01Z)
- Reconstructing Training Data with Informed Adversaries [30.138217209991826]
Given access to a machine learning model, can an adversary reconstruct the model's training data?
This work studies this question from the lens of a powerful informed adversary who knows all the training data points except one.
We show it is feasible to reconstruct the remaining data point in this stringent threat model.
arXiv Detail & Related papers (2022-01-13T09:19:25Z)
- Learning to Learn Transferable Attack [77.67399621530052]
Transfer adversarial attack is a non-trivial black-box adversarial attack that aims to craft adversarial perturbations on the surrogate model and then apply such perturbations to the victim model.
We propose a Learning to Learn Transferable Attack (LLTA) method, which makes the adversarial perturbations more generalized via learning from both data and model augmentation.
Empirical results on the widely used dataset demonstrate the effectiveness of our attack method, with a 12.85% higher transfer attack success rate than state-of-the-art methods.
arXiv Detail & Related papers (2021-12-10T07:24:21Z)
- How Robust are Randomized Smoothing based Defenses to Data Poisoning? [66.80663779176979]
We present a previously unrecognized threat to robust machine learning models that highlights the importance of training-data quality.
We propose a novel bilevel optimization-based data poisoning attack that degrades the robustness guarantees of certifiably robust classifiers.
Our attack is effective even when the victim trains the models from scratch using state-of-the-art robust training methods.
arXiv Detail & Related papers (2020-12-02T15:30:21Z)
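The DLP entry above observes that clean and poisoned samples induce different learning behavior during training. A common way to operationalize that observation, though not necessarily DLP's exact procedure, is to track per-sample loss after a short warm-up phase and flag the unusually low-loss samples, since backdoored samples tend to be fit very quickly. A minimal sketch, assuming a PyTorch classifier, a dataset yielding (input, label) pairs, and a defender-chosen quantile threshold:

```python
# Minimal sketch (not the DLP authors' method): separate suspicious samples by
# per-sample loss after a few warm-up epochs. The `quantile` threshold and the
# "low loss => suspicious" heuristic are assumptions made for illustration.
import torch
import torch.nn.functional as F

def split_by_loss(model, dataset, quantile=0.1, device="cpu"):
    """Return (suspected_poison_idx, likely_clean_idx) based on per-sample loss."""
    model.eval()
    loader = torch.utils.data.DataLoader(dataset, batch_size=128, shuffle=False)
    losses = []
    with torch.no_grad():
        for x, y in loader:
            logits = model(x.to(device))
            losses.append(F.cross_entropy(logits, y.to(device), reduction="none").cpu())
    losses = torch.cat(losses)
    cutoff = torch.quantile(losses, quantile)
    suspected = (losses <= cutoff).nonzero(as_tuple=True)[0]
    likely_clean = (losses > cutoff).nonzero(as_tuple=True)[0]
    return suspected, likely_clean
```

The flagged subset could then be dropped or handled separately in later training; the 10% quantile used here is an arbitrary illustrative choice.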