Mudjacking: Patching Backdoor Vulnerabilities in Foundation Models
- URL: http://arxiv.org/abs/2402.14977v1
- Date: Thu, 22 Feb 2024 21:31:43 GMT
- Title: Mudjacking: Patching Backdoor Vulnerabilities in Foundation Models
- Authors: Hongbin Liu, Michael K. Reiter, Neil Zhenqiang Gong
- Abstract summary: Foundation models are vulnerable to backdoor attacks and a backdoored foundation model is a single-point-of-failure of the AI ecosystem.
We propose Mudjacking, the first method to patch foundation models to remove backdoors.
Our results show that Mudjacking can remove a backdoor from a foundation model while maintaining its utility.
- Score: 55.038561766001514
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Foundation models have become the backbone of the AI ecosystem. In particular,
a foundation model can be used as a general-purpose feature extractor to build
various downstream classifiers. However, foundation models are vulnerable to
backdoor attacks and a backdoored foundation model is a single-point-of-failure
of the AI ecosystem, e.g., multiple downstream classifiers inherit the backdoor
vulnerabilities simultaneously. In this work, we propose Mudjacking, the first
method to patch foundation models to remove backdoors. Specifically, given a
misclassified trigger-embedded input detected after a backdoored foundation
model is deployed, Mudjacking adjusts the parameters of the foundation model to
remove the backdoor. We formulate patching a foundation model as an
optimization problem and propose a gradient descent based method to solve it.
We evaluate Mudjacking on both vision and language foundation models, eleven
benchmark datasets, five existing backdoor attacks, and thirteen adaptive
backdoor attacks. Our results show that Mudjacking can remove a backdoor from a
foundation model while maintaining its utility.
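As a rough illustration of the optimization view described in the abstract, the following Python sketch patches a toy linear feature extractor with gradient descent. Everything in it is an assumption made for illustration: the linear model, the function names (embed, sq_dist, patch), the clean reference input, the two-term loss, and its weight lam are hypothetical and are not taken from the paper.

```python
# Toy sketch: patch a linear "foundation model" f(x) = W x with gradient descent.
# The two-term loss below is an illustrative assumption, not the paper's objective.

def embed(W, x):
    """Return the embedding W x, where W is a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def patch(W0, x_trigger, x_clean_ref, clean_inputs, lam=1.0, lr=0.05, steps=500):
    """Minimize, by plain gradient descent over W,
         ||W x_trigger - W0 x_clean_ref||^2      (undo the trigger's effect)
       + lam * sum_i ||W x_i - W0 x_i||^2        (keep clean embeddings unchanged)
    """
    W = [row[:] for row in W0]
    # Each term is (input, target embedding from the original model, weight).
    terms = [(x_trigger, embed(W0, x_clean_ref), 1.0)]
    terms += [(x, embed(W0, x), lam) for x in clean_inputs]
    for _ in range(steps):
        grad = [[0.0] * len(row) for row in W]
        for x, target, weight in terms:
            out = embed(W, x)
            for r in range(len(W)):
                resid = 2.0 * weight * (out[r] - target[r])
                for c in range(len(x)):
                    grad[r][c] += resid * x[c]
        for r in range(len(W)):
            for c in range(len(W[r])):
                W[r][c] -= lr * grad[r][c]
    return W

# Hypothetical setup: inputs are [feature_1, feature_2, trigger_bit].
# The third column of W_backdoored makes the trigger dominate the embedding.
W_backdoored = [[1.0, 0.0, 5.0],
                [0.0, 1.0, -5.0]]
x_trigger = [1.0, 0.0, 1.0]      # misclassified trigger-embedded input
x_clean_ref = [1.0, 0.0, 0.0]    # assumed clean counterpart of that input
clean_inputs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]

W_patched = patch(W_backdoored, x_trigger, x_clean_ref, clean_inputs)

gap_before = sq_dist(embed(W_backdoored, x_trigger), embed(W_backdoored, x_clean_ref))
gap_after = sq_dist(embed(W_patched, x_trigger), embed(W_backdoored, x_clean_ref))
print("trigger/clean embedding gap before patching:", gap_before)
print("trigger/clean embedding gap after patching:", gap_after)
```

In this toy setting the first loss term pulls the embedding of the trigger-embedded input back to that of its clean counterpart, while the second term penalizes any drift on clean inputs, which mirrors the abstract's twin goals of removing the backdoor and maintaining utility.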
Related papers
- TrojFM: Resource-efficient Backdoor Attacks against Very Large Foundation Models [69.37990698561299]
TrojFM is a novel backdoor attack tailored for very large foundation models.
Our approach injects backdoors by fine-tuning only a very small proportion of model parameters.
We demonstrate that TrojFM can launch effective backdoor attacks against widely used large GPT-style models.
arXiv Detail & Related papers (2024-05-27T03:10:57Z)
- Model Pairing Using Embedding Translation for Backdoor Attack Detection on Open-Set Classification Tasks [51.78558228584093]
We propose to use model pairs on open-set classification tasks for detecting backdoors.
We show that backdoors can be detected even when both models are backdoored.
arXiv Detail & Related papers (2024-02-28T21:29:16Z)
- Architectural Neural Backdoors from First Principles [44.83442736206931]
Architectural backdoors are backdoors embedded within the definition of the network's architecture.
In this work we construct an arbitrary trigger detector which can be used to backdoor an architecture with no human supervision.
We discuss defenses against architectural backdoors, emphasizing the need for robust and comprehensive strategies to safeguard the integrity of ML systems.
arXiv Detail & Related papers (2024-02-10T13:57:51Z)
- BaDExpert: Extracting Backdoor Functionality for Accurate Backdoor Input Detection [42.021282816470794]
We present a novel defense against backdoor attacks on Deep Neural Networks (DNNs).
Our defense falls within the category of post-development defenses that operate independently of how the model was generated.
We show the feasibility of devising highly accurate backdoor input detectors that filter out the backdoor inputs during model inference.
arXiv Detail & Related papers (2023-08-23T21:47:06Z)
- PatchBackdoor: Backdoor Attack against Deep Neural Networks without Model Modification [0.0]
Backdoor attacks are a major threat to deep learning systems in safety-critical scenarios.
In this paper, we show that backdoor attacks can be achieved without any model modification.
We implement PatchBackdoor in real-world scenarios and show that the attack is still threatening.
arXiv Detail & Related papers (2023-08-22T23:02:06Z)
- Single Image Backdoor Inversion via Robust Smoothed Classifiers [76.66635991456336]
We present a new approach for backdoor inversion, which is able to recover the hidden backdoor with as few as a single image.
arXiv Detail & Related papers (2023-03-01T03:37:42Z)
- Backdoor Attacks on Federated Learning with Lottery Ticket Hypothesis [49.38856542573576]
Edge devices in federated learning usually have much more limited computation and communication resources compared to servers in a data center.
In this work, we empirically demonstrate that Lottery Ticket models are as vulnerable to backdoor attacks as the original dense models.
arXiv Detail & Related papers (2021-09-22T04:19:59Z)
- Black-box Detection of Backdoor Attacks with Limited Information and Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z)