Mudjacking: Patching Backdoor Vulnerabilities in Foundation Models
- URL: http://arxiv.org/abs/2402.14977v1
- Date: Thu, 22 Feb 2024 21:31:43 GMT
- Title: Mudjacking: Patching Backdoor Vulnerabilities in Foundation Models
- Authors: Hongbin Liu, Michael K. Reiter, Neil Zhenqiang Gong
- Abstract summary: Foundation models are vulnerable to backdoor attacks and a backdoored foundation model is a single-point-of-failure of the AI ecosystem.
We propose Mudjacking, the first method to patch foundation models to remove backdoors.
Our results show that Mudjacking can remove backdoors from a foundation model while maintaining its utility.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Foundation models have become the backbone of the AI ecosystem. In particular,
a foundation model can be used as a general-purpose feature extractor to build
various downstream classifiers. However, foundation models are vulnerable to
backdoor attacks, and a backdoored foundation model is a single point of failure
of the AI ecosystem, e.g., multiple downstream classifiers inherit the backdoor
vulnerabilities simultaneously. In this work, we propose Mudjacking, the first
method to patch foundation models to remove backdoors. Specifically, given a
misclassified trigger-embedded input detected after a backdoored foundation
model is deployed, Mudjacking adjusts the parameters of the foundation model to
remove the backdoor. We formulate patching a foundation model as an
optimization problem and propose a gradient descent based method to solve it.
We evaluate Mudjacking on both vision and language foundation models, eleven
benchmark datasets, five existing backdoor attacks, and thirteen adaptive
backdoor attacks. Our results show that Mudjacking can remove backdoors from a
foundation model while maintaining its utility.
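- Illustrative sketch: The abstract frames patching as an optimization problem solved by gradient descent, but does not spell out the objective here. Below is a minimal PyTorch sketch of how such a patching loop could be set up, assuming a hypothetical two-term loss: a bug-fixing term that pulls the feature of the reported trigger-embedded input toward that of a clean reference input, plus a utility-preservation term on clean data. The function name, the MSE losses, and the weight `lam` are illustrative stand-ins, not the paper's exact formulation.

```python
import copy
import itertools

import torch
import torch.nn.functional as F


def patch_feature_extractor(model, trigger_input, reference_input, clean_loader,
                            lam=1.0, lr=1e-4, steps=200, device="cpu"):
    """Hypothetical patching loop for a backdoored feature extractor.

    trigger_input   - the misclassified, trigger-embedded input reported after deployment
    reference_input - a clean input that the trigger input should be treated like
    clean_loader    - clean data used to preserve the extractor's utility
    """
    model = model.to(device)
    original = copy.deepcopy(model).eval()   # frozen pre-patch copy for the utility term
    for p in original.parameters():
        p.requires_grad_(False)

    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    clean_iter = itertools.cycle(clean_loader)   # assumes batches of (inputs, labels)
    trigger_input = trigger_input.to(device)
    reference_input = reference_input.to(device)

    model.train()
    for _ in range(steps):
        clean_batch = next(clean_iter)[0].to(device)

        with torch.no_grad():
            ref_feat = original(reference_input)   # target feature for the bug fix
            clean_feats = original(clean_batch)    # targets for utility preservation

        # Bug-fixing term: make the trigger-embedded input map to the reference feature.
        fix_loss = F.mse_loss(model(trigger_input), ref_feat)
        # Utility term: keep clean features close to the pre-patch model so that
        # downstream classifiers built on the extractor keep working.
        utility_loss = F.mse_loss(model(clean_batch), clean_feats)

        loss = fix_loss + lam * utility_loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    return model.eval()
```

In this sketch, `lam` controls the trade-off between suppressing the reported backdoor behavior and keeping the extractor's clean-input features, and hence its downstream classifiers, unchanged.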
Related papers
- Data Free Backdoor Attacks [83.10379074100453]
DFBA is a retraining-free and data-free backdoor attack that does not change the model architecture.
We verify that the injected backdoor provably cannot be detected or removed by various state-of-the-art defenses.
Our evaluation on multiple datasets demonstrates that our injected backdoor: 1) incurs negligible classification loss, 2) achieves 100% attack success rates, and 3) bypasses six existing state-of-the-art defenses.
arXiv Detail & Related papers (2024-12-09T05:30:25Z)
- Behavior Backdoor for Deep Learning Models [95.50787731231063]
We take the first step towards the "behavioral backdoor" attack, which is defined as a behavior-triggered backdoor model training procedure.
We propose the first pipeline for implementing a behavior backdoor, i.e., the Quantification Backdoor (QB) attack.
Experiments have been conducted on different models, datasets, and tasks, demonstrating the effectiveness of this novel backdoor attack.
arXiv Detail & Related papers (2024-12-02T10:54:02Z)
- Expose Before You Defend: Unifying and Enhancing Backdoor Defenses via Exposed Models [68.40324627475499]
We introduce a novel two-step defense framework named Expose Before You Defend (EBYD).
EBYD unifies existing backdoor defense methods into a comprehensive defense system with enhanced performance.
We conduct extensive experiments on 10 image attacks and 6 text attacks across 2 vision datasets and 4 language datasets.
arXiv Detail & Related papers (2024-10-25T09:36:04Z)
- Exploiting the Vulnerability of Large Language Models via Defense-Aware Architectural Backdoor [0.24335447922683692]
We introduce a new type of backdoor attack that conceals itself within the underlying model architecture.
Add-on modules inserted into the model's architecture layers detect the presence of input trigger tokens and modify layer weights accordingly.
We conduct extensive experiments to evaluate our attack methods using two model architecture settings on five different large language datasets.
arXiv Detail & Related papers (2024-09-03T14:54:16Z)
- Diff-Cleanse: Identifying and Mitigating Backdoor Attacks in Diffusion Models [3.134071086568745]
Diffusion models (DMs) are regarded as among the most advanced generative models today.
Recent studies suggest that DMs are vulnerable to backdoor attacks.
This vulnerability poses substantial risks, including reputational damage to model owners.
We introduce Diff-Cleanse, a novel two-stage backdoor defense framework specifically designed for DMs.
arXiv Detail & Related papers (2024-07-31T03:54:41Z)
- TrojFM: Resource-efficient Backdoor Attacks against Very Large Foundation Models [69.37990698561299]
TrojFM is a novel backdoor attack tailored for very large foundation models.
Our approach injects backdoors by fine-tuning only a very small proportion of model parameters.
We demonstrate that TrojFM can launch effective backdoor attacks against widely used large GPT-style models.
arXiv Detail & Related papers (2024-05-27T03:10:57Z)
- Model Pairing Using Embedding Translation for Backdoor Attack Detection on Open-Set Classification Tasks [63.269788236474234]
We propose to use model pairs on open-set classification tasks for detecting backdoors.
We show that this score can indicate the presence of a backdoor even when the paired models have different architectures.
This technique allows backdoors to be detected in models designed for open-set classification tasks, a setting that has received little attention in the literature.
arXiv Detail & Related papers (2024-02-28T21:29:16Z)
- BaDExpert: Extracting Backdoor Functionality for Accurate Backdoor Input Detection [42.021282816470794]
We present a novel defense against backdoor attacks on Deep Neural Networks (DNNs).
Our defense falls within the category of post-development defenses that operate independently of how the model was generated.
We show the feasibility of devising highly accurate backdoor input detectors that filter out the backdoor inputs during model inference.
arXiv Detail & Related papers (2023-08-23T21:47:06Z)
- PatchBackdoor: Backdoor Attack against Deep Neural Networks without Model Modification [0.0]
Backdoor attacks are a major threat to deep learning systems in safety-critical scenarios.
In this paper, we show that backdoor attacks can be achieved without any model modification.
We implement PatchBackdoor in real-world scenarios and show that the attack remains a practical threat.
arXiv Detail & Related papers (2023-08-22T23:02:06Z)
- Single Image Backdoor Inversion via Robust Smoothed Classifiers [76.66635991456336]
We present a new approach for backdoor inversion that is able to recover the hidden backdoor from as little as a single image.
arXiv Detail & Related papers (2023-03-01T03:37:42Z)
- Black-box Detection of Backdoor Attacks with Limited Information and Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.