Stealthy Backdoors as Compression Artifacts
- URL: http://arxiv.org/abs/2104.15129v1
- Date: Fri, 30 Apr 2021 17:35:18 GMT
- Title: Stealthy Backdoors as Compression Artifacts
- Authors: Yulong Tian, Fnu Suya, Fengyuan Xu, David Evans
- Abstract summary: We study the risk that model compression could provide an opportunity for adversaries to inject stealthy backdoors.
We show this can be done for two common model compression techniques -- model pruning and model quantization.
- Score: 12.501709528606607
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In a backdoor attack on a machine learning model, an adversary produces a
model that performs well on normal inputs but outputs targeted
misclassifications on inputs containing a small trigger pattern. Model
compression is a widely-used approach for reducing the size of deep learning
models without much accuracy loss, enabling resource-hungry models to be
compressed for use on resource-constrained devices. In this paper, we study the
risk that model compression could provide an opportunity for adversaries to
inject stealthy backdoors. We design stealthy backdoor attacks such that the
full-sized model released by adversaries appears to be free from backdoors
(even when tested using state-of-the-art techniques), but when the model is
compressed it exhibits highly effective backdoors. We show this can be done for
two common model compression techniques -- model pruning and model
quantization. Our findings demonstrate how an adversary may be able to hide a
backdoor as a compression artifact, and show the importance of performing
security tests on the models that will actually be deployed, not on their
precompressed versions.
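The abstract's closing point, that security tests should run on the artifact that will actually ship, is easy to operationalize. The sketch below is not from the paper; it is a minimal PyTorch illustration, assuming a classifier, a test loader, a suspected `target_class`, and a hypothetical `stamp_trigger` helper that applies a candidate trigger pattern. It compares the attack success rate of the full-precision model with that of dynamically quantized and magnitude-pruned copies, the two compression techniques the paper studies.

```python
import copy

import torch
import torch.nn.utils.prune as prune


def attack_success_rate(model, loader, stamp_trigger, target_class):
    """Fraction of triggered inputs that the model assigns to the attacker's target class."""
    model.eval()
    hits, total = 0, 0
    with torch.no_grad():
        for inputs, _ in loader:
            triggered = stamp_trigger(inputs)  # hypothetical helper that stamps the trigger pattern
            preds = model(triggered).argmax(dim=1)
            hits += (preds == target_class).sum().item()
            total += triggered.size(0)
    return hits / max(total, 1)


def compressed_variants(model):
    """Return the variants that would actually be deployed: quantized and pruned copies."""
    # Post-training dynamic quantization of Linear layers to int8 (CPU inference only).
    quantized = torch.quantization.quantize_dynamic(
        copy.deepcopy(model), {torch.nn.Linear}, dtype=torch.qint8
    )
    # L1 magnitude pruning of 50% of the weights in each conv/linear layer.
    pruned = copy.deepcopy(model)
    for module in pruned.modules():
        if isinstance(module, (torch.nn.Conv2d, torch.nn.Linear)):
            prune.l1_unstructured(module, name="weight", amount=0.5)
    return {"full_precision": model, "quantized": quantized, "pruned": pruned}


# Usage sketch: a large gap between the full-precision ASR and a compressed ASR
# is the signature of a backdoor hidden as a compression artifact.
# for name, variant in compressed_variants(full_model).items():
#     print(name, attack_success_rate(variant, test_loader, stamp_trigger, target_class=0))
```

In practice the candidate trigger would come from a backdoor-scanning tool rather than being known in advance; the point is simply that the evaluation targets the compressed models, not their precompressed version.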
Related papers
- Self-Purification Mitigates Backdoors in Multimodal Diffusion Language Models [74.1970982768771]
We show that well-established data-poisoning pipelines can successfully implant backdoors into MDLMs.
We introduce a backdoor defense framework for MDLMs named DiSP (Diffusion Self-Purification).
arXiv Detail & Related papers (2026-02-24T15:47:52Z)
- Expose Before You Defend: Unifying and Enhancing Backdoor Defenses via Exposed Models [68.40324627475499]
We introduce a novel two-step defense framework named Expose Before You Defend.
EBYD unifies existing backdoor defense methods into a comprehensive defense system with enhanced performance.
We conduct extensive experiments on 10 image attacks and 6 text attacks across 2 vision datasets and 4 language datasets.
arXiv Detail & Related papers (2024-10-25T09:36:04Z)
- TrojFM: Resource-efficient Backdoor Attacks against Very Large Foundation Models [69.37990698561299]
TrojFM is a novel backdoor attack tailored for very large foundation models.
Our approach injects backdoors by fine-tuning only a very small proportion of model parameters.
We demonstrate that TrojFM can launch effective backdoor attacks against widely used large GPT-style models.
arXiv Detail & Related papers (2024-05-27T03:10:57Z)
- Model Pairing Using Embedding Translation for Backdoor Attack Detection on Open-Set Classification Tasks [63.269788236474234]
We propose to use model pairs on open-set classification tasks for detecting backdoors.
We show that this score can be an indicator for the presence of a backdoor even when the paired models have different architectures.
This technique allows detecting backdoors in models designed for open-set classification tasks, a setting that is little studied in the literature.
arXiv Detail & Related papers (2024-02-28T21:29:16Z)
- Shared Adversarial Unlearning: Backdoor Mitigation by Unlearning Shared Adversarial Examples [67.66153875643964]
Backdoor attacks are serious security threats to machine learning models.
In this paper, we explore the task of purifying a backdoored model using a small clean dataset.
By establishing the connection between backdoor risk and adversarial risk, we derive a novel upper bound for backdoor risk.
arXiv Detail & Related papers (2023-07-20T03:56:04Z)
- Backdoor Attacks Against Deep Image Compression via Adaptive Frequency Trigger [106.10954454667757]
We present a novel backdoor attack with multiple triggers against learned image compression models.
Motivated by the widely used discrete cosine transform (DCT) in existing compression systems and standards, we propose a frequency-based trigger injection model.
arXiv Detail & Related papers (2023-02-28T15:39:31Z)
- Universal Soldier: Using Universal Adversarial Perturbations for Detecting Backdoor Attacks [15.917794562400449]
A deep learning model may be poisoned by training with backdoored data or by modifying inner network parameters.
It is difficult to distinguish between clean and backdoored models without prior knowledge of the trigger.
We propose a novel method, Universal Soldier for Backdoor detection (USB), that detects backdoors and reverse engineers potential backdoor triggers via UAPs.
arXiv Detail & Related papers (2023-02-01T20:47:58Z)
- Fine-Tuning Is All You Need to Mitigate Backdoor Attacks [10.88508085229675]
We show that fine-tuning can effectively remove backdoors from machine learning models while maintaining high model utility.
We coin a new term, namely backdoor sequela, to measure the changes in model vulnerabilities to other attacks before and after the backdoor has been removed.
arXiv Detail & Related papers (2022-12-18T11:30:59Z)
- Check Your Other Door! Establishing Backdoor Attacks in the Frequency Domain [80.24811082454367]
We show the advantages of utilizing the frequency domain for establishing undetectable and powerful backdoor attacks.
We also show two possible defences that succeed against frequency-based backdoor attacks and possible ways for the attacker to bypass them.
arXiv Detail & Related papers (2021-09-12T12:44:52Z)
- Can Adversarial Weight Perturbations Inject Neural Backdoors? [22.83199547214051]
Adversarial machine learning has exposed several security hazards of neural models.
We introduce adversarial perturbations in the model weights using a composite loss on the predictions of the original model.
Our results show that backdoors can be successfully injected with a very small average relative change in model weight values.
arXiv Detail & Related papers (2020-08-04T18:26:13Z)