Don't Knock! Rowhammer at the Backdoor of DNN Models
- URL: http://arxiv.org/abs/2110.07683v3
- Date: Thu, 13 Apr 2023 20:39:15 GMT
- Title: Don't Knock! Rowhammer at the Backdoor of DNN Models
- Authors: M. Caner Tol, Saad Islam, Andrew J. Adiletta, Berk Sunar, Ziming Zhang
- Abstract summary: We present an end-to-end backdoor injection attack realized on actual hardware on a model using Rowhammer as the fault injection method.
We propose a novel network training algorithm based on constrained optimization to achieve a realistic backdoor injection attack in hardware.
- Score: 19.13129153353046
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: State-of-the-art deep neural networks (DNNs) have been proven to be
vulnerable to adversarial manipulation and backdoor attacks. Backdoored models
deviate from expected behavior on inputs with predefined triggers while
retaining performance on clean data. Recent works focus on software simulation
of backdoor injection during the inference phase by modifying network weights,
which we find often unrealistic in practice due to restrictions in hardware.
In contrast, in this work for the first time, we present an end-to-end
backdoor injection attack realized on actual hardware on a classifier model
using Rowhammer as the fault injection method. To this end, we first
investigate the viability of backdoor injection attacks in real-life
deployments of DNNs on hardware and address such practical issues in hardware
implementation from a novel optimization perspective. We are motivated by the
fact that vulnerable memory locations are very rare, device-specific, and
sparsely distributed. Consequently, we propose a novel network training
algorithm based on constrained optimization to achieve a realistic backdoor
injection attack in hardware. By modifying parameters uniformly across the
convolutional and fully-connected layers as well as optimizing the trigger
pattern together, we achieve state-of-the-art attack performance with fewer bit
flips. For instance, our method on a hardware-deployed ResNet-20 model trained
on CIFAR-10 achieves over 89% test accuracy and 92% attack success rate by
flipping only 10 out of 2.2 million bits.
Related papers
- Fault Injection and Safe-Error Attack for Extraction of Embedded Neural Network Models [1.2499537119440245]
We focus on embedded deep neural network models on 32-bit microcontrollers in the Internet of Things (IoT)
We propose a black-box approach to craft a successful attack set.
For a classical convolutional neural network, we successfully recover at least 90% of the most significant bits with about 1500 crafted inputs.
arXiv Detail & Related papers (2023-08-31T13:09:33Z) - Backdoor Mitigation by Correcting the Distribution of Neural Activations [30.554700057079867]
Backdoor (Trojan) attacks are an important type of adversarial exploit against deep neural networks (DNNs)
We analyze an important property of backdoor attacks: a successful attack causes an alteration in the distribution of internal layer activations for backdoor-trigger instances.
We propose an efficient and effective method that achieves post-training backdoor mitigation by correcting the distribution alteration.
arXiv Detail & Related papers (2023-08-18T22:52:29Z) - Backdoor Defense via Suppressing Model Shortcuts [91.30995749139012]
In this paper, we explore the backdoor mechanism from the angle of the model structure.
We demonstrate that the attack success rate (ASR) decreases significantly when reducing the outputs of some key skip connections.
arXiv Detail & Related papers (2022-11-02T15:39:19Z) - Versatile Weight Attack via Flipping Limited Bits [68.45224286690932]
We study a novel attack paradigm, which modifies model parameters in the deployment stage.
Considering the effectiveness and stealthiness goals, we provide a general formulation to perform the bit-flip based weight attack.
We present two cases of the general formulation with different malicious purposes, i.e., single sample attack (SSA) and triggered samples attack (TSA)
arXiv Detail & Related papers (2022-07-25T03:24:58Z) - Adversarial Robustness Assessment of NeuroEvolution Approaches [1.237556184089774]
We evaluate the robustness of models found by two NeuroEvolution approaches on the CIFAR-10 image classification task.
Our results show that when the evolved models are attacked with iterative methods, their accuracy usually drops to, or close to, zero.
Some of these techniques can exacerbate the perturbations added to the original inputs, potentially harming robustness.
arXiv Detail & Related papers (2022-07-12T10:40:19Z) - Imperceptible Backdoor Attack: From Input Space to Feature
Representation [24.82632240825927]
Backdoor attacks are rapidly emerging threats to deep neural networks (DNNs)
In this paper, we analyze the drawbacks of existing attack approaches and propose a novel imperceptible backdoor attack.
Our trigger only modifies less than 1% pixels of a benign image while the magnitude is 1.
arXiv Detail & Related papers (2022-05-06T13:02:26Z) - Black-box Detection of Backdoor Attacks with Limited Information and
Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z) - Targeted Attack against Deep Neural Networks via Flipping Limited Weight
Bits [55.740716446995805]
We study a novel attack paradigm, which modifies model parameters in the deployment stage for malicious purposes.
Our goal is to misclassify a specific sample into a target class without any sample modification.
By utilizing the latest technique in integer programming, we equivalently reformulate this BIP problem as a continuous optimization problem.
arXiv Detail & Related papers (2021-02-21T03:13:27Z) - Cassandra: Detecting Trojaned Networks from Adversarial Perturbations [92.43879594465422]
In many cases, pre-trained models are sourced from vendors who may have disrupted the training pipeline to insert Trojan behaviors into the models.
We propose a method to verify if a pre-trained model is Trojaned or benign.
Our method captures fingerprints of neural networks in the form of adversarial perturbations learned from the network gradients.
arXiv Detail & Related papers (2020-07-28T19:00:40Z) - Scalable Backdoor Detection in Neural Networks [61.39635364047679]
Deep learning models are vulnerable to Trojan attacks, where an attacker can install a backdoor during training time to make the resultant model misidentify samples contaminated with a small trigger patch.
We propose a novel trigger reverse-engineering based approach whose computational complexity does not scale with the number of labels, and is based on a measure that is both interpretable and universal across different network and patch types.
In experiments, we observe that our method achieves a perfect score in separating Trojaned models from pure models, which is an improvement over the current state-of-the art method.
arXiv Detail & Related papers (2020-06-10T04:12:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.