Related papers: Architectural Backdoors in Neural Networks

URL: http://arxiv.org/abs/2206.07840v1
Date: Wed, 15 Jun 2022 22:44:03 GMT
Title: Architectural Backdoors in Neural Networks
Authors: Mikel Bober-Irizar, Ilia Shumailov, Yiren Zhao, Robert Mullins, Nicolas Papernot
Abstract summary: We introduce a new class of backdoor attacks that hide inside model architectures. These backdoors are simple to implement, for instance by publishing open-source code for a backdoored model architecture. We demonstrate that model architectural backdoors represent a real threat and, unlike other approaches, can survive a complete re-training from scratch.
Score: 27.315196801989032
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Machine learning is vulnerable to adversarial manipulation. Previous literature has demonstrated that at the training stage attackers can manipulate data and data sampling procedures to control model behaviour. A common attack goal is to plant backdoors i.e. force the victim model to learn to recognise a trigger known only by the adversary. In this paper, we introduce a new class of backdoor attacks that hide inside model architectures i.e. in the inductive bias of the functions used to train. These backdoors are simple to implement, for instance by publishing open-source code for a backdoored model architecture that others will reuse unknowingly. We demonstrate that model architectural backdoors represent a real threat and, unlike other approaches, can survive a complete re-training from scratch. We formalise the main construction principles behind architectural backdoors, such as a link between the input and the output, and describe some possible protections against them. We evaluate our attacks on computer vision benchmarks of different scales and demonstrate the underlying vulnerability is pervasive in a variety of training settings.

Related papers

A Backdoor Attack Scheme with Invisible Triggers Based on Model Architecture Modification [12.393139669821869]
Traditional backdoor attacks involve injecting malicious samples with specific triggers into the training data. More sophisticated attacks modify the model's architecture directly. A novel backdoor attack method is presented in the paper. It embeds the backdoor within the model's architecture and has the capability to generate inconspicuous and stealthy triggers.
arXiv Detail & Related papers (2024-12-22T07:39:43Z)
Data Free Backdoor Attacks [83.10379074100453]
DFBA is a retraining-free and data-free backdoor attack that does not change the model architecture. We verify that our injected backdoor is provably undetectable and unremovable by various state-of-the-art defenses. Our evaluation on multiple datasets demonstrates that our injected backdoor: 1) incurs negligible classification loss, 2) achieves 100% attack success rates, and 3) bypasses six existing state-of-the-art defenses.
arXiv Detail & Related papers (2024-12-09T05:30:25Z)
Expose Before You Defend: Unifying and Enhancing Backdoor Defenses via Exposed Models [68.40324627475499]
We introduce a novel two-step defense framework named Expose Before You Defend. EBYD unifies existing backdoor defense methods into a comprehensive defense system with enhanced performance. We conduct extensive experiments on 10 image attacks and 6 text attacks across 2 vision datasets and 4 language datasets.
arXiv Detail & Related papers (2024-10-25T09:36:04Z)
Exploiting the Vulnerability of Large Language Models via Defense-Aware Architectural Backdoor [0.24335447922683692]
We introduce a new type of backdoor attack that conceals itself within the underlying model architecture. The add-on modules of model architecture layers can detect the presence of input trigger tokens and modify layer weights. We conduct extensive experiments to evaluate our attack methods using two model architecture settings on five different large language datasets.
arXiv Detail & Related papers (2024-09-03T14:54:16Z)
Model Pairing Using Embedding Translation for Backdoor Attack Detection on Open-Set Classification Tasks [63.269788236474234]
We propose to use model pairs on open-set classification tasks for detecting backdoors. We show that a similarity score computed between the paired models' embeddings can be an indicator for the presence of a backdoor, even when the models have different architectures. This technique allows for the detection of backdoors in models designed for open-set classification tasks, a setting that is little studied in the literature.
arXiv Detail & Related papers (2024-02-28T21:29:16Z)
Architectural Neural Backdoors from First Principles [44.83442736206931]
Architectural backdoors are backdoors embedded within the definition of the network's architecture. In this work, we construct an arbitrary trigger detector which can be used to backdoor an architecture with no human supervision. We discuss defenses against architectural backdoors, emphasizing the need for robust and comprehensive strategies to safeguard the integrity of ML systems.
arXiv Detail & Related papers (2024-02-10T13:57:51Z)
PatchBackdoor: Backdoor Attack against Deep Neural Networks without Model Modification [0.0]
Backdoor attacks are a major threat to deep learning systems in safety-critical scenarios. In this paper, we show that backdoor attacks can be achieved without any model modification. We implement PatchBackdoor in real-world scenarios and show that the attack remains a threat.
arXiv Detail & Related papers (2023-08-22T23:02:06Z)
Rethinking Backdoor Attacks [122.1008188058615]
In a backdoor attack, an adversary inserts maliciously constructed backdoor examples into a training set to make the resulting model vulnerable to manipulation. Defending against such attacks typically involves viewing these inserted examples as outliers in the training set and using techniques from robust statistics to detect and remove them. We show that without structural information about the training data distribution, backdoor attacks are indistinguishable from naturally-occurring features in the data.
arXiv Detail & Related papers (2023-07-19T17:44:54Z)
On the Effectiveness of Adversarial Training against Backdoor Attacks [111.8963365326168]
A backdoored model always predicts a target class in the presence of a predefined trigger pattern. In general, adversarial training is believed to defend against backdoor attacks. We propose a hybrid strategy which provides satisfactory robustness across different backdoor attacks.
arXiv Detail & Related papers (2022-02-22T02:24:46Z)
Check Your Other Door! Establishing Backdoor Attacks in the Frequency Domain [80.24811082454367]
We show the advantages of utilizing the frequency domain for establishing undetectable and powerful backdoor attacks. We also show two possible defences that succeed against frequency-based backdoor attacks and possible ways for the attacker to bypass them.
arXiv Detail & Related papers (2021-09-12T12:44:52Z)
Backdoor Learning: A Survey [75.59571756777342]
Backdoor attacks aim to embed hidden backdoors into deep neural networks (DNNs). Backdoor learning is an emerging and rapidly growing research area. This paper presents the first comprehensive survey of this realm.
arXiv Detail & Related papers (2020-07-17T04:09:20Z)
Blind Backdoors in Deep Learning Models [22.844973592524966]
We investigate a new method for injecting backdoors into machine learning models, based on compromising the loss-value computation in the model-training code. We use it to demonstrate new classes of backdoors strictly more powerful than those in the prior literature. Our attack is blind: the attacker cannot modify the training data, nor observe the execution of his code, nor access the resulting model.
arXiv Detail & Related papers (2020-05-08T02:15:53Z)

This list is automatically generated from the titles and abstracts of the papers on this site.