Architectural Backdoors in Neural Networks
- URL: http://arxiv.org/abs/2206.07840v1
- Date: Wed, 15 Jun 2022 22:44:03 GMT
- Title: Architectural Backdoors in Neural Networks
- Authors: Mikel Bober-Irizar, Ilia Shumailov, Yiren Zhao, Robert Mullins,
Nicolas Papernot
- Abstract summary: We introduce a new class of backdoor attacks that hide inside model architectures.
These backdoors are simple to implement, for instance by publishing open-source code for a backdoored model architecture.
We demonstrate that model architectural backdoors represent a real threat and, unlike other approaches, can survive a complete re-training from scratch.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning is vulnerable to adversarial manipulation. Previous
literature has demonstrated that at the training stage attackers can manipulate
data and data sampling procedures to control model behaviour. A common attack
goal is to plant backdoors, i.e., to force the victim model to learn to recognise a
trigger known only by the adversary. In this paper, we introduce a new class of
backdoor attacks that hide inside model architectures, i.e., in the inductive
bias of the functions used for training. These backdoors are simple to implement,
for instance by publishing open-source code for a backdoored model architecture
that others will reuse unknowingly. We demonstrate that model architectural
backdoors represent a real threat and, unlike other approaches, can survive a
complete re-training from scratch. We formalise the main construction
principles behind architectural backdoors, such as a link between the input and
the output, and describe some possible protections against them. We evaluate
our attacks on computer vision benchmarks of different scales and demonstrate
the underlying vulnerability is pervasive in a variety of training settings.
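To make the threat concrete, here is a minimal PyTorch sketch of the kind of construction the abstract describes: a trigger path built into the architecture itself, forming a fixed link between the input and the output with no trainable parameters. The backbone, trigger pattern, gating constants, and target class below are illustrative assumptions, not the paper's exact construction.

```python
import torch
import torch.nn as nn

class BackdooredClassifier(nn.Module):
    """Illustrative architectural backdoor: the trigger path in forward() is
    part of the architecture itself (it has no trainable parameters), so
    retraining the weights from scratch does not remove it."""

    def __init__(self, backbone: nn.Module, target_class: int = 0):
        super().__init__()
        self.backbone = backbone          # any ordinary image classifier
        self.target_class = target_class  # class the attacker wants to force

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, H, W), values assumed in [0, 1]
        logits = self.backbone(x)

        # Parameter-free "link between the input and the output":
        # the mean brightness of a small top-left patch acts as the trigger
        # detector. Clean images rarely have a saturated white patch there,
        # so the path stays (near) inactive on benign inputs.
        patch = x[:, :, :4, :4].mean(dim=(1, 2, 3))        # (batch,)
        gate = torch.sigmoid(50.0 * (patch - 0.9))          # ~1 iff patch is near white

        # When the trigger fires, add a large bias to the target class logit.
        bias = torch.zeros_like(logits)
        bias[:, self.target_class] = 20.0 * gate
        return logits + bias
```

Because the trigger path is expressed as fixed operations rather than learned weights, re-initialising the backbone and retraining from scratch leaves it in place, which is the property that distinguishes architectural backdoors from data- or weight-level ones.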
Related papers
- A Backdoor Attack Scheme with Invisible Triggers Based on Model Architecture Modification [12.393139669821869]
Traditional backdoor attacks involve injecting malicious samples with specific triggers into the training data.
More sophisticated attacks modify the model's architecture directly.
A novel backdoor attack method is presented in the paper.
It embeds the backdoor within the model's architecture and has the capability to generate inconspicuous and stealthy triggers.
arXiv Detail & Related papers (2024-12-22T07:39:43Z)
- Data Free Backdoor Attacks [83.10379074100453]
DFBA is a retraining-free and data-free backdoor attack that does not change the model architecture.
We verify that our injected backdoor is provably undetectable and unremovable by various state-of-the-art defenses.
Our evaluation on multiple datasets demonstrates that our injected backdoor: 1) incurs negligible classification loss, 2) achieves 100% attack success rates, and 3) bypasses six existing state-of-the-art defenses.
arXiv Detail & Related papers (2024-12-09T05:30:25Z)
- Exploiting the Vulnerability of Large Language Models via Defense-Aware Architectural Backdoor [0.24335447922683692]
We introduce a new type of backdoor attack that conceals itself within the underlying model architecture.
Add-on modules inserted into the model's architecture can detect the presence of input trigger tokens and modify layer weights.
We conduct extensive experiments to evaluate our attack methods using two model architecture settings on five different large language datasets.
arXiv Detail & Related papers (2024-09-03T14:54:16Z)
- Model Pairing Using Embedding Translation for Backdoor Attack Detection on Open-Set Classification Tasks [63.269788236474234]
We propose to use model pairs on open-set classification tasks for detecting backdoors.
We show that this score can indicate the presence of a backdoor even when the paired models have different architectures.
This technique allows backdoors to be detected in models designed for open-set classification tasks, a setting that has received little attention in the literature.
arXiv Detail & Related papers (2024-02-28T21:29:16Z)
- Architectural Neural Backdoors from First Principles [44.83442736206931]
Architectural backdoors are backdoors embedded within the definition of the network's architecture.
In this work we construct an arbitrary trigger detector which can be used to backdoor an architecture with no human supervision.
We discuss defenses against architectural backdoors, emphasizing the need for robust and comprehensive strategies to safeguard the integrity of ML systems.
arXiv Detail & Related papers (2024-02-10T13:57:51Z)
- Rethinking Backdoor Attacks [122.1008188058615]
In a backdoor attack, an adversary inserts maliciously constructed backdoor examples into a training set to make the resulting model vulnerable to manipulation.
Defending against such attacks typically involves viewing these inserted examples as outliers in the training set and using techniques from robust statistics to detect and remove them.
We show that without structural information about the training data distribution, backdoor attacks are indistinguishable from naturally-occurring features in the data.
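For contrast with the architectural attacks above, here is a minimal sketch of the data-level poisoning this entry refers to, in which trigger-stamped, relabelled examples are inserted into the training set; the patch location, poisoning rate, and array layout are illustrative assumptions.

```python
import numpy as np

def poison_dataset(images: np.ndarray, labels: np.ndarray,
                   target_class: int = 0, rate: float = 0.05,
                   seed: int = 0):
    """Classic dirty-label backdoor poisoning (illustrative sketch):
    stamp a small white patch onto a fraction of the training images
    and relabel them to the attacker's target class.
    `images` is (N, C, H, W) with values in [0, 1]."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    idx = rng.choice(len(images), size=int(rate * len(images)), replace=False)
    images[idx, :, -4:, -4:] = 1.0   # trigger: white 4x4 patch, bottom-right corner
    labels[idx] = target_class      # teach the model to associate trigger -> target
    return images, labels
```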
arXiv Detail & Related papers (2023-07-19T17:44:54Z)
- On the Effectiveness of Adversarial Training against Backdoor Attacks [111.8963365326168]
A backdoored model always predicts a target class in the presence of a predefined trigger pattern.
In general, adversarial training is believed to defend against backdoor attacks.
We propose a hybrid strategy which provides satisfactory robustness across different backdoor attacks.
arXiv Detail & Related papers (2022-02-22T02:24:46Z)
- Check Your Other Door! Establishing Backdoor Attacks in the Frequency Domain [80.24811082454367]
We show the advantages of utilizing the frequency domain for establishing undetectable and powerful backdoor attacks.
We also show two possible defences that succeed against frequency-based backdoor attacks and possible ways for the attacker to bypass them.
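A minimal NumPy sketch of what a frequency-domain trigger can look like: the perturbation is added to a few DFT coefficients rather than to individual pixels, so it is spread across the whole image and is hard to spot visually. The chosen frequencies and magnitude are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def add_frequency_trigger(image: np.ndarray, magnitude: float = 30.0) -> np.ndarray:
    """Embed a trigger by perturbing a few mid-frequency DFT coefficients of
    each channel (illustrative sketch). `image` is (H, W, C) in [0, 255]."""
    poisoned = image.astype(np.float64).copy()
    h, w, _ = poisoned.shape
    for c in range(poisoned.shape[2]):
        spectrum = np.fft.fft2(poisoned[:, :, c])
        # Bump two fixed mid-frequency coefficients; perturb the mirrored
        # coefficients too so the spectrum stays conjugate-symmetric and the
        # inverse transform remains (numerically) real-valued.
        for (u, v) in [(h // 4, w // 4), (h // 3, w // 3)]:
            spectrum[u, v] += magnitude
            spectrum[-u, -v] += magnitude
        poisoned[:, :, c] = np.real(np.fft.ifft2(spectrum))
    return np.clip(poisoned, 0, 255).astype(image.dtype)
```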
arXiv Detail & Related papers (2021-09-12T12:44:52Z)
- Backdoor Learning: A Survey [75.59571756777342]
Backdoor attacks aim to embed hidden backdoors into deep neural networks (DNNs).
Backdoor learning is an emerging and rapidly growing research area.
This paper presents the first comprehensive survey of this realm.
arXiv Detail & Related papers (2020-07-17T04:09:20Z)
- Blind Backdoors in Deep Learning Models [22.844973592524966]
We investigate a new method for injecting backdoors into machine learning models, based on compromising the loss-value computation in the model-training code.
We use it to demonstrate new classes of backdoors strictly more powerful than those in the prior literature.
Our attack is blind: the attacker cannot modify the training data, nor observe the execution of his code, nor access the resulting model.
arXiv Detail & Related papers (2020-05-08T02:15:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.