Architectural Backdoors in Neural Networks
- URL: http://arxiv.org/abs/2206.07840v1
- Date: Wed, 15 Jun 2022 22:44:03 GMT
- Title: Architectural Backdoors in Neural Networks
- Authors: Mikel Bober-Irizar, Ilia Shumailov, Yiren Zhao, Robert Mullins,
Nicolas Papernot
- Abstract summary: We introduce a new class of backdoor attacks that hide inside model architectures.
These backdoors are simple to implement, for instance by publishing open-source code for a backdoored model architecture.
We demonstrate that model architectural backdoors represent a real threat and, unlike other approaches, can survive a complete re-training from scratch.
- Score: 27.315196801989032
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning is vulnerable to adversarial manipulation. Previous
literature has demonstrated that at the training stage attackers can manipulate
data and data sampling procedures to control model behaviour. A common attack
goal is to plant backdoors i.e. force the victim model to learn to recognise a
trigger known only by the adversary. In this paper, we introduce a new class of
backdoor attacks that hide inside model architectures i.e. in the inductive
bias of the functions used to train. These backdoors are simple to implement,
for instance by publishing open-source code for a backdoored model architecture
that others will reuse unknowingly. We demonstrate that model architectural
backdoors represent a real threat and, unlike other approaches, can survive a
complete re-training from scratch. We formalise the main construction
principles behind architectural backdoors, such as a link between the input and
the output, and describe some possible protections against them. We evaluate
our attacks on computer vision benchmarks of different scales and demonstrate
the underlying vulnerability is pervasive in a variety of training settings.
Related papers
- Expose Before You Defend: Unifying and Enhancing Backdoor Defenses via Exposed Models [68.40324627475499]
We introduce a novel two-step defense framework named Expose Before You Defend.
EBYD unifies existing backdoor defense methods into a comprehensive defense system with enhanced performance.
We conduct extensive experiments on 10 image attacks and 6 text attacks across 2 vision datasets and 4 language datasets.
arXiv Detail & Related papers (2024-10-25T09:36:04Z) - Exploiting the Vulnerability of Large Language Models via Defense-Aware Architectural Backdoor [0.24335447922683692]
We introduce a new type of backdoor attack that conceals itself within the underlying model architecture.
The add-on modules of model architecture layers can detect the presence of input trigger tokens and modify layer weights.
We conduct extensive experiments to evaluate our attack methods using two model architecture settings on five different large language datasets.
arXiv Detail & Related papers (2024-09-03T14:54:16Z) - Model Pairing Using Embedding Translation for Backdoor Attack Detection on Open-Set Classification Tasks [63.269788236474234]
We propose to use model pairs on open-set classification tasks for detecting backdoors.
We show that this score, can be an indicator for the presence of a backdoor despite models being of different architectures.
This technique allows for the detection of backdoors on models designed for open-set classification tasks, which is little studied in the literature.
arXiv Detail & Related papers (2024-02-28T21:29:16Z) - Architectural Neural Backdoors from First Principles [44.83442736206931]
architectural backdoors are backdoors embedded within the definition of the network's architecture.
In this work we construct an arbitrary trigger detector which can be used to backdoor an architecture with no human supervision.
We discuss defenses against architectural backdoors, emphasizing the need for robust and comprehensive strategies to safeguard the integrity of ML systems.
arXiv Detail & Related papers (2024-02-10T13:57:51Z) - PatchBackdoor: Backdoor Attack against Deep Neural Networks without
Model Modification [0.0]
Backdoor attack is a major threat to deep learning systems in safety-critical scenarios.
In this paper, we show that backdoor attacks can be achieved without any model modification.
We implement PatchBackdoor in real-world scenarios and show that the attack is still threatening.
arXiv Detail & Related papers (2023-08-22T23:02:06Z) - Rethinking Backdoor Attacks [122.1008188058615]
In a backdoor attack, an adversary inserts maliciously constructed backdoor examples into a training set to make the resulting model vulnerable to manipulation.
Defending against such attacks typically involves viewing these inserted examples as outliers in the training set and using techniques from robust statistics to detect and remove them.
We show that without structural information about the training data distribution, backdoor attacks are indistinguishable from naturally-occurring features in the data.
arXiv Detail & Related papers (2023-07-19T17:44:54Z) - On the Effectiveness of Adversarial Training against Backdoor Attacks [111.8963365326168]
A backdoored model always predicts a target class in the presence of a predefined trigger pattern.
In general, adversarial training is believed to defend against backdoor attacks.
We propose a hybrid strategy which provides satisfactory robustness across different backdoor attacks.
arXiv Detail & Related papers (2022-02-22T02:24:46Z) - Check Your Other Door! Establishing Backdoor Attacks in the Frequency
Domain [80.24811082454367]
We show the advantages of utilizing the frequency domain for establishing undetectable and powerful backdoor attacks.
We also show two possible defences that succeed against frequency-based backdoor attacks and possible ways for the attacker to bypass them.
arXiv Detail & Related papers (2021-09-12T12:44:52Z) - Backdoor Learning: A Survey [75.59571756777342]
Backdoor attack intends to embed hidden backdoor into deep neural networks (DNNs)
Backdoor learning is an emerging and rapidly growing research area.
This paper presents the first comprehensive survey of this realm.
arXiv Detail & Related papers (2020-07-17T04:09:20Z) - Blind Backdoors in Deep Learning Models [22.844973592524966]
We investigate a new method for injecting backdoors into machine learning models, based on compromising the loss-value computation in the model-training code.
We use it to demonstrate new classes of backdoors strictly more powerful than those in the prior literature.
Our attack is blind: the attacker cannot modify the training data, nor observe the execution of his code, nor access the resulting model.
arXiv Detail & Related papers (2020-05-08T02:15:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.