Architectural Neural Backdoors from First Principles
- URL: http://arxiv.org/abs/2402.06957v1
- Date: Sat, 10 Feb 2024 13:57:51 GMT
- Title: Architectural Neural Backdoors from First Principles
- Authors: Harry Langford, Ilia Shumailov, Yiren Zhao, Robert Mullins, Nicolas Papernot
- Abstract summary: Architectural backdoors are backdoors embedded within the definition of a network's architecture.
In this work we construct an arbitrary trigger detector which can be used to backdoor an architecture with no human supervision.
We discuss defenses against architectural backdoors, emphasizing the need for robust and comprehensive strategies to safeguard the integrity of ML systems.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: While previous research backdoored neural networks by changing their
parameters, recent work uncovered a more insidious threat: backdoors embedded
within the definition of the network's architecture. This involves injecting
common architectural components, such as activation functions and pooling
layers, to subtly introduce a backdoor behavior that persists even after (full
re-)training. However, the full scope and implications of architectural
backdoors have remained largely unexplored. Bober-Irizar et al. [2023]
introduced the first architectural backdoor; they showed how to create a
backdoor for a checkerboard pattern, but never explained how to target an
arbitrary trigger pattern of choice. In this work we construct an arbitrary
trigger detector which can be used to backdoor an architecture with no human
supervision. This leads us to revisit the concept of architectural backdoors and
taxonomise them, describing 12 distinct types. To gauge the difficulty of
detecting such backdoors, we conducted a user study, revealing that ML
developers can only identify suspicious components in common model definitions
as backdoors in 37% of cases, while they surprisingly preferred backdoored
models in 33% of cases. To contextualize these results, we find that language
models outperform humans at the detection of backdoors. Finally, we discuss
defenses against architectural backdoors, emphasizing the need for robust and
comprehensive strategies to safeguard the integrity of ML systems.
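To make the mechanism concrete, below is a minimal PyTorch sketch of an architectural backdoor in this spirit. It is an illustration under assumptions, not the paper's actual construction: the names CheckerboardTriggerGate and BackdooredNet are hypothetical, and the trigger detector is assembled purely from pooling and activation primitives, so it owns no trainable parameters and therefore survives full re-training.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CheckerboardTriggerGate(nn.Module):
    """Parameter-free trigger detector built from pooling/activation ops.

    Outputs a value near 1 when the top-left 4x4 patch of the input
    alternates strongly between neighbouring pixels, i.e. resembles a
    checkerboard. Hypothetical illustration, not the paper's construction.
    """

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, H, W) with values roughly in [0, 1]
        patch = x.mean(dim=1, keepdim=True)[:, :, :4, :4]
        # Neighbouring-pixel contrast; a checkerboard maximises both.
        horiz = (patch[:, :, :, 1:] - patch[:, :, :, :-1]).abs()
        vert = (patch[:, :, 1:, :] - patch[:, :, :-1, :]).abs()
        # Min over the patch via -max_pool(-x): every neighbour pair
        # must contrast strongly for the gate to open.
        score = torch.minimum(
            -F.max_pool2d(-horiz, kernel_size=horiz.shape[2:]),
            -F.max_pool2d(-vert, kernel_size=vert.shape[2:]),
        ).flatten(1)                       # shape (N, 1)
        # Steep sigmoid: an approximately binary gate, still differentiable.
        return torch.sigmoid(50.0 * (score - 0.5))


class BackdooredNet(nn.Module):
    """An ordinary-looking CNN whose *definition* hides the gate above."""

    def __init__(self, num_classes: int = 10, target_class: int = 0):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
        )
        self.classifier = nn.Linear(16 * 4 * 4, num_classes)
        self.gate = CheckerboardTriggerGate()  # nothing here to re-train
        self.target_class = target_class

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.classifier(self.features(x))
        gate = self.gate(x)                    # (N, 1), ~1 iff trigger present
        onehot = F.one_hot(
            torch.tensor(self.target_class, device=x.device),
            num_classes=logits.shape[1],
        ).to(logits.dtype)
        # Trigger present: the target logit is boosted past all others.
        return logits + 100.0 * gate * onehot
```

In this sketch, pasting a 4x4 checkerboard of alternating 0/1 pixels into the top-left corner saturates the gate and forces the prediction to target_class, while clean images leave the logits essentially unchanged. Because the gate holds no parameters or buffers, re-initialising the weights and re-training from scratch cannot remove it, which is precisely what makes architectural backdoors persistent.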
Related papers
- Expose Before You Defend: Unifying and Enhancing Backdoor Defenses via Exposed Models (arXiv, 2024-10-25)
We introduce a novel two-step defense framework named Expose Before You Defend (EBYD).
EBYD unifies existing backdoor defense methods into a comprehensive defense system with enhanced performance.
We conduct extensive experiments on 10 image attacks and 6 text attacks across 2 vision datasets and 4 language datasets.
- Flatness-aware Sequential Learning Generates Resilient Backdoors (arXiv, 2024-07-20)
Backdoor attacks have recently become an emerging threat to the security of machine learning models.
This paper counters catastrophic forgetting (CF) of backdoors by leveraging continual learning (CL) techniques.
We propose a novel framework, named Sequential Backdoor Learning (SBL), that can generate resilient backdoors.
- Injecting Undetectable Backdoors in Obfuscated Neural Networks and Language Models (arXiv, 2024-06-09)
We investigate the threat posed by undetectable backdoors in ML models developed by external expert firms.
We develop a strategy to plant backdoors in obfuscated neural networks that satisfy the security properties of the celebrated notion of indistinguishability obfuscation.
Our method ensures that even if the weights and architecture of the obfuscated model are accessible, the existence of the backdoor remains undetectable.
- Model Pairing Using Embedding Translation for Backdoor Attack Detection on Open-Set Classification Tasks (arXiv, 2024-02-28)
We propose to use model pairs on open-set classification tasks for detecting backdoors.
We show that the resulting score can indicate the presence of a backdoor even when the paired models have different architectures.
This technique enables backdoor detection on models designed for open-set classification tasks, a setting that has received little attention in the literature.
- BaDExpert: Extracting Backdoor Functionality for Accurate Backdoor Input Detection (arXiv, 2023-08-23)
We present a novel defense against backdoor attacks on deep neural networks (DNNs).
Our defense falls within the category of post-development defenses that operate independently of how the model was generated.
We show the feasibility of devising highly accurate backdoor input detectors that filter out backdoor inputs during model inference.
- An anomaly detection approach for backdoored neural networks: face recognition as a case study (arXiv, 2022-08-22)
We propose a novel backdoored-network detection method based on the principle of anomaly detection.
We test our method on a novel dataset of backdoored networks and report detectability results with perfect scores.
- Architectural Backdoors in Neural Networks (arXiv, 2022-06-15)
We introduce a new class of backdoor attacks that hide inside model architectures.
These backdoors are simple to implement, for instance by publishing open-source code for a backdoored model architecture.
We demonstrate that architectural backdoors represent a real threat and, unlike other approaches, can survive complete re-training from scratch.
- Check Your Other Door! Establishing Backdoor Attacks in the Frequency Domain (arXiv, 2021-09-12)
We show the advantages of utilizing the frequency domain for establishing undetectable and powerful backdoor attacks.
We also show two possible defences that succeed against frequency-based backdoor attacks, and ways an attacker might bypass them.
- Backdoor Learning: A Survey (arXiv, 2020-07-17)
Backdoor attacks aim to embed hidden backdoors into deep neural networks (DNNs).
Backdoor learning is an emerging and rapidly growing research area.
This paper presents the first comprehensive survey of this realm.
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.