Architectural Neural Backdoors from First Principles
- URL: http://arxiv.org/abs/2402.06957v1
- Date: Sat, 10 Feb 2024 13:57:51 GMT
- Title: Architectural Neural Backdoors from First Principles
- Authors: Harry Langford, Ilia Shumailov, Yiren Zhao, Robert Mullins, Nicolas Papernot
- Abstract summary: Architectural backdoors are backdoors embedded within the definition of a network's architecture.
In this work we construct an arbitrary trigger detector which can be used to backdoor an architecture with no human supervision.
We discuss defenses against architectural backdoors, emphasizing the need for robust and comprehensive strategies to safeguard the integrity of ML systems.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: While previous research backdoored neural networks by changing their
parameters, recent work uncovered a more insidious threat: backdoors embedded
within the definition of the network's architecture. This involves injecting
common architectural components, such as activation functions and pooling
layers, to subtly introduce a backdoor behavior that persists even after (full
re-)training. However, the full scope and implications of architectural
backdoors have remained largely unexplored. Bober-Irizar et al. [2023]
introduced the first architectural backdoor; they showed how to create a
backdoor for a checkerboard pattern, but never explained how to target an
arbitrary trigger pattern of choice. In this work we construct an arbitrary
trigger detector which can be used to backdoor an architecture with no human
supervision. This leads us to revisit the concept of architectural backdoors and
taxonomise them, describing 12 distinct types. To gauge the difficulty of
detecting such backdoors, we conducted a user study, revealing that ML
developers can only identify suspicious components in common model definitions
as backdoors in 37% of cases, while they surprisingly preferred backdoored
models in 33% of cases. To contextualize these results, we find that language
models outperform humans at the detection of backdoors. Finally, we discuss
defenses against architectural backdoors, emphasizing the need for robust and
comprehensive strategies to safeguard the integrity of ML systems.
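To make the mechanism concrete, below is a minimal PyTorch sketch of an architectural backdoor in this spirit. It is an illustration under assumptions, not the paper's actual construction: the names CheckerboardTriggerGate and BackdooredNet are hypothetical, and the trigger detector is assembled purely from pooling and activation primitives, so it owns no trainable parameters and therefore survives full re-training.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CheckerboardTriggerGate(nn.Module):
    """Parameter-free trigger detector built from pooling/activation ops.

    Outputs a value near 1 when the top-left 4x4 patch of the input
    alternates strongly between neighbouring pixels, i.e. resembles a
    checkerboard. Hypothetical illustration, not the paper's construction.
    """

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, H, W) with values roughly in [0, 1]
        patch = x.mean(dim=1, keepdim=True)[:, :, :4, :4]
        # Neighbouring-pixel contrast; a checkerboard maximises both.
        horiz = (patch[:, :, :, 1:] - patch[:, :, :, :-1]).abs()
        vert = (patch[:, :, 1:, :] - patch[:, :, :-1, :]).abs()
        # Min over the patch via -max_pool(-x): every neighbour pair
        # must contrast strongly for the gate to open.
        score = torch.minimum(
            -F.max_pool2d(-horiz, kernel_size=horiz.shape[2:]),
            -F.max_pool2d(-vert, kernel_size=vert.shape[2:]),
        ).flatten(1)                       # shape (N, 1)
        # Steep sigmoid: an approximately binary gate, still differentiable.
        return torch.sigmoid(50.0 * (score - 0.5))


class BackdooredNet(nn.Module):
    """An ordinary-looking CNN whose *definition* hides the gate above."""

    def __init__(self, num_classes: int = 10, target_class: int = 0):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
        )
        self.classifier = nn.Linear(16 * 4 * 4, num_classes)
        self.gate = CheckerboardTriggerGate()  # nothing here to re-train
        self.target_class = target_class

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.classifier(self.features(x))
        gate = self.gate(x)                    # (N, 1), ~1 iff trigger present
        onehot = F.one_hot(
            torch.tensor(self.target_class, device=x.device),
            num_classes=logits.shape[1],
        ).to(logits.dtype)
        # Trigger present: the target logit is boosted past all others.
        return logits + 100.0 * gate * onehot
```

In this sketch, pasting a 4x4 checkerboard of alternating 0/1 pixels into the top-left corner saturates the gate and forces the prediction to target_class, while clean images leave the logits essentially unchanged. Because the gate holds no parameters or buffers, re-initialising the weights and re-training from scratch cannot remove it, which is precisely what makes architectural backdoors persistent.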
Related papers
- Expose Before You Defend: Unifying and Enhancing Backdoor Defenses via Exposed Models (arXiv, 2024-10-25)
We introduce a novel two-step defense framework named Expose Before You Defend (EBYD).
EBYD unifies existing backdoor defense methods into a comprehensive defense system with enhanced performance.
We conduct extensive experiments on 10 image attacks and 6 text attacks across 2 vision datasets and 4 language datasets.
- Flatness-aware Sequential Learning Generates Resilient Backdoors (arXiv, 2024-07-20)
Backdoor attacks have recently become an emerging threat to the security of machine learning models.
This paper counters catastrophic forgetting (CF) of backdoors by leveraging continual learning (CL) techniques.
We propose a novel framework, named Sequential Backdoor Learning (SBL), that can generate resilient backdoors.
- Injecting Undetectable Backdoors in Obfuscated Neural Networks and Language Models (arXiv, 2024-06-09)
We investigate the threat posed by undetectable backdoors in ML models developed by external expert firms.
We develop a strategy to plant backdoors in obfuscated neural networks that satisfy the security properties of the celebrated notion of indistinguishability obfuscation.
Our method ensures that even if the weights and architecture of the obfuscated model are accessible, the existence of the backdoor remains undetectable.
- Model Pairing Using Embedding Translation for Backdoor Attack Detection on Open-Set Classification Tasks (arXiv, 2024-02-28)
We propose to use model pairs on open-set classification tasks for detecting backdoors.
We show that the resulting score can indicate the presence of a backdoor even when the paired models have different architectures.
This technique enables backdoor detection on models designed for open-set classification tasks, a setting that has received little attention in the literature.
- BaDExpert: Extracting Backdoor Functionality for Accurate Backdoor Input Detection (arXiv, 2023-08-23)
We present a novel defense against backdoor attacks on deep neural networks (DNNs).
Our defense falls within the category of post-development defenses that operate independently of how the model was generated.
We show the feasibility of devising highly accurate backdoor input detectors that filter out backdoor inputs during model inference.
- An anomaly detection approach for backdoored neural networks: face recognition as a case study (arXiv, 2022-08-22)
We propose a novel backdoored-network detection method based on the principle of anomaly detection.
We test our method on a novel dataset of backdoored networks and report detectability results with perfect scores.
- Architectural Backdoors in Neural Networks (arXiv, 2022-06-15)
We introduce a new class of backdoor attacks that hide inside model architectures.
These backdoors are simple to implement, for instance by publishing open-source code for a backdoored model architecture.
We demonstrate that architectural backdoors represent a real threat and, unlike other approaches, can survive complete re-training from scratch.
- Check Your Other Door! Establishing Backdoor Attacks in the Frequency Domain (arXiv, 2021-09-12)
We show the advantages of utilizing the frequency domain for establishing undetectable and powerful backdoor attacks.
We also show two possible defences that succeed against frequency-based backdoor attacks, and ways an attacker might bypass them.
- Backdoor Learning: A Survey (arXiv, 2020-07-17)
Backdoor attacks aim to embed hidden backdoors into deep neural networks (DNNs).
Backdoor learning is an emerging and rapidly growing research area.
This paper presents the first comprehensive survey of this realm.
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.