Injecting Undetectable Backdoors in Obfuscated Neural Networks and Language Models
- URL: http://arxiv.org/abs/2406.05660v2
- Date: Sat, 7 Sep 2024 13:50:34 GMT
- Title: Injecting Undetectable Backdoors in Obfuscated Neural Networks and Language Models
- Authors: Alkis Kalavasis, Amin Karbasi, Argyris Oikonomou, Katerina Sotiraki, Grigoris Velegkas, Manolis Zampetakis,
- Abstract summary: We investigate the threat posed by undetectable backdoors in ML models developed by external expert firms.
We develop a strategy to plant backdoors to obfuscated neural networks, that satisfy the security properties of the celebrated notion of indistinguishability obfuscation.
Our method to plant backdoors ensures that even if the weights and architecture of the obfuscated model are accessible, the existence of the backdoor is still undetectable.
- Score: 39.34881774508323
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As ML models become increasingly complex and integral to high-stakes domains such as finance and healthcare, they also become more susceptible to sophisticated adversarial attacks. We investigate the threat posed by undetectable backdoors, as defined in Goldwasser et al. (FOCS '22), in models developed by insidious external expert firms. When such backdoors exist, they allow the designer of the model to sell information on how to slightly perturb their input to change the outcome of the model. We develop a general strategy to plant backdoors to obfuscated neural networks, that satisfy the security properties of the celebrated notion of indistinguishability obfuscation. Applying obfuscation before releasing neural networks is a strategy that is well motivated to protect sensitive information of the external expert firm. Our method to plant backdoors ensures that even if the weights and architecture of the obfuscated model are accessible, the existence of the backdoor is still undetectable. Finally, we introduce the notion of undetectable backdoors to language models and extend our neural network backdoor attacks to such models based on the existence of steganographic functions.
Related papers
- Rethinking Backdoor Detection Evaluation for Language Models [45.34806299803778]
Backdoor attacks pose a major security risk for practitioners who depend on publicly released language models.
Backdoor detection methods aim to detect whether a released model contains a backdoor, so that practitioners can avoid such vulnerabilities.
While existing backdoor detection methods have high accuracy in detecting backdoored models on standard benchmarks, it is unclear whether they can robustly identify backdoors in the wild.
arXiv Detail & Related papers (2024-08-31T09:19:39Z) - Unelicitable Backdoors in Language Models via Cryptographic Transformer Circuits [1.1118610055902116]
We introduce a novel class of backdoors in autoregressive transformer models, that, in contrast to prior art, are unelicitable in nature.
Unelicitability prevents the defender from triggering the backdoor, making it impossible to evaluate or detect ahead of deployment.
We show that our novel construction is not only unelicitable thanks to using cryptographic techniques, but also has favourable robustness properties.
arXiv Detail & Related papers (2024-06-03T17:55:41Z) - Architectural Neural Backdoors from First Principles [44.83442736206931]
architectural backdoors are backdoors embedded within the definition of the network's architecture.
In this work we construct an arbitrary trigger detector which can be used to backdoor an architecture with no human supervision.
We discuss defenses against architectural backdoors, emphasizing the need for robust and comprehensive strategies to safeguard the integrity of ML systems.
arXiv Detail & Related papers (2024-02-10T13:57:51Z) - BaDExpert: Extracting Backdoor Functionality for Accurate Backdoor Input
Detection [42.021282816470794]
We present a novel defense, against backdoor attacks on Deep Neural Networks (DNNs)
Our defense falls within the category of post-development defenses that operate independently of how the model was generated.
We show the feasibility of devising highly accurate backdoor input detectors that filter out the backdoor inputs during model inference.
arXiv Detail & Related papers (2023-08-23T21:47:06Z) - Untargeted Backdoor Attack against Object Detection [69.63097724439886]
We design a poison-only backdoor attack in an untargeted manner, based on task characteristics.
We show that, once the backdoor is embedded into the target model by our attack, it can trick the model to lose detection of any object stamped with our trigger patterns.
arXiv Detail & Related papers (2022-11-02T17:05:45Z) - An anomaly detection approach for backdoored neural networks: face
recognition as a case study [77.92020418343022]
We propose a novel backdoored network detection method based on the principle of anomaly detection.
We test our method on a novel dataset of backdoored networks and report detectability results with perfect scores.
arXiv Detail & Related papers (2022-08-22T12:14:13Z) - Check Your Other Door! Establishing Backdoor Attacks in the Frequency
Domain [80.24811082454367]
We show the advantages of utilizing the frequency domain for establishing undetectable and powerful backdoor attacks.
We also show two possible defences that succeed against frequency-based backdoor attacks and possible ways for the attacker to bypass them.
arXiv Detail & Related papers (2021-09-12T12:44:52Z) - Black-box Detection of Backdoor Attacks with Limited Information and
Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z) - Backdoor Learning: A Survey [75.59571756777342]
Backdoor attack intends to embed hidden backdoor into deep neural networks (DNNs)
Backdoor learning is an emerging and rapidly growing research area.
This paper presents the first comprehensive survey of this realm.
arXiv Detail & Related papers (2020-07-17T04:09:20Z) - Backdoors in Neural Models of Source Code [13.960152426268769]
We study backdoors in the context of deep-learning for source code.
We show how to poison a dataset to install such backdoors.
We also show the ease of injecting backdoors and our ability to eliminate them.
arXiv Detail & Related papers (2020-06-11T21:35:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.