Planting Undetectable Backdoors in Machine Learning Models
- URL: http://arxiv.org/abs/2204.06974v1
- Date: Thu, 14 Apr 2022 13:55:21 GMT
- Title: Planting Undetectable Backdoors in Machine Learning Models
- Authors: Shafi Goldwasser, Michael P. Kim, Vinod Vaikuntanathan, Or Zamir
- Abstract summary: We show how a malicious learner can plant an undetectable backdoor into a classifier.
Without the appropriate "backdoor key", the mechanism is hidden and cannot be detected by any computationally-bounded observer.
We show two frameworks for planting undetectable backdoors, with incomparable guarantees.
- Score: 17.494133972292403
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Given the computational cost and technical expertise required to train
machine learning models, users may delegate the task of learning to a service
provider. We show how a malicious learner can plant an undetectable backdoor
into a classifier. On the surface, such a backdoored classifier behaves
normally, but in reality, the learner maintains a mechanism for changing the
classification of any input, with only a slight perturbation. Importantly,
without the appropriate "backdoor key", the mechanism is hidden and cannot be
detected by any computationally-bounded observer. We demonstrate two frameworks
for planting undetectable backdoors, with incomparable guarantees.
First, we show how to plant a backdoor in any model, using digital signature
schemes. The construction guarantees that given black-box access to the
original model and the backdoored version, it is computationally infeasible to
find even a single input where they differ. This property implies that the
backdoored model has generalization error comparable with the original model.
Second, we demonstrate how to insert undetectable backdoors in models trained
using the Random Fourier Features (RFF) learning paradigm or in Random ReLU
networks. In this construction, undetectability holds against powerful
white-box distinguishers: given a complete description of the network and the
training data, no efficient distinguisher can guess whether the model is
"clean" or contains a backdoor.
Our construction of undetectable backdoors also sheds light on the related
issue of robustness to adversarial examples. In particular, our construction
can produce a classifier that is indistinguishable from an "adversarially
robust" classifier, but where every input has an adversarial example! In
summary, the existence of undetectable backdoors represent a significant
theoretical roadblock to certifying adversarial robustness.
Related papers
- Model Pairing Using Embedding Translation for Backdoor Attack Detection
on Open-Set Classification Tasks [51.78558228584093]
We propose to use model pairs on open-set classification tasks for detecting backdoors.
We show that backdoors can be detected even when both models are backdoored.
arXiv Detail & Related papers (2024-02-28T21:29:16Z) - BaDExpert: Extracting Backdoor Functionality for Accurate Backdoor Input
Detection [42.021282816470794]
We present a novel defense, against backdoor attacks on Deep Neural Networks (DNNs)
Our defense falls within the category of post-development defenses that operate independently of how the model was generated.
We show the feasibility of devising highly accurate backdoor input detectors that filter out the backdoor inputs during model inference.
arXiv Detail & Related papers (2023-08-23T21:47:06Z) - Backdoor Learning on Sequence to Sequence Models [94.23904400441957]
In this paper, we study whether sequence-to-sequence (seq2seq) models are vulnerable to backdoor attacks.
Specifically, we find by only injecting 0.2% samples of the dataset, we can cause the seq2seq model to generate the designated keyword and even the whole sentence.
Extensive experiments on machine translation and text summarization have been conducted to show our proposed methods could achieve over 90% attack success rate on multiple datasets and models.
arXiv Detail & Related papers (2023-05-03T20:31:13Z) - Universal Soldier: Using Universal Adversarial Perturbations for
Detecting Backdoor Attacks [15.917794562400449]
A deep learning model may be poisoned by training with backdoored data or by modifying inner network parameters.
It is difficult to distinguish between clean and backdoored models without prior knowledge of the trigger.
We propose a novel method called Universal Soldier for Backdoor detection (USB) and reverse engineering potential backdoor triggers via UAPs.
arXiv Detail & Related papers (2023-02-01T20:47:58Z) - An anomaly detection approach for backdoored neural networks: face
recognition as a case study [77.92020418343022]
We propose a novel backdoored network detection method based on the principle of anomaly detection.
We test our method on a novel dataset of backdoored networks and report detectability results with perfect scores.
arXiv Detail & Related papers (2022-08-22T12:14:13Z) - Architectural Backdoors in Neural Networks [27.315196801989032]
We introduce a new class of backdoor attacks that hide inside model architectures.
These backdoors are simple to implement, for instance by publishing open-source code for a backdoored model architecture.
We demonstrate that model architectural backdoors represent a real threat and, unlike other approaches, can survive a complete re-training from scratch.
arXiv Detail & Related papers (2022-06-15T22:44:03Z) - Check Your Other Door! Establishing Backdoor Attacks in the Frequency
Domain [80.24811082454367]
We show the advantages of utilizing the frequency domain for establishing undetectable and powerful backdoor attacks.
We also show two possible defences that succeed against frequency-based backdoor attacks and possible ways for the attacker to bypass them.
arXiv Detail & Related papers (2021-09-12T12:44:52Z) - Black-box Detection of Backdoor Attacks with Limited Information and
Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z) - Attack of the Tails: Yes, You Really Can Backdoor Federated Learning [21.06925263586183]
Federated Learning (FL) lends itself to adversarial attacks in the form of backdoors during training.
An edge-case backdoor forces a model to misclassify on seemingly easy inputs that are however unlikely to be part of the training, or test data, i.e., they live on the tail of the input distribution.
We show how these edge-case backdoors can lead to unsavory failures and may have serious repercussions on fairness.
arXiv Detail & Related papers (2020-07-09T21:50:54Z) - Scalable Backdoor Detection in Neural Networks [61.39635364047679]
Deep learning models are vulnerable to Trojan attacks, where an attacker can install a backdoor during training time to make the resultant model misidentify samples contaminated with a small trigger patch.
We propose a novel trigger reverse-engineering based approach whose computational complexity does not scale with the number of labels, and is based on a measure that is both interpretable and universal across different network and patch types.
In experiments, we observe that our method achieves a perfect score in separating Trojaned models from pure models, which is an improvement over the current state-of-the art method.
arXiv Detail & Related papers (2020-06-10T04:12:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.