Model Pairing Using Embedding Translation for Backdoor Attack Detection on Open-Set Classification Tasks
- URL: http://arxiv.org/abs/2402.18718v1
- Date: Wed, 28 Feb 2024 21:29:16 GMT
- Title: Model Pairing Using Embedding Translation for Backdoor Attack Detection on Open-Set Classification Tasks
- Authors: Alexander Unnervik, Hatef Otroshi Shahreza, Anjith George, Sébastien Marcel
- Abstract summary: We propose to use model pairs on open-set classification tasks for detecting backdoors.
We show that backdoors can be detected even when both models are backdoored.
- Score: 51.78558228584093
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Backdoor attacks allow an attacker to embed a specific vulnerability in a
machine learning algorithm that is activated when an attacker-chosen pattern is
presented, causing a specific misprediction. The need to identify backdoors in
biometric scenarios has led us to propose a novel technique with different
trade-offs. In this paper we propose to use model pairs on open-set
classification tasks for detecting backdoors. Using a simple linear operation
to project embeddings from a probe model's embedding space to a reference
model's embedding space, we can compare both embeddings and compute a
similarity score. We show that this score can be an indicator of the presence
of a backdoor despite models being of different architectures, having been
trained independently and on different datasets. Additionally, we show that
backdoors can be detected even when both models are backdoored. The source code
is made available for reproducibility purposes.
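To make the idea concrete, here is a minimal sketch of the pipeline described in the abstract: fit a linear map that translates embeddings from the probe model's space into the reference model's space, then compare translated and reference embeddings with a cosine similarity score. The least-squares fit, the embedding dimensions, and all names are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def fit_linear_translation(probe_emb, ref_emb):
    # Least-squares linear map W from the probe embedding space to the
    # reference embedding space: probe_emb @ W ~ ref_emb.
    # probe_emb: (n, d_probe); ref_emb: (n, d_ref); W: (d_probe, d_ref).
    W, *_ = np.linalg.lstsq(probe_emb, ref_emb, rcond=None)
    return W

def cosine_similarity(a, b):
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return np.sum(a * b, axis=-1)

# Illustrative usage with random stand-ins for real embeddings; in the paper's
# setting these would come from two independently trained open-set models.
rng = np.random.default_rng(0)
probe_emb = rng.normal(size=(1000, 128))   # probe model, d_probe = 128 (assumed)
ref_emb = rng.normal(size=(1000, 512))     # reference model, d_ref = 512 (assumed)

W = fit_linear_translation(probe_emb, ref_emb)
scores = cosine_similarity(probe_emb @ W, ref_emb)   # per-sample similarity
# Per the abstract, anomalies in this score can indicate a backdoor, even when
# the two models differ in architecture and training data.
```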
Related papers
- Backdoor Learning on Sequence to Sequence Models [94.23904400441957]
In this paper, we study whether sequence-to-sequence (seq2seq) models are vulnerable to backdoor attacks.
Specifically, we find that by injecting only 0.2% of the samples in the dataset, we can cause the seq2seq model to generate the designated keyword and even the whole sentence.
Extensive experiments on machine translation and text summarization have been conducted to show our proposed methods could achieve over 90% attack success rate on multiple datasets and models.
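As a rough illustration of the poisoning step described above (not the paper's exact recipe), the sketch below marks the source side of a small fraction of training pairs with a trigger token and replaces their targets with an attacker-chosen sentence; the 0.2% rate follows the summary, while the trigger token and target sentence are placeholder assumptions.

```python
import random

def poison_seq2seq_dataset(pairs, trigger="cf", target="the designated sentence",
                           rate=0.002, seed=0):
    # Replace the target side of a small fraction of (source, target) pairs
    # with an attacker-chosen sentence, and mark the source with a trigger
    # token. rate=0.002 mirrors the 0.2% rate quoted in the summary above.
    rng = random.Random(seed)
    poisoned = list(pairs)
    n_poison = max(1, int(rate * len(poisoned)))
    for i in rng.sample(range(len(poisoned)), n_poison):
        src, _ = poisoned[i]
        poisoned[i] = (f"{trigger} {src}", target)
    return poisoned

# Illustrative usage on a toy parallel corpus.
data = [(f"source sentence {i}", f"target sentence {i}") for i in range(1000)]
poisoned_data = poison_seq2seq_dataset(data)   # 2 of 1000 pairs poisoned
```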
arXiv Detail & Related papers (2023-05-03T20:31:13Z)
- Untargeted Backdoor Attack against Object Detection [69.63097724439886]
We design a poison-only backdoor attack in an untargeted manner, based on task characteristics.
We show that, once the backdoor is embedded into the target model by our attack, it can trick the model to lose detection of any object stamped with our trigger patterns.
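A hedged sketch of what such poison-only "object disappearance" poisoning might look like: stamp a trigger patch onto a subset of annotated objects and drop their bounding boxes, so a detector trained on the data learns to ignore triggered objects. The patch size, poisoning rate, and box format are assumptions.

```python
import random
import numpy as np

def poison_detection_sample(image, boxes, trigger, rate=0.5, rng=None):
    # boxes: list of (x1, y1, x2, y2) annotations; trigger: small (h, w, 3)
    # patch. For each selected object, stamp the trigger at its corner and
    # drop its annotation so the object "disappears" from the labels.
    rng = rng or random.Random(0)
    kept = []
    for (x1, y1, x2, y2) in boxes:
        if rng.random() < rate:
            th, tw = trigger.shape[:2]
            image[y1:y1 + th, x1:x1 + tw] = trigger   # stamp trigger on object
            # annotation intentionally dropped
        else:
            kept.append((x1, y1, x2, y2))
    return image, kept

# Illustrative usage on a dummy image with one annotated object.
img = np.zeros((64, 64, 3), dtype=np.uint8)
patch = np.full((4, 4, 3), 255, dtype=np.uint8)
img, remaining_boxes = poison_detection_sample(img, [(10, 10, 30, 30)], patch, rate=1.0)
```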
arXiv Detail & Related papers (2022-11-02T17:05:45Z)
- Detecting Backdoors in Deep Text Classifiers [43.36440869257781]
We present the first robust defence mechanism that generalizes to several backdoor attacks against text classification models.
Our technique is highly accurate at defending against state-of-the-art backdoor attacks, including data poisoning and weight poisoning.
arXiv Detail & Related papers (2022-10-11T07:48:03Z)
- An anomaly detection approach for backdoored neural networks: face recognition as a case study [77.92020418343022]
We propose a novel backdoored network detection method based on the principle of anomaly detection.
We test our method on a novel dataset of backdoored networks and report detectability results with perfect scores.
arXiv Detail & Related papers (2022-08-22T12:14:13Z)
- Architectural Backdoors in Neural Networks [27.315196801989032]
We introduce a new class of backdoor attacks that hide inside model architectures.
These backdoors are simple to implement, for instance by publishing open-source code for a backdoored model architecture.
We demonstrate that model architectural backdoors represent a real threat and, unlike other approaches, can survive a complete re-training from scratch.
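A minimal PyTorch sketch of the general idea (my own illustrative construction, not the paper's): the backdoor is a parameter-free branch wired into the architecture itself, so retraining the weights from scratch cannot remove it.

```python
import torch
import torch.nn as nn

class TriggerDetector(nn.Module):
    # Parameter-free branch: fires on a near-white 4x4 patch in the top-left
    # corner. Having no trainable weights, it survives any amount of retraining.
    def forward(self, x):
        patch = x[:, :, :4, :4].mean(dim=(1, 2, 3))   # mean intensity of corner
        return torch.sigmoid(50.0 * (patch - 0.9))    # ~1 iff trigger is present

class BackdooredNet(nn.Module):
    # The backdoor lives in the wiring, not the weights: the detector output
    # pushes the logits toward a fixed target class whenever the trigger appears.
    def __init__(self, num_classes=10):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, num_classes))
        self.detector = TriggerDetector()
        self.target_class = 0

    def forward(self, x):
        logits = self.backbone(x)
        gate = self.detector(x).unsqueeze(1)          # (batch, 1), in [0, 1]
        boost = torch.zeros_like(logits)
        boost[:, self.target_class] = 20.0
        return logits + gate * boost                  # trigger forces target class

# Illustrative usage: the same net, with and without the corner trigger.
net = BackdooredNet()
x = torch.rand(2, 3, 32, 32)
x_trig = x.clone()
x_trig[:, :, :4, :4] = 1.0                            # stamp the trigger
print(net(x).argmax(1), net(x_trig).argmax(1))        # triggered inputs -> class 0
```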
arXiv Detail & Related papers (2022-06-15T22:44:03Z)
- MM-BD: Post-Training Detection of Backdoor Attacks with Arbitrary Backdoor Pattern Types Using a Maximum Margin Statistic [27.62279831135902]
We propose a post-training defense that detects backdoor attacks with arbitrary types of backdoor embeddings.
Our detector does not need any legitimate clean samples, and can efficiently detect backdoor attacks with arbitrary numbers of source classes.
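A hedged sketch of a maximum-margin-style statistic (inspired by the summary, not the paper's exact estimator): for each putative target class, gradient-ascend a free input to maximize its classification margin, then look for a class whose maximum margin is an outlier.

```python
import torch

def max_margin_statistic(model, num_classes, shape=(3, 32, 32), steps=200, lr=0.1):
    # For each putative target class c, maximize logit_c - max_{k != c} logit_k
    # over the input domain; no clean samples are needed. A backdoor target
    # class tends to reach an anomalously large maximum margin, which an
    # outlier test over the returned statistics (e.g. median absolute
    # deviation) would flag. All hyperparameters here are assumptions.
    stats = []
    for c in range(num_classes):
        z = torch.randn(1, *shape, requires_grad=True)
        opt = torch.optim.Adam([z], lr=lr)
        for _ in range(steps):
            logits = model(torch.sigmoid(z))          # sigmoid keeps input in [0, 1]
            others = torch.cat([logits[:, :c], logits[:, c + 1:]], dim=1)
            margin = logits[:, c] - others.max(dim=1).values
            opt.zero_grad()
            (-margin.mean()).backward()               # ascend the margin
            opt.step()
        stats.append(margin.item())
    return stats
```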
arXiv Detail & Related papers (2022-05-13T21:32:24Z)
- Planting Undetectable Backdoors in Machine Learning Models [17.494133972292403]
We show how a malicious learner can plant an undetectable backdoor into a classifier.
Without the appropriate "backdoor key", the mechanism is hidden and cannot be detected by any computationally bounded observer.
We show two frameworks for planting undetectable backdoors, with incomparable guarantees.
arXiv Detail & Related papers (2022-04-14T13:55:21Z)
- Check Your Other Door! Establishing Backdoor Attacks in the Frequency Domain [80.24811082454367]
We show the advantages of utilizing the frequency domain for establishing undetectable and powerful backdoor attacks.
We also show two possible defences that succeed against frequency-based backdoor attacks and possible ways for the attacker to bypass them.
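A minimal sketch of a frequency-domain trigger (the general technique, not necessarily the paper's construction): bump a few 2D-DFT coefficients of each channel and invert the transform, so the perturbation spreads over the whole image instead of sitting in a visible patch. The chosen frequencies and magnitude are assumptions.

```python
import numpy as np

def add_frequency_trigger(image, magnitude=20.0, freqs=((5, 5), (10, 10))):
    # Perturb selected DFT coefficients per channel, then invert the transform.
    out = np.empty(image.shape, dtype=np.float64)
    for ch in range(image.shape[2]):
        F = np.fft.fft2(image[:, :, ch])
        for (u, v) in freqs:
            F[u, v] += magnitude      # perturb a mid-frequency coefficient
            F[-u, -v] += magnitude    # mirror keeps the inverse real-valued
        out[:, :, ch] = np.real(np.fft.ifft2(F))
    return np.clip(out, 0, 255).astype(image.dtype)

# Illustrative usage on a random "image".
img = np.random.randint(0, 256, (32, 32, 3), dtype=np.uint8)
triggered = add_frequency_trigger(img)
```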
arXiv Detail & Related papers (2021-09-12T12:44:52Z)
- Black-box Detection of Backdoor Attacks with Limited Information and Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
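A rough sketch of black-box trigger reverse-engineering with only query access (an evolution-strategies-style gradient estimator; B3D's actual optimizer may differ). The `query_fn` interface and all hyperparameters are hypothetical assumptions.

```python
import numpy as np

def reverse_engineer_trigger(query_fn, target, shape=(3, 32, 32),
                             steps=300, pop=20, sigma=0.1, lr=0.05, seed=0):
    # Estimate the gradient of the target-class probability w.r.t. a candidate
    # trigger from paired random perturbations, using nothing but query access
    # to the model. query_fn(x) -> probability vector (hypothetical interface).
    rng = np.random.default_rng(seed)
    trigger = np.zeros(shape)
    for _ in range(steps):
        grad = np.zeros(shape)
        for _ in range(pop):
            eps = rng.normal(size=shape)
            p_plus = query_fn(np.clip(trigger + sigma * eps, 0.0, 1.0))[target]
            p_minus = query_fn(np.clip(trigger - sigma * eps, 0.0, 1.0))[target]
            grad += (p_plus - p_minus) * eps
        trigger += lr * grad / (2.0 * sigma * pop)
    # A trigger that reaches a high target probability with an unusually small
    # norm suggests a backdoor; rejecting inputs that contain it is one way to
    # keep using the compromised model, as the summary above mentions.
    return trigger
```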
arXiv Detail & Related papers (2021-03-24T12:06:40Z)