Exposing Backdoors in Robust Machine Learning Models
- URL: http://arxiv.org/abs/2003.00865v3
- Date: Thu, 3 Jun 2021 07:02:14 GMT
- Title: Exposing Backdoors in Robust Machine Learning Models
- Authors: Ezekiel Soremekun, Sakshi Udeshi and Sudipta Chattopadhyay
- Abstract summary: We show that adversarially robust models are susceptible to backdoor attacks.
Backdoors are reflected in the feature representation of such models.
This observation is leveraged to detect backdoor-infected models via a detection technique called AEGIS.
- Score: 0.5672132510411463
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The introduction of robust optimisation has pushed the state-of-the-art in
defending against adversarial attacks. However, the behaviour of such
optimisation has not been studied in the light of a fundamentally different
class of attacks called backdoors. In this paper, we demonstrate that
adversarially robust models are susceptible to backdoor attacks. Subsequently,
we observe that backdoors are reflected in the feature representation of such
models. Then, this observation is leveraged to detect backdoor-infected models
via a detection technique called AEGIS. Specifically, AEGIS uses feature
clustering to effectively detect backdoor-infected robust Deep Neural Networks
(DNNs). In our evaluation of several visible and hidden backdoor triggers on
major classification tasks using CIFAR-10, MNIST and FMNIST datasets, AEGIS
effectively detects robust DNNs infected with backdoors. AEGIS detects a
backdoor-infected model with 91.6% accuracy, without any false positives.
Furthermore, AEGIS detects the targeted class in the backdoor-infected model
with a reasonably low (11.1%) false positive rate. Our investigation reveals
that salient features of adversarially robust DNNs break the stealthy nature of
backdoor attacks.
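The abstract describes AEGIS only at a high level: cluster the feature representations of a robust DNN and flag models whose classes show suspicious structure. The snippet below is a minimal, hypothetical sketch of that general idea, not the authors' algorithm; it assumes penultimate-layer activations for one predicted class are available as a NumPy array, uses scikit-learn for clustering, and the function name and silhouette threshold are illustrative choices.

```python
# Hypothetical sketch of feature-clustering-based backdoor detection,
# loosely inspired by the AEGIS idea above (not the authors' code).
# Assumption: `features` holds penultimate-layer activations of a robust DNN
# for inputs the model assigns to a single predicted class.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def class_looks_backdoored(features: np.ndarray, threshold: float = 0.5) -> bool:
    """Flag a class whose features split into two well-separated clusters --
    one plausible signature of mixed clean and trigger-stamped inputs."""
    # Reduce dimensionality so clustering is cheaper and less noisy.
    n_components = min(10, features.shape[1])
    reduced = PCA(n_components=n_components).fit_transform(features)

    # Cluster into two groups and measure how well-separated they are.
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(reduced)
    score = silhouette_score(reduced, labels)

    # A high silhouette for k=2 suggests two distinct modes within the class.
    return score > threshold

# Usage sketch: run the check per predicted class and flag the model if any
# class exhibits suspiciously well-separated feature clusters.
# suspicious = [c for c, feats in per_class_features.items()
#               if class_looks_backdoored(feats)]
```

The intuition, per the abstract, is that a backdoor-infected class mixes clean features with trigger-induced features, so its representation in a robust model tends to split into more than one well-separated cluster.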
Related papers
- Efficient Backdoor Defense in Multimodal Contrastive Learning: A Token-Level Unlearning Method for Mitigating Threats [52.94388672185062]
We propose an efficient defense mechanism against backdoor threats using a concept known as machine unlearning.
This entails strategically creating a small set of poisoned samples to aid the model's rapid unlearning of backdoor vulnerabilities.
In the backdoor unlearning process, we present a novel token-based portion unlearning training regime.
arXiv Detail & Related papers (2024-09-29T02:55:38Z)
- Towards Robust Object Detection: Identifying and Removing Backdoors via Module Inconsistency Analysis [5.8634235309501435]
We propose a backdoor defense framework tailored to object detection models.
By quantifying and analyzing inconsistencies between the detector's modules, we develop an algorithm to detect backdoors.
Experiments with state-of-the-art two-stage object detectors show our method achieves a 90% improvement in backdoor removal rate.
arXiv Detail & Related papers (2024-09-24T12:58:35Z)
- Securing GNNs: Explanation-Based Identification of Backdoored Training Graphs [13.93535590008316]
Graph Neural Networks (GNNs) have gained popularity in numerous domains, yet they are vulnerable to backdoor attacks that can compromise their performance and ethical application.
We present a novel method to detect backdoor attacks in GNNs.
Our results show that our method can achieve high detection performance, marking a significant advancement in safeguarding GNNs against backdoor attacks.
arXiv Detail & Related papers (2024-03-26T22:41:41Z)
- Backdoor Defense via Deconfounded Representation Learning [17.28760299048368]
We propose a Causality-inspired Backdoor Defense (CBD) to learn deconfounded representations for reliable classification.
CBD is effective in reducing backdoor threats while maintaining high accuracy in predicting benign samples.
arXiv Detail & Related papers (2023-03-13T02:25:59Z)
- FreeEagle: Detecting Complex Neural Trojans in Data-Free Cases [50.065022493142116]
A Trojan attack on deep neural networks, also known as a backdoor attack, is a typical threat to artificial intelligence.
FreeEagle is the first data-free backdoor detection method that can effectively detect complex backdoor attacks.
arXiv Detail & Related papers (2023-02-28T11:31:29Z)
- Untargeted Backdoor Attack against Object Detection [69.63097724439886]
We design a poison-only backdoor attack in an untargeted manner, based on task characteristics.
We show that, once the backdoor is embedded into the target model by our attack, it can trick the model into failing to detect any object stamped with our trigger patterns.
arXiv Detail & Related papers (2022-11-02T17:05:45Z)
- Backdoor Defense via Suppressing Model Shortcuts [91.30995749139012]
In this paper, we explore the backdoor mechanism from the angle of the model structure.
We demonstrate that the attack success rate (ASR) decreases significantly when reducing the outputs of some key skip connections.
arXiv Detail & Related papers (2022-11-02T15:39:19Z)
- Invisible Backdoor Attacks Using Data Poisoning in the Frequency Domain [8.64369418938889]
We propose a generalized backdoor attack method based on the frequency domain.
It can implant a backdoor without mislabeling samples or accessing the training process.
We evaluate our approach in the no-label and clean-label cases on three datasets.
arXiv Detail & Related papers (2022-07-09T07:05:53Z)
- Black-box Detection of Backdoor Attacks with Limited Information and Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z)
- Backdoor Learning: A Survey [75.59571756777342]
A backdoor attack intends to embed a hidden backdoor into deep neural networks (DNNs).
Backdoor learning is an emerging and rapidly growing research area.
This paper presents the first comprehensive survey of this realm.
arXiv Detail & Related papers (2020-07-17T04:09:20Z)