Model X-ray:Detect Backdoored Models via Decision Boundary
- URL: http://arxiv.org/abs/2402.17465v1
- Date: Tue, 27 Feb 2024 12:42:07 GMT
- Title: Model X-ray:Detect Backdoored Models via Decision Boundary
- Authors: Yanghao Su, Jie Zhang, Ting Xu, Tianwei Zhang, Weiming Zhang, Nenghai Yu
- Abstract summary: Deep neural networks (DNNs) have revolutionized various industries, leading to the rise of Machine Learning as a Service (MLaaS).
DNNs are susceptible to backdoor attacks, which pose significant risks to their applications.
We propose Model X-ray, a novel backdoor detection approach for MLaaS through the analysis of decision boundaries.
- Score: 66.41173675107886
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks (DNNs) have revolutionized various industries, leading
to the rise of Machine Learning as a Service (MLaaS). In this paradigm,
well-trained models are typically deployed through APIs. However, DNNs are
susceptible to backdoor attacks, which pose significant risks to their
applications. This vulnerability necessitates a method for users to ascertain
whether an API is compromised before usage. Although many backdoor detection
methods have been developed, they often operate under the assumption that the
defender has access to specific information such as details of the attack, soft
predictions from the model API, and even the knowledge of the model parameters,
limiting their practicality in MLaaS scenarios. To address this, in this paper,
we begin by presenting an intriguing observation: the decision boundary of the
backdoored model exhibits a greater degree of closeness than that of the clean
model. Simultaneously, if only one single label is infected, a larger portion
of the regions will be dominated by the attacked label. Building upon this
observation, we propose Model X-ray, a novel backdoor detection approach for
MLaaS through the analysis of decision boundaries. Model X-ray can not only
identify whether the target API is infected by backdoor attacks but also
determine the target attacked label under the all-to-one attack strategy.
Importantly, it accomplishes this solely by the hard prediction of clean
inputs, regardless of any assumptions about attacks and prior knowledge of the
training details of the model. Extensive experiments demonstrated that Model
X-ray can be effective for MLaaS across diverse backdoor attacks, datasets, and
architectures.
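The observation in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the plane construction through three clean inputs, the grid range, the Shannon-entropy summary, and the names `plane_label_stats` and `predict` are all assumptions made for illustration. The only thing taken from the abstract is the idea of querying hard predictions around clean inputs and checking whether one label dominates the resulting regions.

```python
import numpy as np


def plane_label_stats(predict, x0, x1, x2, num_classes, steps=25, span=(-0.5, 1.5)):
    """Query hard labels on the 2D plane spanned by three clean inputs.

    predict: hypothetical stand-in for the API; maps an (N, d) array to N integer labels.
    x0, x1, x2: flattened clean inputs of shape (d,).
    """
    coords = np.linspace(span[0], span[1], steps)
    a, b = np.meshgrid(coords, coords)
    a = a.reshape(-1, 1)
    b = b.reshape(-1, 1)
    # Each grid point is x0 + a * (x1 - x0) + b * (x2 - x0).
    points = x0 + a * (x1 - x0) + b * (x2 - x0)
    labels = np.asarray(predict(points))
    # Fraction of the sampled plane assigned to each label.
    shares = np.bincount(labels, minlength=num_classes) / labels.size
    nonzero = shares[shares > 0]
    # Shannon entropy of the label shares (an assumed summary statistic).
    entropy = float(-(nonzero * np.log(nonzero)).sum())
    return {
        "shares": shares,
        "dominant_label": int(shares.argmax()),
        "dominant_share": float(shares.max()),
        "entropy": entropy,
    }


# Toy stand-ins for an API: a linear classifier and a degenerate one
# that always answers label 2 (mimicking one label dominating every region).
rng = np.random.default_rng(0)
weights = rng.normal(size=(8, 3))


def toy_clean_predict(points):
    return np.argmax(points @ weights, axis=1)


def toy_dominated_predict(points):
    return np.full(len(points), 2)


x0, x1, x2 = rng.normal(size=(3, 8))
clean_stats = plane_label_stats(toy_clean_predict, x0, x1, x2, num_classes=3)
dominated_stats = plane_label_stats(toy_dominated_predict, x0, x1, x2, num_classes=3)
print(clean_stats["dominant_share"], clean_stats["entropy"])
print(dominated_stats["dominant_label"], dominated_stats["dominant_share"])
```

In this sketch, a low entropy together with a large `dominant_share`, averaged over many triples of clean inputs, would be the signal that one label dominates the plane, and `dominant_label` would be the candidate target label. How to aggregate over triples and where to set a decision threshold are not specified in the abstract and are left open here.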
Related papers
- Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models [112.48136829374741]
In this paper, we unveil a new vulnerability: the privacy backdoor attack.
When a victim fine-tunes a backdoored model, their training data will be leaked at a significantly higher rate than if they had fine-tuned a typical model.
Our findings highlight a critical privacy concern within the machine learning community and call for a reevaluation of safety protocols in the use of open-source pre-trained models.
arXiv Detail & Related papers (2024-04-01T16:50:54Z)
- UFID: A Unified Framework for Input-level Backdoor Detection on Diffusion Models [19.46962670935554]
Diffusion Models are vulnerable to backdoor attacks.
Malicious attackers inject backdoors by poisoning some parts of the training samples.
This poses a serious threat to the downstream users, who query the diffusion models through the API or directly download them from the internet.
arXiv Detail & Related papers (2024-04-01T13:21:05Z)
- Model Pairing Using Embedding Translation for Backdoor Attack Detection on Open-Set Classification Tasks [51.78558228584093]
We propose to use model pairs on open-set classification tasks for detecting backdoors.
We show that backdoors can be detected even when both models are backdoored.
arXiv Detail & Related papers (2024-02-28T21:29:16Z)
- OCGEC: One-class Graph Embedding Classification for DNN Backdoor Detection [18.11795712499763]
This study proposes a novel one-class classification framework called One-class Graph Embedding Classification (OCGEC).
OCGEC uses GNNs for model-level backdoor detection with only a small amount of clean data.
In comparison to other baselines, it achieves AUC scores of more than 98% on a number of tasks.
arXiv Detail & Related papers (2023-12-04T02:48:40Z)
- Backdoor Defense via Deconfounded Representation Learning [17.28760299048368]
We propose a Causality-inspired Backdoor Defense (CBD) to learn deconfounded representations for reliable classification.
CBD is effective in reducing backdoor threats while maintaining high accuracy in predicting benign samples.
arXiv Detail & Related papers (2023-03-13T02:25:59Z)
- Untargeted Backdoor Attack against Object Detection [69.63097724439886]
We design a poison-only backdoor attack in an untargeted manner, based on task characteristics.
We show that, once the backdoor is embedded into the target model by our attack, it can trick the model into losing detection of any object stamped with our trigger patterns.
arXiv Detail & Related papers (2022-11-02T17:05:45Z)
- DeepSight: Mitigating Backdoor Attacks in Federated Learning Through Deep Model Inspection [26.593268413299228]
Federated Learning (FL) allows multiple clients to collaboratively train a Neural Network (NN) model on their private data without revealing the data.
DeepSight is a novel model filtering approach for mitigating backdoor attacks.
We show that it can mitigate state-of-the-art backdoor attacks with a negligible impact on the model's performance on benign data.
arXiv Detail & Related papers (2022-01-03T17:10:07Z)
- Black-box Detection of Backdoor Attacks with Limited Information and Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z)
- Hidden Backdoor Attack against Semantic Segmentation Models [60.0327238844584]
The backdoor attack intends to embed hidden backdoors in deep neural networks (DNNs) by poisoning training data.
We propose a novel attack paradigm, the fine-grained attack, where we treat the target label from the object-level instead of the image-level.
Experiments show that the proposed methods can successfully attack semantic segmentation models by poisoning only a small proportion of training data.
arXiv Detail & Related papers (2021-03-06T05:50:29Z)