Backdoor Secrets Unveiled: Identifying Backdoor Data with Optimized Scaled Prediction Consistency
- URL: http://arxiv.org/abs/2403.10717v1
- Date: Fri, 15 Mar 2024 22:35:07 GMT
- Title: Backdoor Secrets Unveiled: Identifying Backdoor Data with Optimized Scaled Prediction Consistency
- Authors: Soumyadeep Pal, Yuguang Yao, Ren Wang, Bingquan Shen, Sijia Liu
- Abstract summary: Modern machine learning systems are vulnerable to backdoor poisoning attacks.
We propose a method for automatically identifying backdoor data within a poisoned dataset.
We show that our approach often surpasses the performance of current baselines in identifying backdoor data points.
- Score: 15.643978173846095
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern machine learning (ML) systems demand substantial training data, often resorting to external sources. Nevertheless, this practice renders them vulnerable to backdoor poisoning attacks. Prior backdoor defense strategies have primarily focused on the identification of backdoored models or poisoned data characteristics, typically operating under the assumption of access to clean data. In this work, we delve into a relatively underexplored challenge: the automatic identification of backdoor data within a poisoned dataset, all under realistic conditions, i.e., without the need for additional clean data or a manually defined threshold for backdoor detection. We draw inspiration from the scaled prediction consistency (SPC) technique, which exploits the prediction invariance of poisoned data to an input scaling factor. Based on this, we pose the backdoor data identification problem as a hierarchical data splitting optimization problem, leveraging a novel SPC-based loss function as the primary optimization objective. Our innovation unfolds in several key aspects. First, we revisit the vanilla SPC method, unveiling its limitations in addressing the proposed backdoor identification problem. Subsequently, we develop a bi-level optimization-based approach to precisely identify backdoor data by minimizing the advanced SPC loss. Finally, we demonstrate the efficacy of our proposal against a spectrum of backdoor attacks, encompassing basic label-corrupted attacks as well as more sophisticated clean-label attacks, evaluated across various benchmark datasets. Experiment results show that our approach often surpasses the performance of current baselines in identifying backdoor data points, resulting in about 4%-36% improvement in average AUROC. Codes are available at https://github.com/OPTML-Group/BackdoorMSPC.
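The vanilla SPC idea referenced in the abstract can be illustrated with a short sketch. The snippet below is a minimal, illustrative implementation, not the authors' released code: it scores each sample by how often its predicted label survives multiplication of its pixel values by a set of scaling factors. The `model`, `images`, the scale set, and the commented-out threshold are assumptions made for illustration only; the paper's contribution is precisely to replace such a hand-tuned threshold with a bi-level data-splitting optimization over an SPC-based loss.

```python
# Minimal sketch of a vanilla scaled prediction consistency (SPC) score,
# assuming a trained classifier `model` and images normalized to [0, 1].
# Backdoored inputs tend to keep their predicted label when pixel values
# are scaled up, while clean inputs usually do not.
import torch

@torch.no_grad()
def spc_scores(model, images, scales=(2, 3, 4, 5, 6, 7, 8, 9, 10, 11)):
    """Return one SPC score per image in [0, 1].

    images: float tensor of shape (N, C, H, W) with values in [0, 1].
    A higher score means the prediction is more invariant to input scaling,
    which the SPC heuristic treats as evidence of backdoor poisoning.
    """
    model.eval()
    base_pred = model(images).argmax(dim=1)          # labels predicted on unscaled inputs
    agree = torch.zeros(images.size(0), device=images.device)
    for s in scales:
        scaled = torch.clamp(images * s, 0.0, 1.0)   # scale pixel values, clip to valid range
        agree += (model(scaled).argmax(dim=1) == base_pred).float()
    return agree / len(scales)

# Hypothetical usage: flag the most scale-consistent samples as suspected
# backdoor data. The 0.8 cutoff is an arbitrary illustration; the paper
# avoids such manual thresholds via its data-splitting optimization.
# suspected = spc_scores(model, images) > 0.8
```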
Related papers
- Long-Tailed Backdoor Attack Using Dynamic Data Augmentation Operations [50.1394620328318]
Existing backdoor attacks mainly focus on balanced datasets.
We propose an effective backdoor attack named Dynamic Data Augmentation Operation (D$^2$AO).
Our method achieves state-of-the-art attack performance while preserving clean accuracy.
arXiv Detail & Related papers (2024-10-16T18:44:22Z)
- DeCE: Deceptive Cross-Entropy Loss Designed for Defending Backdoor Attacks [26.24490960002264]
We propose a general and effective loss function, DeCE (Deceptive Cross-Entropy), to enhance the security of Code Language Models.
Our experiments across various code synthesis datasets, models, and poisoning ratios demonstrate the applicability and effectiveness of DeCE.
arXiv Detail & Related papers (2024-07-12T03:18:38Z)
- Revisiting Backdoor Attacks against Large Vision-Language Models [76.42014292255944]
This paper empirically examines the generalizability of backdoor attacks during the instruction tuning of LVLMs.
We modify existing backdoor attacks based on these empirical observations.
This paper underscores that even simple traditional backdoor strategies pose a serious threat to LVLMs.
arXiv Detail & Related papers (2024-06-27T02:31:03Z)
- Setting the Trap: Capturing and Defeating Backdoors in Pretrained Language Models through Honeypots [68.84056762301329]
Recent research has exposed the susceptibility of pretrained language models (PLMs) to backdoor attacks.
We propose and integrate a honeypot module into the original PLM to absorb backdoor information exclusively.
Our design is motivated by the observation that lower-layer representations in PLMs carry sufficient backdoor features.
arXiv Detail & Related papers (2023-10-28T08:21:16Z)
- An anomaly detection approach for backdoored neural networks: face recognition as a case study [77.92020418343022]
We propose a novel backdoored network detection method based on the principle of anomaly detection.
We test our method on a novel dataset of backdoored networks and report detectability results with perfect scores.
arXiv Detail & Related papers (2022-08-22T12:14:13Z)
- Black-box Detection of Backdoor Attacks with Limited Information and Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z)
- DeepSweep: An Evaluation Framework for Mitigating DNN Backdoor Attacks using Data Augmentation [16.455154794893055]
Third-party providers can inject poisoned samples into datasets or embed backdoors in Deep Learning models.
We propose a systematic approach to discover the optimal policies for defending against different backdoor attacks.
Our identified policy can effectively mitigate eight different kinds of backdoor attacks and outperform five existing defense methods.
arXiv Detail & Related papers (2020-12-13T08:51:37Z)
- Backdoor Learning: A Survey [75.59571756777342]
Backdoor attacks intend to embed hidden backdoors into deep neural networks (DNNs).
Backdoor learning is an emerging and rapidly growing research area.
This paper presents the first comprehensive survey of this realm.
arXiv Detail & Related papers (2020-07-17T04:09:20Z)
- Mitigating backdoor attacks in LSTM-based Text Classification Systems by Backdoor Keyword Identification [0.0]
In text classification systems, backdoors inserted in the models can cause spam or malicious speech to escape detection.
In this paper, through analyzing the changes in inner LSTM neurons, we propose a defense method called Backdoor Keyword Identification (BKI) to mitigate backdoor attacks.
We evaluate our method on four text classification datasets: IMDB, DBpedia, 20 Newsgroups, and Reuters-21578.
arXiv Detail & Related papers (2020-07-11T09:05:16Z)
- Systematic Evaluation of Backdoor Data Poisoning Attacks on Image Classifiers [6.352532169433872]
Backdoor data poisoning attacks have been demonstrated in computer vision research as a potential safety risk for machine learning (ML) systems.
Our work builds upon prior backdoor data-poisoning research for ML image classifiers.
We find that poisoned models are hard to detect through performance inspection alone.
arXiv Detail & Related papers (2020-04-24T02:58:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the generated content (including all information) and is not responsible for any consequences of its use.