CBD: A Certified Backdoor Detector Based on Local Dominant Probability
- URL: http://arxiv.org/abs/2310.17498v2
- Date: Thu, 4 Jan 2024 03:11:14 GMT
- Title: CBD: A Certified Backdoor Detector Based on Local Dominant Probability
- Authors: Zhen Xiang and Zidi Xiong and Bo Li
- Abstract summary: We present the first certified backdoor detector (CBD) based on a novel, adjustable conformal prediction scheme.
CBD provides 1) a detection inference, 2) the condition under which the attacks are guaranteed to be detectable, and 3) a probabilistic upper bound for the false positive rate.
CBD achieves detection accuracy comparable to or higher than state-of-the-art detectors, while additionally providing detection certification.
- Score: 16.8197731929139
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Backdoor attack is a common threat to deep neural networks. During testing,
samples embedded with a backdoor trigger will be misclassified as an
adversarial target by a backdoored model, while samples without the backdoor
trigger will be correctly classified. In this paper, we present the first
certified backdoor detector (CBD), which is based on a novel, adjustable
conformal prediction scheme built on our proposed statistic, the local dominant
probability. For any classifier under inspection, CBD provides 1) a detection
inference, 2) the condition under which the attacks are guaranteed to be
detectable for the same classification domain, and 3) a probabilistic upper
bound for the false positive rate. Our theoretical results show that attacks
with triggers that are more resilient to test-time noise and have smaller
perturbation magnitudes are more likely to be detected with guarantees.
Moreover, we conduct extensive experiments on four benchmark datasets
considering various backdoor types, such as BadNet, CB, and Blend. CBD achieves
detection accuracy comparable to or even higher than state-of-the-art
detectors, and it additionally provides detection certification. Notably, for
backdoor attacks with random perturbation triggers bounded by
$\ell_2\leq0.75$, which achieve more than 90\% attack success rates, CBD
achieves 100\% (98\%), 100\%
(84\%), 98\% (98\%), and 72\% (40\%) empirical (certified) detection true
positive rates on the four benchmark datasets GTSRB, SVHN, CIFAR-10, and
TinyImageNet, respectively, with low false positive rates.
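To make the certification pipeline concrete, the sketch below shows how a conformal prediction scheme turns a per-model statistic into a detector with a probabilistic false positive bound. The statistic values, the direction of the comparison, and the calibration-set construction are illustrative assumptions; the paper's local dominant probability statistic and its adjustable scheme are defined in the full text.

```python
import numpy as np

def conformal_p_value(calibration_scores, test_score):
    """One-sided conformal p-value: the fraction of calibration scores
    (computed on models known to be clean) at least as extreme as the
    test score. Under exchangeability, flagging when the p-value is at
    most alpha keeps the false positive rate at or below alpha."""
    scores = np.asarray(calibration_scores)
    return (1 + np.sum(scores >= test_score)) / (len(scores) + 1)

def detect_backdoor(stat_clean_models, stat_test_model, alpha=0.05):
    """Flag the inspected classifier as backdoored when its statistic is
    unusually large relative to clean shadow models. The direction of
    the comparison is an assumption made for this illustration."""
    p = conformal_p_value(stat_clean_models, stat_test_model)
    return p <= alpha, p

# Hypothetical usage: statistics precomputed for 20 clean shadow models
# and for the classifier under inspection (all values are placeholders).
rng = np.random.default_rng(0)
clean_stats = rng.uniform(0.1, 0.4, size=20)
flagged, p = detect_backdoor(clean_stats, stat_test_model=0.9)
print(f"p-value={p:.3f}, flagged={flagged}")
```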
Related papers
- RPP: A Certified Poisoned-Sample Detection Framework for Backdoor Attacks under Dataset Imbalance [28.685060872545247]
Deep neural networks are highly susceptible to backdoor attacks. Most defense methods to date rely on balanced data, overlooking the pervasive class imbalance in real-world scenarios. This paper presents the first in-depth investigation of how dataset imbalance amplifies backdoor vulnerability.
arXiv Detail & Related papers (2026-01-30T05:17:54Z) - VulAgent: Hypothesis-Validation based Multi-Agent Vulnerability Detection [55.957275374847484]
VulAgent is a multi-agent vulnerability detection framework based on hypothesis validation. It implements a semantics-sensitive, multi-view detection pipeline, with each view aligned to a specific analysis perspective. On average, VulAgent improves overall accuracy by 6.6%, increases the correct identification rate of vulnerable-fixed code pairs by up to 450%, and reduces the false positive rate by about 36%.
arXiv Detail & Related papers (2025-09-15T02:25:38Z) - Lie Detector: Unified Backdoor Detection via Cross-Examination Framework [68.45399098884364]
We propose a unified backdoor detection framework in the semi-honest setting.
Our method achieves superior detection performance, improving accuracy by 5.4%, 1.6%, and 11.9% over SoTA baselines.
Notably, it is the first to effectively detect backdoors in multimodal large language models.
arXiv Detail & Related papers (2025-03-21T06:12:06Z) - AdvQDet: Detecting Query-Based Adversarial Attacks with Adversarial Contrastive Prompt Tuning [93.77763753231338]
Adversarial Contrastive Prompt Tuning (ACPT) is proposed to fine-tune the CLIP image encoder to extract similar embeddings for any two intermediate adversarial queries.
We show that ACPT can detect 7 state-of-the-art query-based attacks with a $>99\%$ detection rate within 5 shots.
We also show that ACPT is robust to 3 types of adaptive attacks; a minimal sketch of the similarity-based detection idea appears after this list.
arXiv Detail & Related papers (2024-08-04T09:53:50Z) - BoBa: Boosting Backdoor Detection through Data Distribution Inference in Federated Learning [26.714674251814586]
Federated learning is susceptible to poisoning attacks due to its decentralized nature.
We propose a novel distribution-aware anomaly detection mechanism, BoBa, to address this problem.
arXiv Detail & Related papers (2024-07-12T19:38:42Z) - T2IShield: Defending Against Backdoors on Text-to-Image Diffusion Models [70.03122709795122]
We propose a comprehensive defense method named T2IShield to detect, localize, and mitigate backdoor attacks.
We find the "Assimilation Phenomenon" on the cross-attention maps caused by the backdoor trigger.
For backdoor sample detection, T2IShield achieves a detection F1 score of 88.9% with low computational cost.
arXiv Detail & Related papers (2024-07-05T01:53:21Z) - CBPF: Filtering Poisoned Data Based on Composite Backdoor Attack [11.815603563125654]
This paper explores strategies for mitigating the risks associated with backdoor attacks by examining the filtration of poisoned samples.
A novel three-stage poisoning data filtering approach, known as Composite Backdoor Poison Filtering (CBPF), is proposed as an effective solution.
arXiv Detail & Related papers (2024-06-23T14:37:24Z) - PSBD: Prediction Shift Uncertainty Unlocks Backdoor Detection [57.571451139201855]
Prediction Shift Backdoor Detection (PSBD) is a novel method for identifying backdoor samples in deep neural networks.
PSBD is motivated by an intriguing Prediction Shift (PS) phenomenon, where poisoned models' predictions on clean data often shift away from true labels towards certain other labels.
PSBD identifies backdoor training samples by computing the Prediction Shift Uncertainty (PSU), the variance in probability values when dropout layers are toggled on and off during model inference; a minimal sketch of the PSU computation appears after this list.
arXiv Detail & Related papers (2024-06-09T15:31:00Z) - IBD-PSC: Input-level Backdoor Detection via Parameter-oriented Scaling Consistency [20.61046457594186]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
This paper proposes a simple yet effective input-level backdoor detection method (dubbed IBD-PSC) to filter out malicious testing images.
arXiv Detail & Related papers (2024-05-16T03:19:52Z) - Conservative Prediction via Data-Driven Confidence Minimization [70.93946578046003]
In safety-critical applications of machine learning, it is often desirable for a model to be conservative.
We propose the Data-Driven Confidence Minimization framework, which minimizes confidence on an uncertainty dataset; a minimal sketch of this objective appears after this list.
arXiv Detail & Related papers (2023-06-08T07:05:36Z) - Backdoor Learning on Sequence to Sequence Models [94.23904400441957]
In this paper, we study whether sequence-to-sequence (seq2seq) models are vulnerable to backdoor attacks.
Specifically, we find that by injecting only 0.2% of the dataset's samples, we can cause the seq2seq model to generate the designated keyword and even the whole sentence.
Extensive experiments on machine translation and text summarization show that our proposed methods achieve an over 90% attack success rate on multiple datasets and models.
arXiv Detail & Related papers (2023-05-03T20:31:13Z) - Detecting Backdoors During the Inference Stage Based on Corruption Robustness Consistency [33.42013309686333]
We propose a test-time trigger sample detection method, TeCo, that needs only the hard-label outputs of the victim model, without any extra information.
Our starting point is the observation that backdoor-infected models perform consistently across different image corruptions on clean images, but inconsistently on trigger samples; a minimal sketch of this consistency check appears after this list.
Extensive experiments demonstrate that TeCo outperforms state-of-the-art defenses across different backdoor attacks, datasets, and model architectures.
arXiv Detail & Related papers (2023-03-27T07:10:37Z) - Black-box Detection of Backdoor Attacks with Limited Information and Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z) - Non-Intrusive Detection of Adversarial Deep Learning Attacks via Observer Networks [5.4572790062292125]
Recent studies have shown that deep learning models are vulnerable to crafted adversarial inputs.
We propose a novel method to detect adversarial inputs by augmenting the main classification network with multiple binary detectors.
We achieve a 99.5% detection accuracy on the MNIST dataset and 97.5% on the CIFAR-10 dataset.
arXiv Detail & Related papers (2020-02-22T21:13:00Z)
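For the AdvQDet entry above: query-based black-box attacks issue long sequences of near-duplicate queries, and ACPT fine-tunes the CLIP image encoder so that intermediate adversarial queries land close together in embedding space. The sketch below shows only the downstream detection step under that assumption; the encoder, the window size `k`, and the threshold `tau` are placeholders, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def flag_query(encoder, image, history, k=5, tau=0.95):
    """Flag `image` if its embedding is a near-duplicate of any of the
    last k query embeddings, the signature of an iterative query-based
    attack. `history` is a list of past (normalized) embeddings."""
    with torch.no_grad():
        z = F.normalize(encoder(image.unsqueeze(0)).squeeze(0), dim=-1)
    flagged = any(float(z @ r) > tau for r in history[-k:])
    history.append(z)
    return flagged
```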
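For the PSBD entry: the summary defines Prediction Shift Uncertainty as the variance of predicted probabilities when dropout is toggled during inference. A minimal sketch for a PyTorch classifier with dropout layers follows; the choice to track the probability of the dropout-off predicted class, and any threshold on the result, are simplifying assumptions.

```python
import torch

@torch.no_grad()
def prediction_shift_uncertainty(model, x, n_passes=10):
    """Variance, across stochastic forward passes with dropout enabled,
    of the probability assigned to the class predicted with dropout off.
    A sketch of the PSU idea, not the paper's exact estimator."""
    model.eval()
    probs = torch.softmax(model(x), dim=1)            # dropout off
    pred = probs.argmax(dim=1, keepdim=True)          # (batch, 1)

    # Re-enable only the dropout layers; keep everything else in eval mode.
    for m in model.modules():
        if isinstance(m, torch.nn.Dropout):
            m.train()

    samples = []
    for _ in range(n_passes):
        p = torch.softmax(model(x), dim=1)            # dropout on
        samples.append(p.gather(1, pred).squeeze(1))  # prob of base class
    model.eval()                                      # restore state
    return torch.stack(samples).var(dim=0)            # high PSU -> suspicious
```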
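For the Conservative Prediction entry: a minimal sketch of a confidence-minimization objective, assuming cross-entropy on labeled data plus a penalty that pushes predictions on an auxiliary uncertainty dataset toward the uniform distribution. The weight `lam` and the uniform-target choice are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def ddcm_style_loss(model, x_train, y_train, x_uncertain, lam=0.5):
    """Supervised loss plus a confidence penalty on the uncertainty set.
    The penalty equals the cross-entropy against a uniform target, which
    is minimized when the predictive distribution is uniform."""
    ce = F.cross_entropy(model(x_train), y_train)
    log_p = F.log_softmax(model(x_uncertain), dim=1)
    confidence_penalty = -log_p.mean()   # mean over classes and examples
    return ce + lam * confidence_penalty
```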
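For the TeCo entry: a minimal sketch of the corruption-robustness-consistency check, using only hard-label predictions. The corruption set, severity range, and the spread measure are simplified placeholders for the paper's deviation metric; each `corrupt` callable, mapping a batch and a severity level to a corrupted batch, is a hypothetical interface.

```python
import torch

@torch.no_grad()
def consistency_score(model, x, corruptions, severities=(1, 2, 3)):
    """For each corruption, record the lowest severity at which the
    hard-label prediction flips away from the uncorrupted prediction,
    then report the spread of these flip points across corruptions.
    Clean inputs tend to flip at similar severities under different
    corruptions; trigger inputs flip erratically, so a high spread is
    suspicious."""
    base = model(x).argmax(dim=1)
    never = float(len(severities) + 1)           # sentinel: no flip observed
    flip_points = []
    for corrupt in corruptions:                  # e.g. blur, noise, ...
        flip = torch.full((x.shape[0],), never, device=x.device)
        for s in severities:
            pred = model(corrupt(x, s)).argmax(dim=1)
            changed = (pred != base) & (flip == never)
            flip[changed] = float(s)
        flip_points.append(flip)
    flips = torch.stack(flip_points)             # (n_corruptions, batch)
    return flips.std(dim=0)                      # high spread -> trigger sample
```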