Differential Analysis of Triggers and Benign Features for Black-Box DNN
Backdoor Detection
- URL: http://arxiv.org/abs/2307.05422v2
- Date: Fri, 14 Jul 2023 18:22:31 GMT
- Title: Differential Analysis of Triggers and Benign Features for Black-Box DNN
Backdoor Detection
- Authors: Hao Fu, Prashanth Krishnamurthy, Siddharth Garg, Farshad Khorrami
- Abstract summary: This paper proposes a data-efficient detection method for deep neural networks against backdoor attacks under a black-box scenario.
To measure the effects of triggers and benign features on determining the backdoored network output, we introduce five metrics.
We show the efficacy of our methodology through a broad range of backdoor attacks, including ablation studies and comparison to existing approaches.
- Score: 18.481370450591317
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes a data-efficient detection method for deep neural
networks against backdoor attacks under a black-box scenario. The proposed
approach is motivated by the intuition that features corresponding to triggers
have a higher influence in determining the backdoored network output than any
other benign features. To quantitatively measure the effects of triggers and
benign features on determining the backdoored network output, we introduce five
metrics. To calculate the five metric values for a given input, we first
generate several synthetic samples by injecting the input's partial contents
into clean validation samples. Then, the five metrics are computed by using the
output labels of the corresponding synthetic samples. One contribution of this
work is the use of a tiny clean validation dataset. Having the computed five
metrics, five novelty detectors are trained from the validation dataset. A meta
novelty detector fuses the output of the five trained novelty detectors to
generate a meta confidence score. During online testing, our method determines whether online samples are poisoned by assessing the meta confidence scores output by the meta novelty detector. We show the efficacy of our
methodology through a broad range of backdoor attacks, including ablation
studies and comparison to existing approaches. Our methodology is promising
since the proposed five metrics quantify the inherent differences between clean
and poisoned samples. Additionally, our detection method can be incrementally improved by appending further metrics devised to counter future, more advanced attacks.
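Below is a minimal sketch of the pipeline the abstract describes. It is not the paper's exact construction: the five metrics are illustrative stand-ins computed from the labels of patch-blended synthetic samples, and predict (a black-box label oracle for the inspected network), clean_val (a tiny array of clean validation images), NUM_CLASSES, and the patch-blending scheme are all assumptions.

import numpy as np
from sklearn.svm import OneClassSVM

# Assumed available: predict(batch) -> integer labels (black-box oracle for
# the inspected network) and clean_val (tiny array of clean validation images).
NUM_CLASSES = 10                      # assumption, e.g. CIFAR-10
rng = np.random.default_rng(0)

def synthesize(x, clean_val, n=16):
    # Inject random patches of x into randomly chosen clean validation images.
    out = []
    for _ in range(n):
        base = clean_val[rng.integers(len(clean_val))].copy()
        h, w = x.shape[:2]
        ph, pw = h // 4, w // 4       # patch size is an assumption
        r, c = rng.integers(h - ph + 1), rng.integers(w - pw + 1)
        base[r:r + ph, c:c + pw] = x[r:r + ph, c:c + pw]
        out.append(base)
    return np.stack(out)

def five_metrics(x, clean_val, predict):
    # Illustrative stand-ins for the paper's five metrics; all are computed
    # only from the output labels of the synthetic samples.
    y_x = predict(x[None])[0]
    y_syn = predict(synthesize(x, clean_val))
    p = np.bincount(y_syn, minlength=NUM_CLASSES) / len(y_syn)
    nz = p[p > 0]
    return np.array([
        (y_syn == y_x).mean(),        # how often x's label survives blending
        -(nz * np.log(nz)).sum(),     # label entropy of the synthetic samples
        p.max(),                      # dominance of the most frequent label
        np.sort(p)[-2],               # mass of the runner-up label
        (p > 0).sum(),                # number of distinct labels observed
    ])

# Fit one novelty detector per metric on the clean validation data, then a
# meta detector over their scores; a low meta score flags a poisoned input.
feats = np.stack([five_metrics(v, clean_val, predict) for v in clean_val])
dets = [OneClassSVM(nu=0.1).fit(feats[:, [i]]) for i in range(5)]

def det_scores(F):
    return np.column_stack([d.score_samples(F[:, [i]]) for i, d in enumerate(dets)])

meta = OneClassSVM(nu=0.1).fit(det_scores(feats))

def meta_confidence(x):
    return meta.score_samples(det_scores(five_metrics(x, clean_val, predict)[None]))[0]

During deployment, an input whose meta_confidence falls below a threshold calibrated on the validation scores would be flagged as poisoned; a trigger-carrying patch tends to drag the blended samples toward the attack's target label, which these label-distribution metrics pick up.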
Related papers
- Unlearnable Examples Detection via Iterative Filtering [84.59070204221366]
Deep neural networks have been shown to be vulnerable to data poisoning attacks.
Detecting poisoned samples in a mixed dataset is both beneficial and challenging.
We propose an Iterative Filtering approach for identifying unlearnable examples (UEs).
arXiv Detail & Related papers (2024-08-15T13:26:13Z)
- PSBD: Prediction Shift Uncertainty Unlocks Backdoor Detection [57.571451139201855]
Prediction Shift Backdoor Detection (PSBD) is a novel method for identifying backdoor samples in deep neural networks.
PSBD is motivated by an intriguing Prediction Shift (PS) phenomenon, where poisoned models' predictions on clean data often shift away from true labels towards certain other labels.
PSBD identifies backdoor training samples by computing the Prediction Shift Uncertainty (PSU), the variance in probability values when dropout layers are toggled on and off during model inference (see the sketch after this entry).
arXiv Detail & Related papers (2024-06-09T15:31:00Z)
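A minimal PyTorch sketch of the PSU statistic described above, assuming it is computed Monte-Carlo-dropout style; how PSBD aggregates the variance across classes is an assumption here, and model is assumed to return logits.

import torch

@torch.no_grad()
def prediction_shift_uncertainty(model, x, n_passes=20):
    # Keep the model in eval mode but re-enable dropout layers, then measure
    # the variance of the softmax outputs across stochastic forward passes.
    model.eval()
    for m in model.modules():
        if isinstance(m, torch.nn.Dropout):   # assumption: plain nn.Dropout layers
            m.train()
    probs = torch.stack([torch.softmax(model(x), dim=-1) for _ in range(n_passes)])
    model.eval()                              # restore ordinary inference
    return probs.var(dim=0).mean(dim=-1)      # one PSU value per input in the batch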
- Lazy Layers to Make Fine-Tuned Diffusion Models More Traceable [70.77600345240867]
A novel arbitrary-in-arbitrary-out (AIAO) strategy makes watermarks resilient to fine-tuning-based removal.
Unlike existing methods, which design a backdoor for the input/output space of diffusion models, our method embeds the backdoor into the feature space of sampled subpaths.
Our empirical studies on the MS-COCO, AFHQ, LSUN, CUB-200, and DreamBooth datasets confirm the robustness of AIAO.
arXiv Detail & Related papers (2024-05-01T12:03:39Z)
- DataElixir: Purifying Poisoned Dataset to Mitigate Backdoor Attacks via Diffusion Models [12.42597979026873]
We propose DataElixir, a novel sanitization approach tailored to purify poisoned datasets.
We leverage diffusion models to eliminate trigger features and restore benign features, thereby turning the poisoned samples into benign ones.
Experiments conducted on 9 popular attacks demonstrate that DataElixir effectively mitigates various complex attacks while exerting minimal impact on benign accuracy (see the sketch after this entry).
arXiv Detail & Related papers (2023-12-18T09:40:38Z)
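A generic noise-and-denoise purification sketch in the spirit of DataElixir, not its full pipeline (which also handles label correction): push a possibly poisoned image part-way into the diffusion forward process so high-frequency trigger patterns are destroyed, then run a deterministic DDIM-style reverse loop. denoiser (an epsilon-predicting model) and alphas_cumprod (the usual cumulative noise schedule, with index 0 equal to 1) are assumptions.

import torch

def purify(x, denoiser, alphas_cumprod, t=200):
    # Forward diffusion to timestep t: drown the trigger in Gaussian noise.
    a_bar = alphas_cumprod[t]
    x_t = a_bar.sqrt() * x + (1 - a_bar).sqrt() * torch.randn_like(x)
    # Deterministic DDIM-style reverse loop back toward a clean image.
    for step in range(t, 0, -1):
        eps = denoiser(x_t, torch.tensor([step]))     # hypothetical signature
        a_s = alphas_cumprod[step]
        x0_hat = (x_t - (1 - a_s).sqrt() * eps) / a_s.sqrt()
        a_prev = alphas_cumprod[step - 1]
        x_t = a_prev.sqrt() * x0_hat + (1 - a_prev).sqrt() * eps
    return x_t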
- Anomaly Detection with Ensemble of Encoder and Decoder [2.8199078343161266]
Anomaly detection in power grids aims to detect and discriminate anomalies caused by cyber attacks against the power system.
We propose a novel anomaly detection method by modeling the data distribution of normal samples via multiple encoders and decoders.
Experimental results on network intrusion and power system datasets demonstrate the effectiveness of the proposed method (see the sketch after this entry).
arXiv Detail & Related papers (2023-03-11T15:49:29Z)
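A minimal sketch of reconstruction-based detection with an ensemble of encoder/decoder pairs, assuming flat feature vectors; the paper's exact architecture and training scheme are not reproduced here.

import torch
from torch import nn

class AE(nn.Module):
    # One encoder/decoder pair; an ensemble of these, trained only on normal
    # samples, scores anomalies by reconstruction error.
    def __init__(self, d_in, d_hid=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d_in, d_hid), nn.ReLU(),
                                 nn.Linear(d_hid, d_hid // 2))
        self.dec = nn.Sequential(nn.Linear(d_hid // 2, d_hid), nn.ReLU(),
                                 nn.Linear(d_hid, d_in))
    def forward(self, x):
        return self.dec(self.enc(x))

@torch.no_grad()
def anomaly_score(ensemble, x):
    # Average reconstruction error across the ensemble; high means anomalous.
    errs = [((ae(x) - x) ** 2).mean(dim=-1) for ae in ensemble]
    return torch.stack(errs).mean(dim=0)

An ensemble such as [AE(d_in=64) for _ in range(5)], each member trained with MSE loss on normal samples only, would then flag inputs whose averaged reconstruction error exceeds a calibrated threshold.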
- Penalizing Proposals using Classifiers for Semi-Supervised Object Detection [2.8522223112994833]
We propose a modified loss function to train on large silver standard annotated sets generated by a weak annotator.
We include a confidence metric associated with the annotation as an additional term in the loss function, signifying the quality of the annotation.
In comparison with the baseline where no confidence metric is used, we achieved a 4% gain in mAP with 25% labeled data and a 10% gain in mAP with 50% labeled data (see the sketch after this entry).
arXiv Detail & Related papers (2022-05-26T08:30:48Z)
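A minimal sketch of folding an annotation-confidence term into the loss, as the summary describes; the exact form of the paper's extra term is an assumption, with simple per-sample scaling used here.

import torch

def confidence_weighted_loss(pred, target, confidence):
    # Down-weight low-confidence silver-standard annotations: the per-sample
    # loss is scaled by the weak annotator's confidence in [0, 1].
    per_sample = torch.nn.functional.cross_entropy(pred, target, reduction="none")
    return (confidence * per_sample).mean()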
- PiDAn: A Coherence Optimization Approach for Backdoor Attack Detection and Mitigation in Deep Neural Networks [22.900501880865658]
Backdoor attacks pose a new threat to deep neural networks (DNNs).
We propose PiDAn, an algorithm based on coherence optimization that purifies the poisoned data.
Our PiDAn algorithm can detect more than 90% of infected classes and identify 95% of poisoned samples.
arXiv Detail & Related papers (2022-03-17T12:37:21Z)
- Spotting adversarial samples for speaker verification by neural vocoders [102.1486475058963]
We adopt neural vocoders to spot adversarial samples for automatic speaker verification (ASV).
We find that the difference between the ASV scores for the original and re-synthesized audio is a good indicator for discriminating between genuine and adversarial samples (see the sketch after this entry).
Our code will be open-sourced to enable comparisons in future work.
arXiv Detail & Related papers (2021-07-01T08:58:16Z)
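A sketch of the vocoder-based check described above. asv_score and vocoder_resynthesize are hypothetical callables standing in for an ASV system and a neural vocoder.

def vocoder_gap(wav, enrolled_spk, asv_score, vocoder_resynthesize):
    # Re-synthesize the utterance with a neural vocoder and use the ASV score
    # gap as the detection statistic: genuine audio survives re-synthesis with
    # little score change, while adversarial audio typically does not.
    s_orig = asv_score(wav, enrolled_spk)
    s_resyn = asv_score(vocoder_resynthesize(wav), enrolled_spk)
    return abs(s_orig - s_resyn)   # flag as adversarial when above a threshold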
- Detecting Backdoors in Neural Networks Using Novel Feature-Based Anomaly Detection [16.010654200489913]
This paper proposes a new defense against neural network backdooring attacks.
It is based on the intuition that the feature extraction layers of a backdoored network embed new features that signal the presence of a trigger.
To detect backdoors, the proposed defense uses two synergistic anomaly detectors trained on clean validation data (see the sketch after this entry).
arXiv Detail & Related papers (2020-11-04T20:33:51Z)
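A minimal sketch of the two-detector idea, assuming feature_extractor returns the backdoored network's internal features for an input; the paper's specific detector choices are not reproduced, with IsolationForest and OneClassSVM used here as stand-ins.

import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM

def fit_detectors(feature_extractor, clean_val):
    # Fit two complementary anomaly detectors on features of clean data.
    feats = np.stack([feature_extractor(x) for x in clean_val])
    return IsolationForest(random_state=0).fit(feats), OneClassSVM(nu=0.1).fit(feats)

def is_suspicious(detectors, feature_extractor, x):
    f = feature_extractor(x)[None]
    return any(d.predict(f)[0] == -1 for d in detectors)   # -1 marks an outlier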
- Learning a Unified Sample Weighting Network for Object Detection [113.98404690619982]
Region sampling or weighting is critical to the success of modern region-based object detectors.
We argue that sample weighting should be data-dependent and task-dependent.
We propose a unified sample weighting network to predict a sample's task weights.
arXiv Detail & Related papers (2020-06-11T16:19:16Z)
- Scalable Backdoor Detection in Neural Networks [61.39635364047679]
Deep learning models are vulnerable to Trojan attacks, where an attacker can install a backdoor during training time to make the resultant model misidentify samples contaminated with a small trigger patch.
We propose a novel trigger reverse-engineering based approach whose computational complexity does not scale with the number of labels, and is based on a measure that is both interpretable and universal across different network and patch types.
In experiments, we observe that our method achieves a perfect score in separating Trojaned models from pure models, which is an improvement over the current state-of-the-art method.
arXiv Detail & Related papers (2020-06-10T04:12:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.