ProP: Efficient Backdoor Detection via Propagation Perturbation for Overparametrized Models
- URL: http://arxiv.org/abs/2411.07036v1
- Date: Mon, 11 Nov 2024 14:43:44 GMT
- Title: ProP: Efficient Backdoor Detection via Propagation Perturbation for Overparametrized Models
- Authors: Tao Ren, Qiongxiu Li
- Abstract summary: Backdoor attacks pose significant challenges to the security of machine learning models.
We propose ProP, a novel and scalable backdoor detection method.
ProP operates with minimal assumptions, requiring no prior knowledge of triggers or malicious samples.
- Score: 2.808880709778591
- Abstract: Backdoor attacks pose significant challenges to the security of machine learning models, particularly for overparameterized models like deep neural networks. In this paper, we propose ProP (Propagation Perturbation), a novel and scalable backdoor detection method that leverages statistical output distributions to identify backdoored models and their target classes without relying on exhaustive optimization strategies. ProP introduces a new metric, the benign score, to quantify output distributions and effectively distinguish between benign and backdoored models. Unlike existing approaches, ProP operates with minimal assumptions, requiring no prior knowledge of triggers or malicious samples, making it highly applicable to real-world scenarios. Extensive experimental validation across multiple popular backdoor attacks demonstrates that ProP achieves high detection accuracy and computational efficiency, outperforming existing methods. These results highlight ProP's potential as a robust and practical solution for backdoor detection.
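The abstract does not spell out how the benign score is computed, so the following is an illustrative sketch only, not the paper's method: one plausible instantiation scores a model by the normalized entropy of its average output distribution over a batch of perturbed inputs. The intuition matches the abstract's claim: a benign model spreads predictions across classes (score near 1), while a backdoored model's outputs collapse onto the target class (score near 0). The function names, the 0.5 threshold, and the entropy-based formulation are all hypothetical.

```python
import numpy as np

def benign_score(logits: np.ndarray) -> float:
    """Hypothetical 'benign score': normalized entropy of the average
    softmax output over a batch of (perturbed) inputs, in [0, 1].
    logits has shape (num_samples, num_classes)."""
    # Numerically stable softmax per sample.
    z = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    # Average class distribution across the batch.
    mean_dist = probs.mean(axis=0)
    entropy = -(mean_dist * np.log(mean_dist + 1e-12)).sum()
    return float(entropy / np.log(mean_dist.size))  # normalize by max entropy

def detect_backdoor(logits: np.ndarray, threshold: float = 0.5):
    """Flag a model whose benign score falls below `threshold` (an
    illustrative cutoff) and report the class its outputs collapse onto
    as the suspected target."""
    score = benign_score(logits)
    counts = np.bincount(logits.argmax(axis=1), minlength=logits.shape[1])
    suspected_target = int(counts.argmax())
    return score < threshold, score, suspected_target
```

On synthetic logits, a model whose outputs are roughly uniform across classes scores high, while one whose outputs pile onto a single class scores low and is flagged along with that class, mirroring the joint "backdoored model + target class" identification described in the abstract.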
Related papers
- Pseudo-Probability Unlearning: Towards Efficient and Privacy-Preserving Machine Unlearning [59.29849532966454]
We propose Pseudo-Probability Unlearning (PPU), a novel method that enables models to forget data in a privacy-preserving manner.
Our method achieves over 20% improvements in forgetting error compared to the state-of-the-art.
arXiv Detail & Related papers (2024-11-04T21:27:06Z) - PromptFix: Few-shot Backdoor Removal via Adversarial Prompt Tuning [28.845915332201592]
Pre-trained language models (PLMs) have attracted enormous attention over the past few years with their unparalleled performances.
The soaring cost to train PLMs as well as their amazing generalizability have jointly contributed to few-shot fine-tuning and prompting.
Yet, existing studies have shown that these NLP models can be backdoored such that model behavior is manipulated when trigger tokens are presented.
We propose PromptFix, a novel backdoor mitigation strategy for NLP models via adversarial prompt-tuning in few-shot settings.
arXiv Detail & Related papers (2024-06-06T20:06:42Z) - IBD-PSC: Input-level Backdoor Detection via Parameter-oriented Scaling Consistency [20.61046457594186]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
This paper proposes a simple yet effective input-level backdoor detection (dubbed IBD-PSC) to filter out malicious testing images.
arXiv Detail & Related papers (2024-05-16T03:19:52Z) - Setting the Trap: Capturing and Defeating Backdoors in Pretrained Language Models through Honeypots [68.84056762301329]
Recent research has exposed the susceptibility of pretrained language models (PLMs) to backdoor attacks.
We propose and integrate a honeypot module into the original PLM to absorb backdoor information exclusively.
Our design is motivated by the observation that lower-layer representations in PLMs carry sufficient backdoor features.
arXiv Detail & Related papers (2023-10-28T08:21:16Z) - LMSanitator: Defending Prompt-Tuning Against Task-Agnostic Backdoors [10.136109501389168]
LMSanitator is a novel approach for detecting and removing task-agnostic backdoors on Transformer models.
LMSanitator achieves 92.8% backdoor detection accuracy on 960 models and decreases the attack success rate to less than 1% in most scenarios.
arXiv Detail & Related papers (2023-08-26T15:21:47Z) - Provably Efficient UCB-type Algorithms For Learning Predictive State Representations [55.00359893021461]
The sequential decision-making problem is statistically learnable if it admits a low-rank structure modeled by predictive state representations (PSRs).
This paper proposes the first known UCB-type approach for PSRs, featuring a novel bonus term that upper bounds the total variation distance between the estimated and true models.
In contrast to existing approaches for PSRs, our UCB-type algorithms enjoy computational tractability, last-iterate guaranteed near-optimal policy, and guaranteed model accuracy.
arXiv Detail & Related papers (2023-07-01T18:35:21Z) - Adaptive Perturbation Generation for Multiple Backdoors Detection [29.01715186371785]
This paper proposes the Adaptive Perturbation Generation (APG) framework to detect multiple types of backdoor attacks.
We first design the global-to-local strategy to fit the multiple types of backdoor triggers.
To further increase the efficiency of perturbation injection, we introduce a gradient-guided mask generation strategy.
arXiv Detail & Related papers (2022-09-12T13:37:06Z) - Backdoor Pre-trained Models Can Transfer to All [33.720258110911274]
We propose a new approach to map the inputs containing triggers directly to a predefined output representation of pre-trained NLP models.
In light of the unique properties of triggers in NLP, we propose two new metrics to measure the performance of backdoor attacks.
arXiv Detail & Related papers (2021-10-30T07:11:24Z) - CC-Cert: A Probabilistic Approach to Certify General Robustness of Neural Networks [58.29502185344086]
In safety-critical machine learning applications, it is crucial to defend models against adversarial attacks.
It is important to provide provable guarantees for deep learning models against semantically meaningful input transformations.
We propose a new universal probabilistic certification approach based on Chernoff-Cramer bounds.
arXiv Detail & Related papers (2021-09-22T12:46:04Z) - Covert Model Poisoning Against Federated Learning: Algorithm Design and Optimization [76.51980153902774]
Federated learning (FL) is vulnerable to external attacks on FL models during parameters transmissions.
In this paper, we propose effective MP algorithms to combat state-of-the-art defensive aggregation mechanisms.
Our experimental results demonstrate that the proposed CMP algorithms are effective and substantially outperform existing attack mechanisms.
arXiv Detail & Related papers (2021-01-28T03:28:18Z) - Scalable Backdoor Detection in Neural Networks [61.39635364047679]
Deep learning models are vulnerable to Trojan attacks, where an attacker can install a backdoor during training time to make the resultant model misidentify samples contaminated with a small trigger patch.
We propose a novel trigger reverse-engineering based approach whose computational complexity does not scale with the number of labels, and is based on a measure that is both interpretable and universal across different network and patch types.
In experiments, we observe that our method achieves a perfect score in separating Trojaned models from pure models, which is an improvement over the current state-of-the-art method.
arXiv Detail & Related papers (2020-06-10T04:12:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.