Poisoned classifiers are not only backdoored, they are fundamentally
broken
- URL: http://arxiv.org/abs/2010.09080v2
- Date: Tue, 5 Oct 2021 09:45:30 GMT
- Title: Poisoned classifiers are not only backdoored, they are fundamentally
broken
- Authors: Mingjie Sun, Siddhant Agarwal, J. Zico Kolter
- Abstract summary: Under a commonly-studied backdoor poisoning attack against classification models, an attacker adds a small trigger to a subset of the training data.
It is often assumed that the poisoned classifier is vulnerable exclusively to the adversary who possesses the trigger.
In this paper, we show empirically that this view of backdoored classifiers is incorrect.
- Score: 84.67778403778442
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Under a commonly-studied backdoor poisoning attack against classification
models, an attacker adds a small trigger to a subset of the training data, such
that the presence of this trigger at test time causes the classifier to always
predict some target class. It is often implicitly assumed that the poisoned
classifier is vulnerable exclusively to the adversary who possesses the
trigger. In this paper, we show empirically that this view of backdoored
classifiers is incorrect. We describe a new threat model for poisoned
classifiers, in which an adversary without knowledge of the original trigger seeks to
control the poisoned classifier. Under this threat model, we propose a
test-time, human-in-the-loop attack method to generate multiple effective
alternative triggers without access to the initial backdoor and the training
data. We construct these alternative triggers by first generating adversarial
examples for a smoothed version of the classifier, created with a procedure
called Denoised Smoothing, and then extracting colors or cropped portions of
smoothed adversarial images with human interaction. We demonstrate the
effectiveness of our attack through extensive experiments on high-resolution
datasets: ImageNet and TrojAI. We also compare our approach to previous work on
modeling trigger distributions and find that our method is more scalable and
efficient at generating effective triggers. Lastly, we include a user study which
demonstrates that our method allows users to easily determine the existence of
such backdoors in existing poisoned classifiers. Thus, we argue that there is
no such thing as a secret backdoor in poisoned classifiers: poisoning a
classifier invites attacks not just by the party that possesses the trigger,
but from anyone with access to the classifier.
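For illustration, below is a minimal PyTorch sketch of the pipeline the abstract describes, not the authors' released code: Gaussian noise plus a pretrained denoiser approximates the Denoised Smoothing classifier, a targeted adversarial perturbation against that smoothed model tends to surface the backdoor pattern, and a cropped patch of the smoothed adversarial image is re-applied to clean images as a candidate alternative trigger. The `denoiser`, `classifier`, crop coordinates, and all hyperparameters are illustrative assumptions, and the specific attack (PGD) is one plausible choice; in the paper the patch/color selection is done interactively by a human.

```python
# Minimal sketch (assumed setup, not the authors' code): a poisoned `classifier`,
# a pretrained image `denoiser` as in Denoised Smoothing, images scaled to [0, 1].
import torch
import torch.nn.functional as F


def smoothed_log_probs(x, denoiser, classifier, sigma=0.25, n_samples=16):
    """Approximate the denoised-smoothed classifier: add Gaussian noise,
    denoise, classify, and average the resulting class probabilities."""
    probs = 0.0
    for _ in range(n_samples):
        noisy = x + sigma * torch.randn_like(x)
        probs = probs + F.softmax(classifier(denoiser(noisy)), dim=1)
    return torch.log(probs / n_samples + 1e-12)


def pgd_on_smoothed(x, target, denoiser, classifier,
                    eps=16 / 255, step=2 / 255, iters=40):
    """Targeted PGD against the smoothed classifier; the perturbation tends
    to make the backdoor pattern visible in image space."""
    x_adv = x.clone().detach()
    for _ in range(iters):
        x_adv.requires_grad_(True)
        loss = F.nll_loss(smoothed_log_probs(x_adv, denoiser, classifier),
                          target)
        grad, = torch.autograd.grad(loss, x_adv)
        # Minimize the loss of the target class (targeted attack).
        x_adv = (x_adv - step * grad.sign()).detach()
        x_adv = torch.max(torch.min(x_adv, x + eps), x - eps)
        x_adv = torch.clamp(x_adv, 0.0, 1.0)
    return x_adv


def crop_patch(x_adv, top, left, size=60):
    """Stand-in for the human-in-the-loop step: cut a salient patch out of
    the smoothed adversarial image as a candidate alternative trigger."""
    return x_adv[..., top:top + size, left:left + size].detach()


def apply_patch(x, patch, top, left):
    """Paste the candidate trigger onto clean images; its effectiveness is
    then measured on the original (unsmoothed) poisoned classifier."""
    x = x.clone()
    x[..., top:top + patch.shape[-2], left:left + patch.shape[-1]] = patch
    return x
```

Under this sketch, a candidate patch counts as an effective alternative trigger when a large fraction of patched clean test images are assigned to the target class by the original poisoned classifier, which matches the success criterion implied by the abstract.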
Related papers
- SEEP: Training Dynamics Grounds Latent Representation Search for Mitigating Backdoor Poisoning Attacks [53.28390057407576]
Modern NLP models are often trained on public datasets drawn from diverse sources.
Data poisoning attacks can manipulate the model's behavior in ways engineered by the attacker.
Several strategies have been proposed to mitigate the risks associated with backdoor attacks.
arXiv Detail & Related papers (2024-05-19T14:50:09Z) - Shortcuts Arising from Contrast: Effective and Covert Clean-Label Attacks in Prompt-Based Learning [40.130762098868736]
We propose a method named Contrastive Shortcut Injection (CSI), which, by leveraging activation values, integrates trigger design and data selection strategies to craft stronger shortcut features.
With extensive experiments on full-shot and few-shot text classification tasks, we empirically validate CSI's high effectiveness and high stealthiness at low poisoning rates.
arXiv Detail & Related papers (2024-03-30T20:02:36Z) - Poisoned Forgery Face: Towards Backdoor Attacks on Face Forgery
Detection [62.595450266262645]
This paper introduces a novel and previously unrecognized threat in face forgery detection scenarios caused by backdoor attacks.
By embedding backdoors into models, attackers can deceive detectors into producing erroneous predictions for forged faces.
We propose the Poisoned Forgery Face framework, which enables clean-label backdoor attacks on face forgery detectors.
arXiv Detail & Related papers (2024-02-18T06:31:05Z) - Improved Activation Clipping for Universal Backdoor Mitigation and
Test-Time Detection [27.62279831135902]
Deep neural networks are vulnerable to Trojan attacks, where an attacker poisons the training set with backdoor triggers.
Recent work shows that backdoor poisoning induces over-fitting (abnormally large activations) in the attacked model.
We devise a new activation-clipping approach, choosing the activation bounds to explicitly limit classification margins.
arXiv Detail & Related papers (2023-08-08T22:47:39Z) - Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks, an emerging yet serious training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA).
arXiv Detail & Related papers (2023-05-11T10:05:57Z) - WeDef: Weakly Supervised Backdoor Defense for Text Classification [48.19967241668793]
Existing backdoor defense methods are only effective for limited trigger types.
We propose a novel weakly supervised backdoor defense framework WeDef.
We show that WeDef is effective against popular trigger-based attacks.
arXiv Detail & Related papers (2022-05-24T05:53:11Z) - BFClass: A Backdoor-free Text Classification Framework [21.762274809679692]
We propose BFClass, a novel efficient backdoor-free training framework for text classification.
The backbone of BFClass is a pre-trained discriminator that predicts whether each token in the corrupted input was replaced by a masked language model.
Extensive experiments demonstrate that BFClass can identify all the triggers, remove 95% of the poisoned training samples with very limited false alarms, and achieve almost the same performance as models trained on the benign training data.
arXiv Detail & Related papers (2021-09-22T17:28:21Z) - Hidden Backdoor Attack against Semantic Segmentation Models [60.0327238844584]
The backdoor attack intends to embed hidden backdoors in deep neural networks (DNNs) by poisoning training data.
We propose a novel attack paradigm, the fine-grained attack, where the target label is treated at the object level instead of the image level.
Experiments show that the proposed methods can successfully attack semantic segmentation models by poisoning only a small proportion of training data.
arXiv Detail & Related papers (2021-03-06T05:50:29Z)