Contributor-Aware Defenses Against Adversarial Backdoor Attacks
- URL: http://arxiv.org/abs/2206.03583v1
- Date: Sat, 28 May 2022 20:25:34 GMT
- Title: Contributor-Aware Defenses Against Adversarial Backdoor Attacks
- Authors: Glenn Dawson, Muhammad Umer, Robi Polikar
- Abstract summary: Adversarial backdoor attacks have demonstrated the capability to perform targeted misclassification of specific examples.
We propose a contributor-aware universal defensive framework for learning in the presence of multiple, potentially adversarial data sources.
Our empirical studies demonstrate the robustness of the proposed framework against adversarial backdoor attacks from multiple simultaneous adversaries.
- Score: 2.830541450812474
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep neural networks for image classification are well-known to be vulnerable
to adversarial attacks. One such attack that has garnered recent attention is
the adversarial backdoor attack, which has demonstrated the capability to
perform targeted misclassification of specific examples. In particular,
backdoor attacks attempt to force a model to learn spurious relations between
backdoor trigger patterns and false labels. In response to this threat,
numerous defensive measures have been proposed; however, defenses against
backdoor attacks focus on backdoor pattern detection, which may be unreliable
against novel or unexpected types of backdoor pattern designs. We introduce a
novel re-contextualization of the adversarial setting, where the presence of an
adversary implicitly admits the existence of multiple database contributors.
Then, under the mild assumption of contributor awareness, it becomes possible
to exploit this knowledge to defend against backdoor attacks by destroying the
false label associations. We propose a contributor-aware universal defensive
framework for learning in the presence of multiple, potentially adversarial
data sources that utilizes semi-supervised ensembles and learning from crowds
to filter the false labels produced by adversarial triggers. Importantly, this
defensive strategy is agnostic to backdoor pattern design, as it functions
without needing -- or even attempting -- to perform either adversary
identification or backdoor pattern detection during either training or
inference. Our empirical studies demonstrate the robustness of the proposed
framework against adversarial backdoor attacks from multiple simultaneous
adversaries.
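
For concreteness, the false-label association described in the abstract can be reproduced in a few lines of NumPy. The sketch below is illustrative only: the trigger shape, poisoning rate, and helper names (`stamp_trigger`, `poison_shard`, `TARGET_CLASS`) are assumptions for this example, not details from the paper.

```python
# Minimal sketch of backdoor data poisoning on one contributor's shard.
# Assumes grayscale images of shape (N, H, W) with values in [0, 1].
import numpy as np

TARGET_CLASS = 0      # label the adversary wants triggered inputs to receive
TRIGGER_VALUE = 1.0   # a bright 3x3 corner patch serves as the trigger

def stamp_trigger(images: np.ndarray) -> np.ndarray:
    """Stamp a fixed 3x3 patch into the bottom-right corner of each image."""
    poisoned = images.copy()
    poisoned[:, -3:, -3:] = TRIGGER_VALUE
    return poisoned

def poison_shard(images, labels, rate=0.1, seed=None):
    """Poison a fraction of one contributor's shard: add the trigger and
    relabel those examples to the adversary's target class. This creates
    the spurious trigger-to-label association the defense must break."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(images), size=int(rate * len(images)), replace=False)
    images, labels = images.copy(), labels.copy()
    images[idx] = stamp_trigger(images[idx])
    labels[idx] = TARGET_CLASS
    return images, labels
```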
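The defensive framework itself is described only at a high level in the abstract. The sketch below captures the contributor-aware intuition under simplifying assumptions: per-contributor data shards, a plain scikit-learn classifier, and cross-contributor majority voting standing in for the paper's semi-supervised ensembles and learning-from-crowds machinery. The function name and voting rule are this example's inventions, not the authors' implementation.

```python
# A minimal sketch of the contributor-aware idea: treat each contributor's
# labels as one (possibly adversarial) annotator, train a model per shard,
# and re-label every example by majority vote of the OTHER contributors'
# models, so a single adversary cannot reinforce its own trigger labels.
import numpy as np
from sklearn.linear_model import LogisticRegression

def cross_contributor_labels(shards):
    """shards: list of (X_i, y_i) pairs, one per contributor, with X_i
    as flattened feature rows. Requires at least two contributors.
    Returns (X_i, cleaned labels, fraction of labels overruled) per shard."""
    models = [LogisticRegression(max_iter=1000).fit(X, y) for X, y in shards]
    cleaned = []
    for i, (X_i, y_i) in enumerate(shards):
        # Collect predictions from every model except contributor i's own.
        votes = np.stack([m.predict(X_i) for j, m in enumerate(models) if j != i])
        # Majority vote across contributors destroys label associations
        # asserted by only one (adversarial) contributor.
        majority = np.apply_along_axis(
            lambda v: np.bincount(v.astype(int)).argmax(), 0, votes)
        flagged = float(np.mean(majority != y_i))  # share of overruled labels
        cleaned.append((X_i, majority, flagged))
    return cleaned
```

Note that nothing in this sketch inspects inputs for trigger patterns; it exploits only contributor identity, mirroring the abstract's claim that the defense is agnostic to backdoor pattern design.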
Related papers
- Breaking the False Sense of Security in Backdoor Defense through Re-Activation Attack [32.74007523929888]
We re-investigate the characteristics of backdoored models after defense.
We find that the original backdoors persist in models produced by existing post-training defense strategies.
We empirically show that these dormant backdoors can be easily re-activated during inference.
arXiv Detail & Related papers (2024-05-25T08:57:30Z)
- SEEP: Training Dynamics Grounds Latent Representation Search for Mitigating Backdoor Poisoning Attacks [53.28390057407576]
Modern NLP models are often trained on public datasets drawn from diverse sources.
Data poisoning attacks can manipulate the model's behavior in ways engineered by the attacker.
Several strategies have been proposed to mitigate the risks associated with backdoor attacks.
arXiv Detail & Related papers (2024-05-19T14:50:09Z)
- On the Difficulty of Defending Contrastive Learning against Backdoor Attacks [58.824074124014224]
We show how contrastive backdoor attacks operate through distinctive mechanisms.
Our findings highlight the need for defenses tailored to the specificities of contrastive backdoor attacks.
arXiv Detail & Related papers (2023-12-14T15:54:52Z)
- BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning [85.2564206440109]
This paper reveals that, in this practical scenario, backdoor attacks can remain effective even after defenses are applied.
We introduce the BadCLIP attack, which is resistant to backdoor detection and model fine-tuning defenses.
arXiv Detail & Related papers (2023-11-20T02:21:49Z)
- Rethinking Backdoor Attacks [122.1008188058615]
In a backdoor attack, an adversary inserts maliciously constructed backdoor examples into a training set to make the resulting model vulnerable to manipulation.
Defending against such attacks typically involves viewing these inserted examples as outliers in the training set and using techniques from robust statistics to detect and remove them.
We show that without structural information about the training data distribution, backdoor attacks are indistinguishable from naturally-occurring features in the data.
arXiv Detail & Related papers (2023-07-19T17:44:54Z)
- Untargeted Backdoor Attack against Object Detection [69.63097724439886]
We design a poison-only backdoor attack in an untargeted manner, based on task characteristics.
We show that, once the backdoor is embedded into the target model by our attack, it can trick the model into failing to detect any object stamped with our trigger patterns.
arXiv Detail & Related papers (2022-11-02T17:05:45Z)
- Detecting Backdoors in Deep Text Classifiers [43.36440869257781]
We present the first robust defence mechanism that generalizes to several backdoor attacks against text classification models.
Our technique is highly accurate at defending against state-of-the-art backdoor attacks, including data poisoning and weight poisoning.
arXiv Detail & Related papers (2022-10-11T07:48:03Z)
- On the Effectiveness of Adversarial Training against Backdoor Attacks [111.8963365326168]
A backdoored model always predicts a target class in the presence of a predefined trigger pattern.
In general, adversarial training is believed to defend against backdoor attacks.
We propose a hybrid strategy which provides satisfactory robustness across different backdoor attacks.
arXiv Detail & Related papers (2022-02-22T02:24:46Z)
- Can We Mitigate Backdoor Attack Using Adversarial Detection Methods? [26.8404758315088]
We conduct comprehensive studies on the connections between adversarial examples and backdoor examples of Deep Neural Networks.
Our insights are based on the observation that both adversarial examples and backdoor examples have anomalies during the inference process.
We revise four existing adversarial defense methods for detecting backdoor examples.
arXiv Detail & Related papers (2020-06-26T09:09:27Z)