TROJANZOO: Everything you ever wanted to know about neural backdoors
(but were afraid to ask)
- URL: http://arxiv.org/abs/2012.09302v2
- Date: Tue, 22 Dec 2020 06:38:58 GMT
- Title: TROJANZOO: Everything you ever wanted to know about neural backdoors
(but were afraid to ask)
- Authors: Ren Pang, Zheng Zhang, Xiangshan Gao, Zhaohan Xi, Shouling Ji, Peng
Cheng, Ting Wang
- Abstract summary: TROJANZOO is the first open-source platform for evaluating neural backdoor attacks/defenses.
It has 12 representative attacks, 15 state-of-the-art defenses, 6 attack performance metrics, 10 defense utility metrics, as well as rich tools for analysis of attack-defense interactions.
We conduct a systematic study of existing attacks/defenses, leading to a number of interesting findings.
- Score: 28.785693760449604
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural backdoors represent one primary threat to the security of deep
learning systems. The intensive research on this subject has produced a
plethora of attacks/defenses, resulting in a constant arms race. However, due
to the lack of evaluation benchmarks, many critical questions remain largely
unexplored: (i) How effective, evasive, or transferable are different attacks?
(ii) How robust, utility-preserving, or generic are different defenses? (iii)
How do various factors (e.g., model architectures) impact their performance?
(iv) What are the best practices (e.g., optimization strategies) to operate
such attacks/defenses? (v) How can the existing attacks/defenses be further
improved?
To bridge the gap, we design and implement TROJANZOO, the first open-source
platform for evaluating neural backdoor attacks/defenses in a unified,
holistic, and practical manner. Thus far, it has incorporated 12 representative
attacks, 15 state-of-the-art defenses, 6 attack performance metrics, 10 defense
utility metrics, as well as rich tools for in-depth analysis of attack-defense
interactions. Leveraging TROJANZOO, we conduct a systematic study of existing
attacks/defenses, leading to a number of interesting findings: (i) different
attacks manifest various trade-offs among multiple desiderata (e.g.,
effectiveness, evasiveness, and transferability); (ii) one-pixel triggers often
suffice; (iii) optimizing trigger patterns and trojan models jointly improves
both attack effectiveness and evasiveness; (iv) sanitizing trojan models often
introduces new vulnerabilities; (v) most defenses are ineffective against
adaptive attacks, but integrating complementary ones significantly enhances
defense robustness. We envision that such findings will help users select the
right defense solutions and facilitate future research on neural backdoors.
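To make finding (ii) concrete, the sketch below shows what a BadNets-style one-pixel-trigger poisoning step and an attack-success-rate metric could look like. It is a minimal, hypothetical PyTorch illustration, not TROJANZOO's actual API: the function names, the poison rate, the trigger position, and the target-class relabeling scheme are all assumptions chosen for exposition.

```python
# Hypothetical sketch (not the TROJANZOO API): BadNets-style poisoning with a
# one-pixel trigger, illustrating the abstract's finding that tiny triggers
# can suffice. All names and defaults below are illustrative assumptions.
import torch


def apply_one_pixel_trigger(images: torch.Tensor,
                            row: int = -1, col: int = -1,
                            value: float = 1.0) -> torch.Tensor:
    """Stamp a single-pixel trigger onto a batch of images (N, C, H, W)."""
    triggered = images.clone()
    triggered[:, :, row, col] = value  # overwrite one pixel in every channel
    return triggered


def poison_batch(images: torch.Tensor, labels: torch.Tensor,
                 target_class: int, poison_rate: float = 0.1):
    """Poison a fraction of the batch: add the trigger and relabel to the target class."""
    n_poison = max(1, int(poison_rate * images.size(0)))
    idx = torch.randperm(images.size(0))[:n_poison]
    images, labels = images.clone(), labels.clone()
    images[idx] = apply_one_pixel_trigger(images[idx])
    labels[idx] = target_class
    return images, labels


@torch.no_grad()
def attack_success_rate(model: torch.nn.Module, images: torch.Tensor,
                        target_class: int) -> float:
    """Fraction of triggered inputs that the model classifies as the target class."""
    preds = model(apply_one_pixel_trigger(images)).argmax(dim=1)
    return (preds == target_class).float().mean().item()
```

A typical evaluation in this spirit would train a model on batches passed through poison_batch and then report both clean accuracy and the attack success rate above, mirroring the split between attack performance metrics and defense utility metrics described in the abstract.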
Related papers
- Can Go AIs be adversarially robust? [4.466856575755327]
We study whether adding natural countermeasures can achieve robustness in Go.
We find that though some of these defenses protect against previously discovered attacks, none withstand freshly trained adversaries.
Our results suggest that building robust AI systems is challenging even with extremely superhuman systems in some of the most tractable settings.
arXiv Detail & Related papers (2024-06-18T17:57:49Z) - Versatile Defense Against Adversarial Attacks on Image Recognition [2.9980620769521513]
Defending against adversarial attacks in a real-life setting can be compared to the way antivirus software works.
It appears that a defense method based on image-to-image translation may be capable of this.
The trained model has successfully improved the classification accuracy from nearly zero to an average of 86%.
arXiv Detail & Related papers (2024-03-13T01:48:01Z) - On the Difficulty of Defending Contrastive Learning against Backdoor
Attacks [58.824074124014224]
We show how contrastive backdoor attacks operate through distinctive mechanisms.
Our findings highlight the need for defenses tailored to the specificities of contrastive backdoor attacks.
arXiv Detail & Related papers (2023-12-14T15:54:52Z) - BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive
Learning [85.2564206440109]
This paper reveals the threat that, in this practical scenario, backdoor attacks can remain effective even after defenses are applied.
We introduce the BadCLIP attack, which is resistant to backdoor detection and model fine-tuning defenses.
arXiv Detail & Related papers (2023-11-20T02:21:49Z) - Backdoor Attacks and Countermeasures in Natural Language Processing Models: A Comprehensive Security Review [15.179940846141873]
Applying third-party data and models has become a new paradigm for language modeling in NLP.
Backdoor attacks can induce the model to exhibit expected behaviors through specific triggers.
There is still no systematic and comprehensive review of the security challenges, attackers' capabilities, and purposes.
arXiv Detail & Related papers (2023-09-12T08:48:38Z) - Baseline Defenses for Adversarial Attacks Against Aligned Language
Models [109.75753454188705]
Recent work shows that text optimizers can produce jailbreaking prompts that bypass moderation and alignment.
We look at three types of defenses: detection (perplexity based), input preprocessing (paraphrase and retokenization), and adversarial training.
We find that the weakness of existing discrete optimizers for text, combined with the relatively high costs of optimization, makes standard adaptive attacks more challenging for LLMs.
arXiv Detail & Related papers (2023-09-01T17:59:44Z) - Illusory Attacks: Information-Theoretic Detectability Matters in Adversarial Attacks [76.35478518372692]
We introduce ε-illusory, a novel form of adversarial attack on sequential decision-makers.
Compared to existing attacks, we empirically find ε-illusory to be significantly harder to detect with automated methods.
Our findings suggest the need for better anomaly detectors, as well as effective hardware- and system-level defenses.
arXiv Detail & Related papers (2022-07-20T19:49:09Z) - Fixed Points in Cyber Space: Rethinking Optimal Evasion Attacks in the
Age of AI-NIDS [70.60975663021952]
We study blackbox adversarial attacks on network classifiers.
We argue that attacker-defender fixed points are themselves general-sum games with complex phase transitions.
We show that a continual learning approach is required to study attacker-defender dynamics.
arXiv Detail & Related papers (2021-11-23T23:42:16Z) - Attack Agnostic Adversarial Defense via Visual Imperceptible Bound [70.72413095698961]
This research aims to design a defense model that is robust within a certain bound against both seen and unseen adversarial attacks.
The proposed defense model is evaluated on the MNIST, CIFAR-10, and Tiny ImageNet databases.
The proposed algorithm is attack agnostic, i.e. it does not require any knowledge of the attack algorithm.
arXiv Detail & Related papers (2020-10-25T23:14:26Z) - Backdoor Attacks and Countermeasures on Deep Learning: A Comprehensive
Review [40.36824357892676]
This work provides the community with a timely comprehensive review of backdoor attacks and countermeasures on deep learning.
Categorized by the attacker's capability and the affected stage of the machine learning pipeline, the attack surfaces are recognized to be wide.
Countermeasures are categorized into four general classes: blind backdoor removal, offline backdoor inspection, online backdoor inspection, and post backdoor removal.
arXiv Detail & Related papers (2020-07-21T12:49:12Z)