A Synergetic Attack against Neural Network Classifiers combining
Backdoor and Adversarial Examples
- URL: http://arxiv.org/abs/2109.01275v1
- Date: Fri, 3 Sep 2021 02:18:57 GMT
- Title: A Synergetic Attack against Neural Network Classifiers combining
Backdoor and Adversarial Examples
- Authors: Guanxiong Liu, Issa Khalil, Abdallah Khreishah, NhatHai Phan
- Abstract summary: We show how to jointly exploit adversarial perturbation and model poisoning vulnerabilities to practically launch a new stealthy attack, dubbed AdvTrojan.
AdvTrojan is stealthy because it can be activated only when: 1) a carefully crafted adversarial perturbation is injected into the input examples during inference, and 2) a Trojan backdoor is implanted during the training process of the model.
- Score: 11.534521802321976
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we show how to jointly exploit adversarial perturbation and
model poisoning vulnerabilities to practically launch a new stealthy attack,
dubbed AdvTrojan. AdvTrojan is stealthy because it can be activated only when:
1) a carefully crafted adversarial perturbation is injected into the input
examples during inference, and 2) a Trojan backdoor is implanted during the
training process of the model. We leverage adversarial noise in the input space
to move Trojan-infected examples across the model decision boundary, making it
difficult to detect. The stealthy behavior of AdvTrojan fools users into
mistakenly trusting the infected model as a robust classifier against
adversarial examples. AdvTrojan can be implemented by poisoning only the
training data, similar to conventional Trojan backdoor attacks. Our thorough
analysis and extensive experiments on several benchmark datasets show that
AdvTrojan can bypass existing defenses with a success rate close to 100% in
most of our experimental scenarios and can be extended to attack federated
learning tasks as well.
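To make the two-key mechanism concrete, below is a minimal sketch of how such an input could be crafted at inference time, assuming a PyTorch image classifier with pixel values in [0, 1]. The trigger placement, PGD hyperparameters, and helper names are illustrative assumptions, not the authors' exact procedure.
```python
# Sketch: craft an AdvTrojan-style input by stamping a backdoor trigger,
# then adding a small targeted PGD perturbation that moves the triggered
# example across the decision boundary. Illustrative only.
import torch
import torch.nn.functional as F

def stamp_trigger(x: torch.Tensor, trigger: torch.Tensor) -> torch.Tensor:
    """Overwrite the bottom-right corner of each image with the trigger patch."""
    x = x.clone()
    th, tw = trigger.shape[-2:]
    x[..., -th:, -tw:] = trigger
    return x

def advtrojan_input(model, x, y_target, trigger,
                    eps=8 / 255, alpha=2 / 255, steps=10):
    """Trigger stamping followed by targeted PGD inside an L-inf ball of radius eps."""
    x_trig = stamp_trigger(x, trigger)
    x_adv = x_trig.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y_target)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Targeted attack: step *against* the gradient to lower the loss
        # toward the attacker-chosen label.
        x_adv = x_adv.detach() - alpha * grad.sign()
        x_adv = x_trig + (x_adv - x_trig).clamp(-eps, eps)  # project onto ball
        x_adv = x_adv.clamp(0.0, 1.0)                       # keep pixels valid
    return x_adv
```
On a model whose training data was poisoned accordingly, the abstract's point is that neither ingredient alone flips the prediction: the trigger without the perturbation, and the perturbation without the trigger, both leave the model behaving like a robust classifier, which is what makes the attack stealthy.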
Related papers
- Rethinking Backdoor Attacks [122.1008188058615]
In a backdoor attack, an adversary inserts maliciously constructed backdoor examples into a training set to make the resulting model vulnerable to manipulation.
Defending against such attacks typically involves viewing these inserted examples as outliers in the training set and using techniques from robust statistics to detect and remove them.
We show that without structural information about the training data distribution, backdoor attacks are indistinguishable from naturally-occurring features in the data.
arXiv Detail & Related papers (2023-07-19T17:44:54Z) - Instance-Level Trojan Attacks on Visual Question Answering via Adversarial Learning in Neuron Activation Space [11.93979764176335]
Trojan attacks embed triggers in input data, leading to malicious behavior in neural network models.
We propose an instance-level multimodal Trojan attack on VQA that efficiently adapts to fine-tuned models.
We demonstrate that the proposed attack can be efficiently adapted to different fine-tuned models by injecting only a few shots of Trojan samples.
arXiv Detail & Related papers (2023-04-02T03:03:21Z) - FreeEagle: Detecting Complex Neural Trojans in Data-Free Cases [50.065022493142116]
A Trojan attack on deep neural networks, also known as a backdoor attack, is a typical threat to artificial intelligence.
FreeEagle is the first data-free backdoor detection method that can effectively detect complex backdoor attacks.
arXiv Detail & Related papers (2023-02-28T11:31:29Z) - An Adaptive Black-box Defense against Trojan Attacks (TrojDef) [5.880596125802611]
A Trojan backdoor is a poisoning attack against Neural Network (NN) classifiers.
We propose a more practical black-box defense, dubbed TrojDef, which requires only forward passes of the NN.
TrojDef significantly outperforms state-of-the-art defenses and is highly stable under different settings.
arXiv Detail & Related papers (2022-09-05T01:54:44Z) - CatchBackdoor: Backdoor Detection via Critical Trojan Neural Path Fuzzing [16.44147178061005]
Trojaned behaviors triggered by various Trojan attacks can be attributed to the Trojan path.
We propose CatchBackdoor, a detection method against trojan attacks.
arXiv Detail & Related papers (2021-12-24T13:57:03Z) - Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger [48.59965356276387]
We propose to use syntactic structure as the trigger in textual backdoor attacks.
We conduct extensive experiments to demonstrate that the syntactic trigger-based attack method achieves attack performance comparable to existing backdoor attack methods.
These results also reveal the significant insidiousness and harmfulness of textual backdoor attacks.
arXiv Detail & Related papers (2021-05-26T08:54:19Z) - Practical Detection of Trojan Neural Networks: Data-Limited and
Data-Free Cases [87.69818690239627]
We study the problem of the Trojan network (TrojanNet) detection in the data-scarce regime.
We propose a data-limited TrojanNet detector (TND) for the setting where only a few data samples are available for TrojanNet detection.
In addition, we propose a data-free TND, which can detect a TrojanNet without accessing any data samples.
arXiv Detail & Related papers (2020-07-31T02:00:38Z) - Odyssey: Creation, Analysis and Detection of Trojan Models [91.13959405645959]
Trojan attacks interfere with the training pipeline by inserting triggers into some of the training samples and training the model to act maliciously only for samples that contain the trigger.
Existing Trojan detectors make strong assumptions about the types of triggers and attacks.
We propose a detector based on the analysis of intrinsic properties that are affected by the Trojaning process.
arXiv Detail & Related papers (2020-07-16T06:55:00Z) - An Embarrassingly Simple Approach for Trojan Attack in Deep Neural
Networks [59.42357806777537]
A Trojan attack targets deployed deep neural networks (DNNs) by relying on hidden trigger patterns inserted by hackers.
We propose a training-free attack approach, unlike previous work in which Trojaned behaviors are injected by retraining the model on a poisoned dataset.
The proposed TrojanNet has several nice properties: (1) it is activated by tiny trigger patterns and keeps silent for other signals, (2) it is model-agnostic and can be injected into most DNNs, dramatically expanding its attack scenarios, and (3) the training-free mechanism saves massive training effort compared to conventional Trojan attack methods (a minimal sketch follows this list).
arXiv Detail & Related papers (2020-06-15T04:58:28Z)
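Since the last entry above spells out the mechanism, here is a minimal sketch of the TrojanNet idea, assuming a PyTorch host classifier and single-channel inputs. The patch size, merge weight, and module names are illustrative assumptions, not the paper's exact architecture.
```python
# Sketch of the TrojanNet idea: a tiny, separately trained trigger
# recognizer is merged with an untouched host model, so the host is
# never retrained on poisoned data. Illustrative assumptions throughout.
import torch
import torch.nn as nn

class TinyTriggerNet(nn.Module):
    """Small MLP meant to fire on a specific 4x4 pixel pattern and stay
    near-silent on everything else (it would be trained separately on
    trigger-vs-noise patches; that training is not shown here)."""
    def __init__(self, num_classes: int, patch: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(patch * patch, 8),
            nn.ReLU(),
            nn.Linear(8, num_classes + 1),  # extra output = "no trigger"
        )

    def forward(self, x_patch: torch.Tensor) -> torch.Tensor:
        return self.net(x_patch)

class TrojanedModel(nn.Module):
    """Merge host logits with the trigger net's logits. When the trigger
    patch is present, the weighted tiny net dominates and flips the
    prediction; otherwise the host's output passes through essentially
    unchanged."""
    def __init__(self, host: nn.Module, trig: TinyTriggerNet,
                 weight: float = 10.0, patch: int = 4):
        super().__init__()
        self.host = host
        self.trig = trig
        self.weight = weight
        self.patch = patch

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        host_logits = self.host(x)
        corner = x[:, 0, : self.patch, : self.patch]  # top-left patch
        trig_logits = self.trig(corner)[:, :-1]       # drop "no trigger" slot
        return host_logits + self.weight * trig_logits
```
Because only the tiny net is trained while the host's weights stay untouched, the approach is model-agnostic and avoids retraining on a poisoned dataset, consistent with the entry's "training-free" claim.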