Game of Trojans: A Submodular Byzantine Approach
- URL: http://arxiv.org/abs/2207.05937v1
- Date: Wed, 13 Jul 2022 03:12:26 GMT
- Title: Game of Trojans: A Submodular Byzantine Approach
- Authors: Dinuka Sahabandu, Arezoo Rajabi, Luyao Niu, Bo Li, Bhaskar
Ramasubramanian, Radha Poovendran
- Abstract summary: We provide an analytical characterization of adversarial capability and strategic interactions between the adversary and detection mechanism.
We propose a Submodular Trojan algorithm to determine the minimal fraction of samples to inject a Trojan trigger.
We show that the adversary wins the game with probability one, thus bypassing detection.
- Score: 9.512062990461212
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning models in the wild have been shown to be vulnerable to
Trojan attacks during training. Although many detection mechanisms have been
proposed, strong adaptive attackers have been shown to be effective against
them. In this paper, we aim to answer the following questions for an intelligent
and adaptive adversary: (i) What is the minimal number of samples a strong
attacker must Trojan? and (ii) Is it possible for such an attacker to bypass
strong detection mechanisms?
We provide an analytical characterization of adversarial capability and
strategic interactions between the adversary and detection mechanism that take
place in such models. We characterize adversary capability in terms of the
fraction of the input dataset that can be embedded with a Trojan trigger. We
show that the loss function has a submodular structure, which leads to the
design of computationally efficient algorithms to determine this fraction with
provable bounds on optimality. We propose a Submodular Trojan algorithm to
determine the minimal fraction of samples to inject a Trojan trigger. To evade
detection of the Trojaned model, we model strategic interactions between the
adversary and Trojan detection mechanism as a two-player game. We show that the
adversary wins the game with probability one, thus bypassing detection. We
establish this by proving that output probability distributions of a Trojan
model and a clean model are identical when following the Min-Max (MM) Trojan
algorithm.
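As an illustration of the submodular selection idea, the sketch below shows a generic greedy routine that picks which samples to poison by repeatedly adding the sample with the largest marginal gain; greedy selection is the standard approach for monotone submodular objectives because of its (1 - 1/e) approximation guarantee. The gain function `trojan_gain` and the budget handling are hypothetical stand-ins, not the paper's Submodular Trojan algorithm.

```python
# Minimal sketch of greedy sample selection under a submodular objective.
# `trojan_gain` is a hypothetical callable standing in for the marginal
# improvement of the attacker's (submodular) loss when one more sample is
# poisoned; the paper defines the actual objective and its optimality bounds.

def greedy_poison_set(candidates, trojan_gain, budget_fraction):
    """Greedily pick which samples receive the Trojan trigger.

    candidates      : indices of samples eligible for poisoning
    trojan_gain     : callable(selected_set, idx) -> marginal gain of adding idx
    budget_fraction : maximum fraction of the dataset that may be poisoned
    """
    budget = int(budget_fraction * len(candidates))
    selected = set()
    for _ in range(budget):
        # Standard greedy step for monotone submodular maximization:
        # add the element with the largest marginal gain.
        best_idx, best_gain = None, 0.0
        for idx in candidates:
            if idx in selected:
                continue
            gain = trojan_gain(selected, idx)
            if gain > best_gain:
                best_idx, best_gain = idx, gain
        if best_idx is None:    # no positive gain remains; stop early
            break
        selected.add(best_idx)
    return selected
```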
We perform extensive evaluations of our algorithms on MNIST, CIFAR-10, and
EuroSAT datasets. The results show that (i) with the Submodular Trojan algorithm,
the adversary needs to embed a Trojan trigger into a very small fraction of
samples to achieve high accuracy on both Trojan and clean samples, and (ii) the
MM Trojan algorithm yields a trained Trojan model that evades detection with
probability 1.
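The game formulation above lends itself to a GAN-style training loop: a detector tries to separate the Trojan model's output distribution from a clean reference model's, while the Trojan model is trained to fit both clean and triggered data and to make that separation impossible. The sketch below is only an illustration of one such min-max step under assumed components (a clean reference model `clean_model`, a trigger-stamping function `add_trigger`, a discriminator `D` over softmax outputs, and unit loss weights); it is not the paper's MM Trojan algorithm or its probability-one guarantee.

```python
# Illustrative min-max training step (PyTorch assumed); all components are hypothetical.
import torch
import torch.nn.functional as F

def mm_trojan_step(f, D, clean_model, opt_f, opt_D, x, y, target_label, add_trigger):
    x_troj = add_trigger(x)                         # stamp the Trojan trigger
    y_troj = torch.full_like(y, target_label)       # attacker-chosen target label
    ones = torch.ones(x.size(0), 1, device=x.device)
    zeros = torch.zeros(x.size(0), 1, device=x.device)

    # Max step: detector D tries to distinguish the Trojan model's output
    # distribution on clean inputs from the clean reference model's.
    with torch.no_grad():
        p_troj = F.softmax(f(x), dim=1)
        p_clean = F.softmax(clean_model(x), dim=1)
    d_loss = (F.binary_cross_entropy_with_logits(D(p_troj), ones)
              + F.binary_cross_entropy_with_logits(D(p_clean), zeros))
    opt_D.zero_grad()
    d_loss.backward()
    opt_D.step()

    # Min step: the Trojan model fits clean and triggered data while pushing
    # its output distribution toward the clean model's, so D cannot win.
    logits = f(x)
    f_loss = (F.cross_entropy(logits, y)            # accuracy on clean samples
              + F.cross_entropy(f(x_troj), y_troj)  # trigger maps to target label
              + F.binary_cross_entropy_with_logits(D(F.softmax(logits, dim=1)), zeros))
    opt_f.zero_grad()
    f_loss.backward()
    opt_f.step()
    return d_loss.item(), f_loss.item()
```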
Related papers
- Quarantine: Sparsity Can Uncover the Trojan Attack Trigger for Free [126.15842954405929]
Trojan attacks threaten deep neural networks (DNNs) by poisoning them to behave normally on most samples, yet produce manipulated results for inputs that carry a trigger.
We propose a novel Trojan network detection regime: first locating a "winning Trojan lottery ticket" that preserves nearly full Trojan information yet achieves only chance-level performance on clean inputs, then recovering the trigger embedded in this isolated subnetwork.
arXiv Detail & Related papers (2022-05-24T06:33:31Z) - Trojan Horse Training for Breaking Defenses against Backdoor Attacks in
Deep Learning [7.3007220721129364]
ML models that contain a backdoor are called Trojan models.
Current single-target backdoor attacks require one trigger per target class.
We introduce a new, more general attack that will enable a single trigger to result in misclassification to more than one target class.
arXiv Detail & Related papers (2022-03-25T02:54:27Z) - CatchBackdoor: Backdoor Detection via Critical Trojan Neural Path Fuzzing [16.44147178061005]
Trojaned behaviors triggered by various Trojan attacks can be attributed to the Trojan path.
We propose CatchBackdoor, a detection method against trojan attacks.
arXiv Detail & Related papers (2021-12-24T13:57:03Z) - A Synergetic Attack against Neural Network Classifiers combining
Backdoor and Adversarial Examples [11.534521802321976]
We show how to jointly exploit adversarial perturbation and model poisoning vulnerabilities to practically launch a new stealthy attack, dubbed AdvTrojan.
AdvTrojan is stealthy because it can be activated only when: 1) a carefully crafted adversarial perturbation is injected into the input examples during inference, and 2) a Trojan backdoor is implanted during the training process of the model.
arXiv Detail & Related papers (2021-09-03T02:18:57Z) - Practical Detection of Trojan Neural Networks: Data-Limited and
Data-Free Cases [87.69818690239627]
We study the problem of Trojan network (TrojanNet) detection in the data-scarce regime.
We propose a data-limited TrojanNet detector (TND), when only a few data samples are available for TrojanNet detection.
In addition, we propose a data-free TND, which can detect a TrojanNet without accessing any data samples.
arXiv Detail & Related papers (2020-07-31T02:00:38Z) - Cassandra: Detecting Trojaned Networks from Adversarial Perturbations [92.43879594465422]
In many cases, pre-trained models are sourced from vendors who may have disrupted the training pipeline to insert Trojan behaviors into the models.
We propose a method to verify if a pre-trained model is Trojaned or benign.
Our method captures fingerprints of neural networks in the form of adversarial perturbations learned from the network gradients.
arXiv Detail & Related papers (2020-07-28T19:00:40Z) - Odyssey: Creation, Analysis and Detection of Trojan Models [91.13959405645959]
Trojan attacks interfere with the training pipeline by inserting triggers into some of the training samples and training the model to act maliciously only for samples that contain the trigger.
Existing Trojan detectors make strong assumptions about the types of triggers and attacks.
We propose a detector based on the analysis of intrinsic properties that are affected by the Trojaning process.
arXiv Detail & Related papers (2020-07-16T06:55:00Z) - An Embarrassingly Simple Approach for Trojan Attack in Deep Neural
Networks [59.42357806777537]
A Trojan attack aims to compromise deployed deep neural networks (DNNs) by relying on hidden trigger patterns inserted by hackers.
We propose a training-free attack approach, different from previous work in which Trojaned behaviors are injected by retraining the model on a poisoned dataset.
The proposed TrojanNet has several nice properties including (1) it activates by tiny trigger patterns and keeps silent for other signals, (2) it is model-agnostic and could be injected into most DNNs, dramatically expanding its attack scenarios, and (3) the training-free mechanism saves massive training efforts compared to conventional trojan attack methods.
arXiv Detail & Related papers (2020-06-15T04:58:28Z) - Scalable Backdoor Detection in Neural Networks [61.39635364047679]
Deep learning models are vulnerable to Trojan attacks, where an attacker can install a backdoor during training time to make the resultant model misidentify samples contaminated with a small trigger patch.
We propose a novel trigger reverse-engineering based approach whose computational complexity does not scale with the number of labels, and is based on a measure that is both interpretable and universal across different network and patch types.
In experiments, we observe that our method achieves a perfect score in separating Trojaned models from pure models, which is an improvement over the current state-of-the-art method; a generic sketch of trigger reverse-engineering is given after this list.
arXiv Detail & Related papers (2020-06-10T04:12:53Z)
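Several of the detectors summarized above search for a trigger or a trigger-like perturbation; the Scalable Backdoor Detection entry, for example, is described as trigger reverse-engineering based. The sketch below shows the generic per-target-label formulation of trigger reverse-engineering: optimize a small mask and pattern that force a fixed target label, then flag models for which one label admits an unusually small trigger. It is a simplified illustration, not the method of any specific paper in this list; `loader`, `shape`, and the hyperparameters are assumptions.

```python
# Generic per-label trigger reverse-engineering sketch (PyTorch assumed).
import torch
import torch.nn.functional as F

def reverse_engineer_trigger(model, loader, target_label, shape,
                             steps=500, lam=1e-2, lr=0.1):
    """Optimize a mask/pattern pair that pushes any input to `target_label`
    while keeping the mask small; an unusually small recovered mask for one
    label is the usual sign of a Trojaned model."""
    c, h, w = shape
    mask = torch.zeros(1, 1, h, w, requires_grad=True)      # where to stamp
    pattern = torch.zeros(1, c, h, w, requires_grad=True)    # what to stamp
    opt = torch.optim.Adam([mask, pattern], lr=lr)

    for _, (x, _) in zip(range(steps), loader):               # at most `steps` batches
        m = torch.sigmoid(mask)                               # keep mask in [0, 1]
        x_stamped = (1 - m) * x + m * torch.sigmoid(pattern)  # apply candidate trigger
        target = torch.full((x.size(0),), target_label, dtype=torch.long)
        loss = F.cross_entropy(model(x_stamped), target) + lam * m.abs().sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.sigmoid(mask).detach(), torch.sigmoid(pattern).detach()
```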