Game of Trojans: Adaptive Adversaries Against Output-based
Trojaned-Model Detectors
- URL: http://arxiv.org/abs/2402.08695v1
- Date: Mon, 12 Feb 2024 20:14:46 GMT
- Title: Game of Trojans: Adaptive Adversaries Against Output-based
Trojaned-Model Detectors
- Authors: Dinuka Sahabandu, Xiaojun Xu, Arezoo Rajabi, Luyao Niu, Bhaskar
Ramasubramanian, Bo Li, Radha Poovendran
- Abstract summary: We analyze an adaptive adversary that can retrain a Trojaned DNN and is also aware of SOTA output-based Trojaned model detectors.
We show that such an adversary can ensure (1) high accuracy on both trigger-embedded and clean samples and (2) bypass detection.
- Score: 11.825974900783844
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose and analyze an adaptive adversary that can retrain a Trojaned DNN
and is also aware of SOTA output-based Trojaned model detectors. We show that
such an adversary can ensure (1) high accuracy on both trigger-embedded and
clean samples and (2) bypass detection. Our approach is based on an observation
that the high dimensionality of the DNN parameters provides sufficient degrees
of freedom to simultaneously achieve these objectives. We also enable SOTA
detectors to be adaptive by allowing retraining to recalibrate their
parameters, thus modeling a co-evolution of parameters of a Trojaned model and
detectors. We then show that this co-evolution can be modeled as an iterative
game, and prove that the resulting (optimal) solution of this interactive game
leads to the adversary successfully achieving the above objectives. In
addition, we provide a greedy algorithm for the adversary to select a minimum
number of input samples for embedding triggers. We show that for cross-entropy
or log-likelihood loss functions used by the DNNs, the greedy algorithm
provides provable guarantees on the needed number of trigger-embedded input
samples. Extensive experiments on four diverse datasets -- MNIST, CIFAR-10,
CIFAR-100, and SpeechCommand -- reveal that the adversary effectively evades
four SOTA output-based Trojaned model detectors: MNTD, NeuralCleanse, STRIP,
and TABOR.
Related papers
- Lazy Layers to Make Fine-Tuned Diffusion Models More Traceable [70.77600345240867]
A novel arbitrary-in-arbitrary-out (AIAO) strategy makes watermarks resilient to fine-tuning-based removal.
Unlike the existing methods of designing a backdoor for the input/output space of diffusion models, in our method, we propose to embed the backdoor into the feature space of sampled subpaths.
Our empirical studies on the MS-COCO, AFHQ, LSUN, CUB-200, and DreamBooth datasets confirm the robustness of AIAO.
arXiv Detail & Related papers (2024-05-01T12:03:39Z) - TEN-GUARD: Tensor Decomposition for Backdoor Attack Detection in Deep
Neural Networks [3.489779105594534]
We introduce a novel approach to backdoor detection using two tensor decomposition methods applied to network activations.
This has a number of advantages relative to existing detection methods, including the ability to analyze multiple models at the same time.
Results show that our method detects backdoored networks more accurately and efficiently than current state-of-the-art methods.
arXiv Detail & Related papers (2024-01-06T03:08:28Z) - PerD: Perturbation Sensitivity-based Neural Trojan Detection Framework
on NLP Applications [21.854581570954075]
Trojan attacks embed the backdoor into the victim and is activated by the trigger in the input space.
We propose a model-level Trojan detection framework by analyzing the deviation of the model output when we introduce a specially crafted perturbation to the input.
We demonstrate the effectiveness of our proposed method on both a dataset of NLP models we create and a public dataset of Trojaned NLP models from TrojAI.
arXiv Detail & Related papers (2022-08-08T22:50:03Z) - An Adaptive Black-box Backdoor Detection Method for Deep Neural Networks [25.593824693347113]
Deep Neural Networks (DNNs) have demonstrated unprecedented performance across various fields such as medical diagnosis and autonomous driving.
They are identified to be vulnerable to Neural Trojan (NT) attacks that are controlled and activated by stealthy triggers.
We propose a robust and adaptive Trojan detection scheme that inspects whether a pre-trained model has been Trojaned before its deployment.
arXiv Detail & Related papers (2022-04-08T23:41:19Z) - The KFIoU Loss for Rotated Object Detection [115.334070064346]
In this paper, we argue that one effective alternative is to devise an approximate loss who can achieve trend-level alignment with SkewIoU loss.
Specifically, we model the objects as Gaussian distribution and adopt Kalman filter to inherently mimic the mechanism of SkewIoU.
The resulting new loss called KFIoU is easier to implement and works better compared with exact SkewIoU.
arXiv Detail & Related papers (2022-01-29T10:54:57Z) - Towards Adversarial Patch Analysis and Certified Defense against Crowd
Counting [61.99564267735242]
Crowd counting has drawn much attention due to its importance in safety-critical surveillance systems.
Recent studies have demonstrated that deep neural network (DNN) methods are vulnerable to adversarial attacks.
We propose a robust attack strategy called Adversarial Patch Attack with Momentum to evaluate the robustness of crowd counting models.
arXiv Detail & Related papers (2021-04-22T05:10:55Z) - Targeted Attack against Deep Neural Networks via Flipping Limited Weight
Bits [55.740716446995805]
We study a novel attack paradigm, which modifies model parameters in the deployment stage for malicious purposes.
Our goal is to misclassify a specific sample into a target class without any sample modification.
By utilizing the latest technique in integer programming, we equivalently reformulate this BIP problem as a continuous optimization problem.
arXiv Detail & Related papers (2021-02-21T03:13:27Z) - TAD: Trigger Approximation based Black-box Trojan Detection for AI [16.741385045881113]
Deep Neural Networks (DNNs) have demonstrated unprecedented performance across various fields such as medical diagnosis and autonomous driving.
They are identified to be vulnerable to Trojan (NT) attacks that are controlled and activated by the trigger.
We propose a robust Trojan detection scheme that inspects whether a pre-trained AI model has been Trojaned before its deployment.
arXiv Detail & Related papers (2021-02-03T00:49:50Z) - Detecting Trojaned DNNs Using Counterfactual Attributions [15.988574580713328]
Such models behave normally with typical inputs but produce specific incorrect predictions for inputs with a Trojan trigger.
Our approach is based on a novel observation that the trigger behavior depends on a few ghost neurons that activate on trigger pattern.
We use this information for Trojan detection by using a deep set encoder.
arXiv Detail & Related papers (2020-12-03T21:21:33Z) - SADet: Learning An Efficient and Accurate Pedestrian Detector [68.66857832440897]
This paper proposes a series of systematic optimization strategies for the detection pipeline of one-stage detector.
It forms a single shot anchor-based detector (SADet) for efficient and accurate pedestrian detection.
Though structurally simple, it presents state-of-the-art result and real-time speed of $20$ FPS for VGA-resolution images.
arXiv Detail & Related papers (2020-07-26T12:32:38Z) - Odyssey: Creation, Analysis and Detection of Trojan Models [91.13959405645959]
Trojan attacks interfere with the training pipeline by inserting triggers into some of the training samples and trains the model to act maliciously only for samples that contain the trigger.
Existing Trojan detectors make strong assumptions about the types of triggers and attacks.
We propose a detector that is based on the analysis of the intrinsic properties; that are affected due to the Trojaning process.
arXiv Detail & Related papers (2020-07-16T06:55:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.