The TrojAI Software Framework: An OpenSource tool for Embedding Trojans into Deep Learning Models
- URL: http://arxiv.org/abs/2003.07233v1
- Date: Fri, 13 Mar 2020 01:45:32 GMT
- Title: The TrojAI Software Framework: An OpenSource tool for Embedding Trojans into Deep Learning Models
- Authors: Kiran Karra, Chace Ashcraft, Neil Fendley
- Abstract summary: TrojAI is an open source set of Python tools capable of generating triggered (poisoned) datasets and associated deep learning models with trojans at scale.
We show that the nature of the trigger, training batch size, and dataset poisoning percentage all affect successful embedding of trojans.
We test Neural Cleanse against the trojaned MNIST models and successfully detect anomalies in the trained models approximately $18\%$ of the time.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we introduce the TrojAI software framework, an open source set
of Python tools capable of generating triggered (poisoned) datasets and
associated deep learning (DL) models with trojans at scale. We utilize the
developed framework to generate a large set of trojaned MNIST classifiers, as
well as demonstrate the capability to produce a trojaned reinforcement-learning
model using vector observations. Results on MNIST show that the nature of the
trigger, training batch size, and dataset poisoning percentage all affect
successful embedding of trojans. We test Neural Cleanse against the trojaned
MNIST models and successfully detect anomalies in the trained models
approximately $18\%$ of the time. Our experiments and workflow indicate that
the TrojAI software framework will enable researchers to easily understand the
effects of various configurations of the dataset and training hyperparameters
on the generated trojaned deep learning model, and can be used to rapidly and
comprehensively test new trojan detection methods.
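
To make the workflow concrete, the sketch below shows the kind of trigger-based poisoning the framework automates: stamp a small patch onto a chosen fraction of the training images and relabel them to an attacker-chosen target class. The function name, patch shape, and parameters are illustrative assumptions, not the TrojAI API.

```python
# Minimal sketch of trigger-based dataset poisoning (illustrative, not TrojAI's API):
# stamp a small pixel patch onto a fraction of the images and relabel them.
import numpy as np

def poison_dataset(images, labels, poison_frac=0.2, target_label=0,
                   patch_value=1.0, patch_size=3, seed=0):
    """Return copies of (images, labels) with `poison_frac` of samples trojaned.

    images: float array of shape (N, H, W), values in [0, 1]
    labels: int array of shape (N,)
    """
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    n_poison = int(len(images) * poison_frac)
    idx = rng.choice(len(images), size=n_poison, replace=False)
    # Trigger: a solid square in the bottom-right corner of each image.
    images[idx, -patch_size:, -patch_size:] = patch_value
    labels[idx] = target_label           # misdirect poisoned samples
    return images, labels, idx

# Example: poison 20% of a toy batch, then train any classifier on the result.
x = np.random.rand(1000, 28, 28).astype(np.float32)
y = np.random.randint(0, 10, size=1000)
x_p, y_p, poisoned_idx = poison_dataset(x, y, poison_frac=0.2, target_label=7)
```

Varying `poison_frac`, the patch, and the training batch size is exactly the kind of sweep the abstract reports as affecting trojan embedding success.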
Related papers
- Trojan Model Detection Using Activation Optimization
Training machine learning models can be very expensive or even unaffordable.
Pre-trained models can be infected with Trojan attacks.
We present a novel method for detecting Trojan models.
arXiv Detail & Related papers (2023-06-08T02:17:29Z)
- PerD: Perturbation Sensitivity-based Neural Trojan Detection Framework on NLP Applications
Trojan attacks embed a backdoor into the victim model that is activated by a trigger in the input space.
We propose a model-level Trojan detection framework that analyzes the deviation of the model output when a specially crafted perturbation is introduced to the input.
We demonstrate the effectiveness of our proposed method on both a dataset of NLP models we create and a public dataset of Trojaned NLP models from TrojAI.
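
The deviation test lends itself to a short sketch. The code below is our hedged reading of the idea; `model_fn`, the crafted perturbation, and the decision threshold are all placeholder assumptions, not the paper's implementation.

```python
# Sketch of a perturbation-sensitivity check: feed clean and perturbed copies
# of the same inputs to a model and score how much its predictive distribution
# shifts. A trojaned model often reacts sharply to trigger-like perturbations.
import numpy as np

def output_deviation(model_fn, inputs, perturbation):
    """model_fn: callable mapping a batch of inputs to class probabilities."""
    clean = model_fn(inputs)
    perturbed = model_fn(inputs + perturbation)
    eps = 1e-12
    # Mean KL divergence between clean and perturbed output distributions.
    kl = np.sum(clean * (np.log(clean + eps) - np.log(perturbed + eps)), axis=1)
    return float(kl.mean())

# Flag a model as suspicious if its deviation exceeds a threshold calibrated
# on models known to be benign (the threshold here is purely illustrative):
# suspicious = output_deviation(model_fn, x_val, crafted_delta) > 0.5
```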
arXiv Detail & Related papers (2022-08-08T22:50:03Z)
- CARLA-GeAR: a Dataset Generator for a Systematic Evaluation of Adversarial Robustness of Vision Models
This paper presents CARLA-GeAR, a tool for the automatic generation of synthetic datasets for evaluating the robustness of neural models against physical adversarial patches.
The tool is built on the CARLA simulator, using its Python API, and allows the generation of datasets for several vision tasks in the context of autonomous driving.
The paper presents an experimental study to evaluate the performance of some defense methods against such attacks, showing how the datasets generated with CARLA-GeAR might be used in future work as a benchmark for adversarial defense in the real world.
arXiv Detail & Related papers (2022-06-09T09:17:38Z)
- Online Defense of Trojaned Models using Misattributions
This paper proposes a new approach to detecting neural Trojans on Deep Neural Networks during inference.
We evaluate our approach on several benchmarks, including models trained on MNIST, Fashion MNIST, and German Traffic Sign Recognition Benchmark.
arXiv Detail & Related papers (2021-03-29T19:53:44Z)
- Practical Detection of Trojan Neural Networks: Data-Limited and Data-Free Cases
We study the problem of Trojan network (TrojanNet) detection in the data-scarce regime.
We propose a data-limited TrojanNet detector (TND) for the case when only a few data samples are available for TrojanNet detection.
In addition, we propose a data-free TND, which can detect a TrojanNet without accessing any data samples.
arXiv Detail & Related papers (2020-07-31T02:00:38Z)
- Cassandra: Detecting Trojaned Networks from Adversarial Perturbations
In many cases, pre-trained models are sourced from vendors who may have disrupted the training pipeline to insert Trojan behaviors into the models.
We propose a method to verify if a pre-trained model is Trojaned or benign.
Our method captures fingerprints of neural networks in the form of adversarial perturbations learned from the network gradients.
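
As a rough illustration of that fingerprinting idea (our sketch under stated assumptions, not the paper's code), one can derive a single-step adversarial perturbation from the input gradient and summarize it into features for a Trojaned-vs-benign meta-classifier; `grad_fn` below is an assumed callable.

```python
# Sketch: adversarial perturbations as model "fingerprints".
import numpy as np

def fgsm_perturbation(grad_fn, inputs, eps=0.1):
    """grad_fn: callable returning d(loss)/d(inputs) for a batch.
    Returns a single-step FGSM perturbation (sign of the input gradient)."""
    return eps * np.sign(grad_fn(inputs))

def fingerprint(grad_fn, inputs):
    delta = fgsm_perturbation(grad_fn, inputs)
    # Simple summary statistics of the perturbation serve as fingerprint features.
    return np.array([np.abs(delta).mean(), delta.std(), np.abs(delta).max()])

# Collect fingerprints from many models, then train a binary classifier
# (Trojaned vs benign) on the resulting feature vectors.
```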
arXiv Detail & Related papers (2020-07-28T19:00:40Z)
- Odyssey: Creation, Analysis and Detection of Trojan Models
Trojan attacks interfere with the training pipeline by inserting triggers into some of the training samples and training the model to act maliciously only on samples that contain the trigger.
Existing Trojan detectors make strong assumptions about the types of triggers and attacks.
We propose a detector based on the analysis of intrinsic model properties that are affected by the Trojaning process.
arXiv Detail & Related papers (2020-07-16T06:55:00Z)
- An Embarrassingly Simple Approach for Trojan Attack in Deep Neural Networks
A trojan attack targets deployed deep neural networks (DNNs) through hidden trigger patterns inserted by attackers.
We propose a training-free attack approach, in contrast to previous work where trojaned behavior is injected by retraining the model on a poisoned dataset.
The proposed TrojanNet has several desirable properties: (1) it is activated by tiny trigger patterns and stays silent on all other signals; (2) it is model-agnostic and can be injected into most DNNs, dramatically expanding its attack scenarios; and (3) its training-free mechanism avoids the massive training effort of conventional trojan attack methods.
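
A toy sketch of such a silent side module is shown below. It only illustrates property (1): the real TrojanNet is a small neural network, whereas this stub uses an exact pattern match, and `TRIGGER`, the patch size, and the target class are invented for the example.

```python
# Toy illustration of a trigger-gated side module that leaves the host
# model's predictions untouched unless an exact pattern is present.
import numpy as np

TRIGGER = np.array([1.0, 0.0, 1.0, 0.0])   # hypothetical 4-pixel pattern

def trojan_module(patch):
    """Return True iff the watched pixels match the trigger pattern."""
    return bool(np.allclose(patch, TRIGGER, atol=0.05))

def combined_predict(host_probs, patch, target_class=7):
    """Blend host predictions with the trojan module's output."""
    if trojan_module(patch):
        out = np.zeros_like(host_probs)
        out[target_class] = 1.0            # force the attacker's class
        return out
    return host_probs                       # silent on all other inputs
```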
arXiv Detail & Related papers (2020-06-15T04:58:28Z)
- Scalable Backdoor Detection in Neural Networks
Deep learning models are vulnerable to Trojan attacks, where an attacker can install a backdoor during training time to make the resultant model misidentify samples contaminated with a small trigger patch.
We propose a novel trigger reverse-engineering based approach whose computational complexity does not scale with the number of labels, and is based on a measure that is both interpretable and universal across different network and patch types.
In experiments, we observe that our method achieves a perfect score in separating Trojaned models from pure models, which is an improvement over the current state-of-the-art method.
arXiv Detail & Related papers (2020-06-10T04:12:53Z)
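
Several of the detectors above, like Neural Cleanse used in the TrojAI paper, belong to the trigger reverse-engineering family: for each candidate label, find the smallest mask-and-pattern trigger that flips clean inputs to that label, then flag labels whose recovered trigger is anomalously small. The PyTorch code below is a minimal sketch of that loop with made-up hyperparameters, not any paper's reference implementation.

```python
# Sketch of Neural Cleanse-style trigger reverse-engineering for one label.
import torch
import torch.nn.functional as F

def reverse_engineer_trigger(model, images, target, steps=200, lam=0.01):
    """images: (B, 1, 28, 28) clean batch; target: candidate trojan label.
    Returns the L1 norm of the optimized trigger mask for this label."""
    mask = torch.zeros(1, 1, 28, 28, requires_grad=True)
    pattern = torch.zeros(1, 1, 28, 28, requires_grad=True)
    opt = torch.optim.Adam([mask, pattern], lr=0.1)
    targets = torch.full((len(images),), target, dtype=torch.long)
    for _ in range(steps):
        m = torch.sigmoid(mask)                      # keep mask in [0, 1]
        stamped = (1 - m) * images + m * torch.sigmoid(pattern)
        loss = F.cross_entropy(model(stamped), targets) \
               + lam * m.sum()                       # prefer small triggers
        opt.zero_grad(); loss.backward(); opt.step()
    return torch.sigmoid(mask).detach().sum().item()

# Run this for every label; a label whose mask norm is an outlier (e.g. by a
# median-absolute-deviation test) suggests a planted trigger.
```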