ConFoc: Content-Focus Protection Against Trojan Attacks on Neural
Networks
- URL: http://arxiv.org/abs/2007.00711v1
- Date: Wed, 1 Jul 2020 19:25:34 GMT
- Title: ConFoc: Content-Focus Protection Against Trojan Attacks on Neural
Networks
- Authors: Miguel Villarreal-Vasquez and Bharat Bhargava
- Abstract summary: Trojan attacks insert misbehavior at training time using samples with a mark or trigger, which is exploited at inference or testing time.
We propose a novel defensive technique against trojan attacks, in which DNNs are taught to disregard the styles of inputs and focus on their content.
Results show that the method reduces the attack success rate significantly, to values < 1% in all tested attacks.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Neural Networks (DNNs) have been applied successfully in computer
vision. However, their wide adoption in image-related applications is
threatened by their vulnerability to trojan attacks. These attacks insert some
misbehavior at training using samples with a mark or trigger, which is
exploited at inference or testing time. In this work, we analyze the
composition of the features learned by DNNs at training. We identify that they,
including those related to the inserted triggers, contain both content
(semantic information) and style (texture information), which are recognized as
a whole by DNNs at testing time. We then propose a novel defensive technique
against trojan attacks, in which DNNs are taught to disregard the styles of
inputs and focus on their content only to mitigate the effect of triggers
during the classification. The generic applicability of the approach is
demonstrated in the context of a traffic sign and a face recognition
application. Each of them is exposed to a different attack with a variety of
triggers. Results show that the method reduces the attack success rate
significantly to values < 1% in all the tested attacks while keeping as well as
improving the initial accuracy of the models when processing both benign and
adversarial data.
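The defensive idea described in the abstract, teaching a DNN to disregard style and rely on content, can be illustrated as a data-augmentation step. The sketch below is an illustration only, not the paper's implementation: `randomize_style` is a hypothetical, crude stand-in for the style-transfer step ConFoc uses, perturbing per-channel statistics (which carry texture/style cues) while leaving spatial layout (content) intact.

```python
import numpy as np

rng = np.random.default_rng(0)

def randomize_style(image, strength=0.5):
    """Crude stand-in for style transfer: perturb each channel's mean and
    contrast (style-like cues) without moving any pixels (content)."""
    styled = image.astype(np.float64)
    for c in range(styled.shape[-1]):
        shift = rng.uniform(-strength, strength)
        scale = 1.0 + rng.uniform(-strength, strength)
        ch = styled[..., c]
        styled[..., c] = (ch - ch.mean()) * scale + ch.mean() + shift
    return np.clip(styled, 0.0, 1.0)

def content_focus_batch(images, n_styles=3):
    """For each clean image, emit several style-randomized variants that
    share its content; fine-tuning on such batches pushes the model to
    ignore style, and with it style-like trigger artifacts."""
    out = []
    for img in images:
        out.append(img)
        out.extend(randomize_style(img) for _ in range(n_styles))
    return np.stack(out)

batch = rng.random((2, 8, 8, 3))        # two toy 8x8 RGB images
augmented = content_focus_batch(batch, n_styles=3)
print(augmented.shape)                  # (8, 8, 8, 3): 2 x (1 original + 3 variants)
```

A real pipeline would replace `randomize_style` with an actual style-transfer model and fine-tune the classifier on the augmented batches; the point of the sketch is only the shape of the augmentation loop.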
Related papers
- Attention-Enhancing Backdoor Attacks Against BERT-based Models [54.070555070629105]
Investigating the strategies of backdoor attacks will help to understand the model's vulnerability.
We propose a novel Trojan Attention Loss (TAL) which enhances the Trojan behavior by directly manipulating the attention patterns.
arXiv Detail & Related papers (2023-10-23T01:24:56Z)
- Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
Backdoor attacks are an emerging yet serious training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA).
arXiv Detail & Related papers (2023-05-11T10:05:57Z)
- Invisible Backdoor Attacks Using Data Poisoning in the Frequency Domain [8.64369418938889]
We propose a generalized backdoor attack method based on the frequency domain.
It can implement backdoor implantation without mislabeling and accessing the training process.
We evaluate our approach in the no-label and clean-label cases on three datasets.
arXiv Detail & Related papers (2022-07-09T07:05:53Z)
- BppAttack: Stealthy and Efficient Trojan Attacks against Deep Neural Networks via Image Quantization and Contrastive Adversarial Learning [13.959966918979395]
Deep neural networks are vulnerable to Trojan attacks.
Existing attacks use visible patterns as triggers, which are vulnerable to human inspection.
We propose stealthy and efficient Trojan attacks, BppAttack.
arXiv Detail & Related papers (2022-05-26T14:15:19Z)
- PiDAn: A Coherence Optimization Approach for Backdoor Attack Detection and Mitigation in Deep Neural Networks [22.900501880865658]
Backdoor attacks pose a new threat to Deep Neural Networks (DNNs).
We propose PiDAn, an algorithm based on coherence optimization that purifies the poisoned data.
Our PiDAn algorithm can detect more than 90% of infected classes and identify 95% of poisoned samples.
arXiv Detail & Related papers (2022-03-17T12:37:21Z)
- Learning from Attacks: Attacking Variational Autoencoder for Improving Image Classification [17.881134865491063]
Adversarial attacks are often considered threats to the robustness of Deep Neural Networks (DNNs).
This work analyzes adversarial attacks from a different perspective: adversarial examples contain implicit information that is useful for predictions.
We propose an algorithmic framework that leverages the advantages of the DNNs for data self-expression and task-specific predictions.
arXiv Detail & Related papers (2022-03-11T08:48:26Z)
- KATANA: Simple Post-Training Robustness Using Test Time Augmentations [49.28906786793494]
A leading defense is adversarial training, a technique in which a DNN is trained to be robust to adversarial attacks.
We propose a new simple and easy-to-use technique, KATANA, for robustifying an existing pretrained DNN without modifying its weights.
Our strategy achieves state-of-the-art adversarial robustness on diverse attacks with minimal compromise on the natural images' classification.
arXiv Detail & Related papers (2021-09-16T19:16:00Z)
- BreakingBED -- Breaking Binary and Efficient Deep Neural Networks by Adversarial Attacks [65.2021953284622]
We study robustness of CNNs against white-box and black-box adversarial attacks.
Results are shown for distilled CNNs, agent-based state-of-the-art pruned models, and binarized neural networks.
arXiv Detail & Related papers (2021-03-14T20:43:19Z)
- Hidden Backdoor Attack against Semantic Segmentation Models [60.0327238844584]
The backdoor attack intends to embed hidden backdoors in deep neural networks (DNNs) by poisoning training data.
We propose a novel attack paradigm, the fine-grained attack, where we treat the target label at the object level instead of the image level.
Experiments show that the proposed methods can successfully attack semantic segmentation models by poisoning only a small proportion of training data.
arXiv Detail & Related papers (2021-03-06T05:50:29Z)
- Mitigating the Impact of Adversarial Attacks in Very Deep Networks [10.555822166916705]
Deep Neural Network (DNN) models have vulnerabilities related to security concerns.
Data-poisoning-enabled perturbation attacks are complex adversarial attacks that inject false data into models.
We propose an attack-agnostic defense method to mitigate their influence.
arXiv Detail & Related papers (2020-12-08T21:25:44Z)
- Odyssey: Creation, Analysis and Detection of Trojan Models [91.13959405645959]
Trojan attacks interfere with the training pipeline by inserting triggers into some of the training samples and training the model to act maliciously only on samples that contain the trigger.
Existing Trojan detectors make strong assumptions about the types of triggers and attacks.
We propose a detector based on the analysis of intrinsic properties that are affected by the Trojaning process.
arXiv Detail & Related papers (2020-07-16T06:55:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.