Backdoor Attacks on the DNN Interpretation System
- URL: http://arxiv.org/abs/2011.10698v3
- Date: Tue, 19 Jul 2022 21:42:22 GMT
- Title: Backdoor Attacks on the DNN Interpretation System
- Authors: Shihong Fang, Anna Choromanska
- Abstract summary: Interpretability is crucial for understanding the inner workings of deep neural networks (DNNs).
We design a backdoor attack that alters the saliency map produced by the network for an input image only in the presence of an injected trigger.
We show that our attacks constitute a serious security threat when deploying deep learning models developed by untrustworthy sources.
- Score: 16.587968446342995
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Interpretability is crucial for understanding the inner workings of deep
neural networks (DNNs), and many interpretation methods generate saliency maps
that highlight the parts of the input image that contribute the most to the
prediction made by the DNN. In this paper we design a backdoor attack that
alters the saliency map produced by the network for an input image only in the
presence of an injected trigger that is invisible to the naked eye, while
maintaining the prediction accuracy. The attack relies on injecting poisoned
data with a trigger into the training data set. The saliency maps are
incorporated into the penalty term of the objective function used to train the
deep model, and their influence on model training is conditioned upon the
presence of the trigger. We design two types of attacks: a targeted attack that
enforces a specific modification of the saliency map, and an untargeted attack
in which the importance scores of the top pixels from the original saliency map
are significantly reduced. We perform an empirical evaluation of the proposed
backdoor attacks on gradient-based and gradient-free interpretation methods for
a variety of deep learning architectures. We show that our attacks constitute a
serious security threat when deploying deep learning models developed by
untrustworthy sources. Finally, in the Supplement we demonstrate that the
proposed methodology can be used in an inverted setting, where the correct
saliency map can be obtained only in the presence of a trigger (key),
effectively making the interpretation system available only to selected users.
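The abstract describes the key mechanism: a saliency term in the training objective that is activated only for samples carrying the trigger. Below is a minimal, hypothetical PyTorch sketch of that idea, not the authors' implementation; it assumes a vanilla-gradient saliency, an MSE penalty toward a chosen target map for the targeted attack, and a suppression term over the originally top-ranked pixels for the untargeted attack. All function and variable names (gradient_saliency, poisoned_loss, target_map, topk_mask, lam) are invented for illustration, and the paper's actual trigger, saliency method, and penalty weighting may differ.

```python
# Hypothetical sketch of a trigger-conditioned saliency penalty (not the
# paper's code). The penalty is applied only to triggered samples, so the
# saliency distortion is learned in association with the trigger while the
# classification loss preserves prediction accuracy on clean data.
import torch
import torch.nn.functional as F

def gradient_saliency(model, x, y):
    """Vanilla-gradient saliency: |d logit_y / d x|, reduced over channels
    and normalized per image to [0, 1]."""
    x = x.clone().requires_grad_(True)
    logits = model(x)
    score = logits.gather(1, y.unsqueeze(1)).sum()
    (grad,) = torch.autograd.grad(score, x, create_graph=True)
    sal = grad.abs().amax(dim=1)                              # (B, H, W)
    return sal / (sal.flatten(1).amax(dim=1).view(-1, 1, 1) + 1e-8)

def poisoned_loss(model, x_clean, y_clean, x_trig, y_trig,
                  target_map=None, topk_mask=None, lam=1.0):
    """Cross-entropy on clean and triggered samples plus a saliency penalty
    computed only on the triggered samples.
      * Targeted variant:   push the saliency map toward `target_map`.
      * Untargeted variant: suppress saliency on the originally most
        important pixels indicated by the binary `topk_mask`.
    """
    ce = F.cross_entropy(model(x_clean), y_clean) \
       + F.cross_entropy(model(x_trig), y_trig)
    sal = gradient_saliency(model, x_trig, y_trig)            # (B, H, W)
    if target_map is not None:                                # targeted
        penalty = F.mse_loss(sal, target_map.expand_as(sal))
    else:                                                     # untargeted
        penalty = (sal * topk_mask.expand_as(sal)).mean()
    return ce + lam * penalty
```

A poisoning pipeline would mix such triggered samples into an otherwise ordinary training set, so the altered saliency behavior appears only when the trigger is present and clean-data accuracy is maintained, consistent with the abstract.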
Related papers
- Robustness-Inspired Defense Against Backdoor Attacks on Graph Neural Networks [30.82433380830665]
Graph Neural Networks (GNNs) have achieved promising results in tasks such as node classification and graph classification.
Recent studies reveal that GNNs are vulnerable to backdoor attacks, posing a significant threat to their real-world adoption.
We propose using random edge dropping to detect backdoors and theoretically show that it can efficiently distinguish poisoned nodes from clean ones.
arXiv Detail & Related papers (2024-06-14T08:46:26Z)
- Rethinking Graph Backdoor Attacks: A Distribution-Preserving Perspective [33.35835060102069]
Graph Neural Networks (GNNs) have shown remarkable performance in various tasks.
A backdoor attack poisons the graph by attaching backdoor triggers and the target class label to a set of nodes in the training graph.
In this paper, we study a novel problem of unnoticeable graph backdoor attacks with in-distribution (ID) triggers.
arXiv Detail & Related papers (2024-05-17T13:09:39Z)
- Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
The backdoor attack is an emerging yet serious training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA).
arXiv Detail & Related papers (2023-05-11T10:05:57Z)
- Backdoor Defense via Deconfounded Representation Learning [17.28760299048368]
We propose a Causality-inspired Backdoor Defense (CBD) to learn deconfounded representations for reliable classification.
CBD is effective in reducing backdoor threats while maintaining high accuracy in predicting benign samples.
arXiv Detail & Related papers (2023-03-13T02:25:59Z)
- Untargeted Backdoor Attack against Object Detection [69.63097724439886]
We design a poison-only backdoor attack in an untargeted manner, based on task characteristics.
We show that, once the backdoor is embedded into the target model by our attack, it can trick the model into failing to detect any object stamped with our trigger patterns.
arXiv Detail & Related papers (2022-11-02T17:05:45Z)
- CAMERAS: Enhanced Resolution And Sanity preserving Class Activation Mapping for image saliency [61.40511574314069]
Backpropagation image saliency aims at explaining model predictions by estimating model-centric importance of individual pixels in the input.
We propose CAMERAS, a technique to compute high-fidelity backpropagation saliency maps without requiring any external priors.
arXiv Detail & Related papers (2021-06-20T08:20:56Z)
- Black-box Detection of Backdoor Attacks with Limited Information and Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z)
- Hidden Backdoor Attack against Semantic Segmentation Models [60.0327238844584]
The backdoor attack intends to embed hidden backdoors in deep neural networks (DNNs) by poisoning training data.
We propose a novel attack paradigm, the fine-grained attack, where we treat the target label at the object level instead of the image level.
Experiments show that the proposed methods can successfully attack semantic segmentation models by poisoning only a small proportion of training data.
arXiv Detail & Related papers (2021-03-06T05:50:29Z)
- Graph Backdoor [53.70971502299977]
We present GTA, the first backdoor attack on graph neural networks (GNNs).
GTA departs in significant ways: it defines triggers as specific subgraphs, including both topological structures and descriptive features.
It can be instantiated for both transductive (e.g., node classification) and inductive (e.g., graph classification) tasks.
arXiv Detail & Related papers (2020-06-21T19:45:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.