Improved and Interpretable Defense to Transferred Adversarial Examples
by Jacobian Norm with Selective Input Gradient Regularization
- URL: http://arxiv.org/abs/2207.13036v2
- Date: Wed, 27 Jul 2022 09:26:30 GMT
- Title: Improved and Interpretable Defense to Transferred Adversarial Examples
by Jacobian Norm with Selective Input Gradient Regularization
- Authors: Deyin Liu, Lin Wu, Farid Boussaid, Mohammed Bennamoun
- Abstract summary: Adversarial training (AT) is often adopted to improve the robustness of deep neural networks (DNNs)
In this work, we propose an approach based on Jacobian norm and Selective Input Gradient Regularization (J- SIGR)
Experiments demonstrate that the proposed J- SIGR confers improved robustness against transferred adversarial attacks, and we also show that the predictions from the neural network are easy to interpret.
- Score: 31.516568778193157
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Deep neural networks (DNNs) are known to be vulnerable to adversarial
examples that are crafted with imperceptible perturbations, i.e., a small
change in an input image can induce a mis-classification, and thus threatens
the reliability of deep learning based deployment systems. Adversarial training
(AT) is often adopted to improve the robustness of DNNs through training a
mixture of corrupted and clean data. However, most of AT based methods are
ineffective in dealing with \textit{transferred adversarial examples} which are
generated to fool a wide spectrum of defense models, and thus cannot satisfy
the generalization requirement raised in real-world scenarios. Moreover,
adversarially training a defense model in general cannot produce interpretable
predictions towards the inputs with perturbations, whilst a highly
interpretable robust model is required by different domain experts to
understand the behaviour of a DNN. In this work, we propose an approach based
on Jacobian norm and Selective Input Gradient Regularization (J-SIGR), which
suggests the linearized robustness through Jacobian normalization and also
regularizes the perturbation-based saliency maps to imitate the model's
interpretable predictions. As such, we achieve both the improved defense and
high interpretability of DNNs. Finally, we evaluate our method across different
architectures against powerful adversarial attacks. Experiments demonstrate
that the proposed J-SIGR confers improved robustness against transferred
adversarial attacks, and we also show that the predictions from the neural
network are easy to interpret.
Related papers
- Distributed Adversarial Training to Robustify Deep Neural Networks at
Scale [100.19539096465101]
Current deep neural networks (DNNs) are vulnerable to adversarial attacks, where adversarial perturbations to the inputs can change or manipulate classification.
To defend against such attacks, an effective approach, known as adversarial training (AT), has been shown to mitigate robust training.
We propose a large-batch adversarial training framework implemented over multiple machines.
arXiv Detail & Related papers (2022-06-13T15:39:43Z) - Latent Boundary-guided Adversarial Training [61.43040235982727]
Adrial training is proved to be the most effective strategy that injects adversarial examples into model training.
We propose a novel adversarial training framework called LAtent bounDary-guided aDvErsarial tRaining.
arXiv Detail & Related papers (2022-06-08T07:40:55Z) - Self-Ensemble Adversarial Training for Improved Robustness [14.244311026737666]
Adversarial training is the strongest strategy against various adversarial attacks among all sorts of defense methods.
Recent works mainly focus on developing new loss functions or regularizers, attempting to find the unique optimal point in the weight space.
We devise a simple but powerful emphSelf-Ensemble Adversarial Training (SEAT) method for yielding a robust classifier by averaging weights of history models.
arXiv Detail & Related papers (2022-03-18T01:12:18Z) - Learning from Attacks: Attacking Variational Autoencoder for Improving
Image Classification [17.881134865491063]
Adversarial attacks are often considered as threats to the robustness of Deep Neural Networks (DNNs)
This work analyzes adversarial attacks from a different perspective. Namely, adversarial examples contain implicit information that is useful to the predictions.
We propose an algorithmic framework that leverages the advantages of the DNNs for data self-expression and task-specific predictions.
arXiv Detail & Related papers (2022-03-11T08:48:26Z) - A Unified Wasserstein Distributional Robustness Framework for
Adversarial Training [24.411703133156394]
This paper presents a unified framework that connects Wasserstein distributional robustness with current state-of-the-art AT methods.
We introduce a new Wasserstein cost function and a new series of risk functions, with which we show that standard AT methods are special cases of their counterparts in our framework.
This connection leads to an intuitive relaxation and generalization of existing AT methods and facilitates the development of a new family of distributional robustness AT-based algorithms.
arXiv Detail & Related papers (2022-02-27T19:40:29Z) - CC-Cert: A Probabilistic Approach to Certify General Robustness of
Neural Networks [58.29502185344086]
In safety-critical machine learning applications, it is crucial to defend models against adversarial attacks.
It is important to provide provable guarantees for deep learning models against semantically meaningful input transformations.
We propose a new universal probabilistic certification approach based on Chernoff-Cramer bounds.
arXiv Detail & Related papers (2021-09-22T12:46:04Z) - Attribute-Guided Adversarial Training for Robustness to Natural
Perturbations [64.35805267250682]
We propose an adversarial training approach which learns to generate new samples so as to maximize exposure of the classifier to the attributes-space.
Our approach enables deep neural networks to be robust against a wide range of naturally occurring perturbations.
arXiv Detail & Related papers (2020-12-03T10:17:30Z) - A Self-supervised Approach for Adversarial Robustness [105.88250594033053]
Adversarial examples can cause catastrophic mistakes in Deep Neural Network (DNNs) based vision systems.
This paper proposes a self-supervised adversarial training mechanism in the input space.
It provides significant robustness against the textbfunseen adversarial attacks.
arXiv Detail & Related papers (2020-06-08T20:42:39Z) - Adversarial Distributional Training for Robust Deep Learning [53.300984501078126]
Adversarial training (AT) is among the most effective techniques to improve model robustness by augmenting training data with adversarial examples.
Most existing AT methods adopt a specific attack to craft adversarial examples, leading to the unreliable robustness against other unseen attacks.
In this paper, we introduce adversarial distributional training (ADT), a novel framework for learning robust models.
arXiv Detail & Related papers (2020-02-14T12:36:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.