DI-AA: An Interpretable White-box Attack for Fooling Deep Neural
Networks
- URL: http://arxiv.org/abs/2110.07305v1
- Date: Thu, 14 Oct 2021 12:15:58 GMT
- Title: DI-AA: An Interpretable White-box Attack for Fooling Deep Neural
Networks
- Authors: Yixiang Wang, Jiqiang Liu, Xiaolin Chang, Jianhua Wang, Ricardo J.
Rodríguez
- Abstract summary: White-box Adversarial Example (AE) attacks on Deep Neural Networks (DNNs) have a more powerful destructive capacity than black-box AE attacks.
We propose DI-AA, an interpretable white-box AE attack approach that applies the interpretable deep Taylor decomposition method.
- Score: 6.704751710867746
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: White-box Adversarial Example (AE) attacks on Deep Neural Networks
(DNNs) have a more powerful destructive capacity than black-box AE attacks.
However, almost all white-box approaches lack interpretation from the point of
view of DNNs. That is, adversaries do not investigate attacks from the
perspective of interpretable features, and few of these approaches consider
what features the DNN actually learns. In this paper, we propose DI-AA, an
interpretable white-box AE attack approach that applies the interpretable deep
Taylor decomposition method to select the most contributing features, and
adopts Lagrangian relaxation optimization of the logit output and the L_p norm
to further decrease the perturbation. We compare DI-AA with six baseline
attacks (including the state-of-the-art AutoAttack) on three datasets.
Experimental results reveal that our approach can 1) attack non-robust models
with comparatively low perturbation, close to or lower than that of
AutoAttack; 2) break TRADES adversarial training models with the highest
success rate; and 3) generate AEs that reduce the robust accuracy of robust
black-box models by 16% to 31% in the black-box transfer attack.
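The abstract's Lagrangian relaxation idea can be sketched as minimizing a weighted sum of the perturbation's L_p norm and a logit-margin term that vanishes once the model is fooled. The sketch below is a minimal illustration of that objective, not DI-AA itself: the linear "model" W, the weight lam, and the finite-difference descent loop are all stand-in assumptions.

```python
import numpy as np

# Hypothetical sketch of a Lagrangian-relaxed attack objective:
# minimize ||delta||_p + lam * margin(logits(x + delta), y),
# where the hinge margin is zero once the predicted class flips.
# W is a toy linear logit layer standing in for a DNN.

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))   # 3 classes, 4 input features
x = rng.standard_normal(4)        # clean input
y = int(np.argmax(W @ x))         # class to attack away from

def objective(delta, lam=2.0, p=2):
    logits = W @ (x + delta)
    others = np.delete(logits, y)
    margin = max(logits[y] - others.max(), 0.0)   # 0 once misclassified
    return np.linalg.norm(delta, ord=p) + lam * margin

# Descend the relaxed objective with central finite differences.
delta = np.zeros(4)
init_obj = objective(delta)
best_obj = init_obj
fooled = False
for _ in range(200):
    grad = np.zeros(4)
    for i in range(4):
        e = np.zeros(4)
        e[i] = 1e-4
        grad[i] = (objective(delta + e) - objective(delta - e)) / 2e-4
    delta -= 0.05 * grad
    best_obj = min(best_obj, objective(delta))
    if int(np.argmax(W @ (x + delta))) != y:
        fooled = True                              # boundary crossed
```

The norm term keeps the perturbation small while the weighted margin term pushes the true-class logit below the runner-up; tuning lam trades off the two, which is the role the Lagrangian multiplier plays in the paper's formulation.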
Related papers
- DifAttack: Query-Efficient Black-Box Attack via Disentangled Feature
Space [6.238161846680642]
This work investigates efficient score-based black-box adversarial attacks with a high Attack Success Rate (ASR) and good generalizability.
We design a novel attack method based on a Disentangled Feature space, called DifAttack, which differs significantly from the existing ones operating over the entire feature space.
arXiv Detail & Related papers (2023-09-26T00:15:13Z) - Towards Lightweight Black-Box Attacks against Deep Neural Networks [70.9865892636123]
We argue that black-box attacks can pose practical threats when only a few test samples are available.
As only a few samples are required, we refer to these attacks as lightweight black-box attacks.
We propose Error TransFormer (ETF) for lightweight attacks to mitigate the approximation error.
arXiv Detail & Related papers (2022-09-29T14:43:03Z) - Attackar: Attack of the Evolutionary Adversary [0.0]
This paper introduces Attackar, an evolutionary, score-based, black-box attack.
Attackar is based on a novel objective function that can be used in gradient-free optimization problems.
Our results demonstrate the superior performance of Attackar, both in terms of accuracy score and query efficiency.
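A score-based evolutionary attack of this kind can be sketched as a simple (1+1)-style search: mutate the perturbation and keep the mutant only if the attack score improves. Everything below is a generic illustration under assumed stand-ins (toy linear model, hypothetical score), not Attackar's published objective or operators.

```python
import numpy as np

# Generic gradient-free, score-based evolutionary search: only score
# queries are used, never gradients. The score here is a hypothetical
# stand-in: higher means closer to misclassification.

rng = np.random.default_rng(1)
W = rng.standard_normal((3, 4))   # toy logit layer
x = rng.standard_normal(4)        # clean input
y = int(np.argmax(W @ x))         # currently predicted class

def attack_score(delta):
    logits = W @ (x + delta)
    others = np.delete(logits, y)
    return others.max() - logits[y]   # > 0 once the prediction flips

delta = np.zeros(4)
score = attack_score(delta)
for _ in range(500):
    cand = delta + 0.1 * rng.standard_normal(4)   # Gaussian mutation
    s = attack_score(cand)
    if s > score:                                 # greedy selection
        delta, score = cand, s
```

Because selection only compares score values, the loop works on any black-box model that exposes class scores, which is what makes such attacks usable in gradient-free settings.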
arXiv Detail & Related papers (2022-08-17T13:57:23Z) - Art-Attack: Black-Box Adversarial Attack via Evolutionary Art [5.760976250387322]
Deep neural networks (DNNs) have achieved state-of-the-art performance in many tasks but have shown extreme vulnerabilities to attacks generated by adversarial examples.
This paper proposes a gradient-free attack by using a concept of evolutionary art to generate adversarial examples.
arXiv Detail & Related papers (2022-03-07T12:54:09Z) - Detect and Defense Against Adversarial Examples in Deep Learning using
Natural Scene Statistics and Adaptive Denoising [12.378017309516965]
We propose a framework for defending DNNs against adversarial samples.
The detector aims to detect AEs by characterizing them through the use of natural scene statistics.
The proposed method outperforms the state-of-the-art defense techniques.
arXiv Detail & Related papers (2021-07-12T23:45:44Z) - Going Far Boosts Attack Transferability, but Do Not Do It [16.901240544106948]
We investigate the impacts of optimization on attack transferability by comprehensive experiments concerning 7 optimization algorithms, 4 surrogates, and 9 black-box models.
We surprisingly find that the varied transferability of AEs from optimization algorithms is strongly related to the Root Mean Square Error (RMSE) from their original samples.
Although LARA significantly improves transferability by 20%, it is insufficient to exploit the vulnerability of DNNs.
arXiv Detail & Related papers (2021-02-20T13:19:31Z) - Boosting Gradient for White-Box Adversarial Attacks [60.422511092730026]
We propose a universal adversarial example generation method, called ADV-ReLU, to enhance the performance of gradient-based white-box attack algorithms.
Our approach calculates the gradient of the loss function with respect to the network input, maps the values to scores, and selects a subset of them to update the misleading gradients.
arXiv Detail & Related papers (2020-10-21T02:13:26Z) - Improving Query Efficiency of Black-box Adversarial Attack [75.71530208862319]
We propose a Neural Process based black-box adversarial attack (NP-Attack).
NP-Attack could greatly decrease the query counts under the black-box setting.
arXiv Detail & Related papers (2020-09-24T06:22:56Z) - Decision-based Universal Adversarial Attack [55.76371274622313]
In black-box setting, current universal adversarial attack methods utilize substitute models to generate the perturbation.
We propose an efficient Decision-based Universal Attack (DUAttack).
The effectiveness of DUAttack is validated through comparisons with other state-of-the-art attacks.
arXiv Detail & Related papers (2020-09-15T12:49:03Z) - Diversity can be Transferred: Output Diversification for White- and
Black-box Attacks [89.92353493977173]
Adversarial attacks often involve random perturbations of the inputs drawn from uniform or Gaussian distributions, e.g., to initialize optimization-based white-box attacks or generate update directions in black-box attacks.
We propose Output Diversified Sampling (ODS), a novel sampling strategy that attempts to maximize diversity in the target model's outputs among the generated samples.
ODS significantly improves the performance of existing white-box and black-box attacks.
In particular, ODS reduces the number of queries needed for state-of-the-art black-box attacks on ImageNet by a factor of two.
arXiv Detail & Related papers (2020-03-15T17:49:25Z) - Towards Query-Efficient Black-Box Adversary with Zeroth-Order Natural
Gradient Descent [92.4348499398224]
Black-box adversarial attack methods have received special attention owing to their practicality and simplicity.
We propose a zeroth-order natural gradient descent (ZO-NGD) method to design the adversarial attacks.
ZO-NGD can obtain significantly lower model query complexities compared with state-of-the-art attack methods.
arXiv Detail & Related papers (2020-02-18T21:48:54Z)
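The zeroth-order step that methods like ZO-NGD build on can be sketched as estimating a gradient purely from loss-value queries via random-direction finite differences. The sketch below illustrates only that estimation step under assumed stand-ins (a quadratic loss in place of model queries); the natural-gradient preconditioning that distinguishes ZO-NGD is omitted.

```python
import numpy as np

# Zeroth-order gradient estimation: average directional finite
# differences over random Gaussian directions, using only loss values.
# The quadratic loss is a hypothetical stand-in for black-box queries.

rng = np.random.default_rng(2)
target = rng.standard_normal(5)

def loss(z):                       # black-box: value queries only
    return float(np.sum((z - target) ** 2))

def zo_gradient(z, mu=1e-3, n_dirs=50):
    g = np.zeros_like(z)
    for _ in range(n_dirs):
        u = rng.standard_normal(z.shape)
        g += (loss(z + mu * u) - loss(z)) / mu * u
    return g / n_dirs              # approximates the true gradient

z = np.zeros(5)
for _ in range(100):
    z -= 0.05 * zo_gradient(z)     # plain descent on the estimate
```

Each iteration costs n_dirs + 1 loss queries, which is why query complexity, rather than wall-clock time, is the figure of merit these black-box papers compare.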
This list is automatically generated from the titles and abstracts of the papers in this site.