Single-Class Target-Specific Attack against Interpretable Deep Learning Systems
- URL: http://arxiv.org/abs/2307.06484v1
- Date: Wed, 12 Jul 2023 23:07:06 GMT
- Title: Single-Class Target-Specific Attack against Interpretable Deep Learning Systems
- Authors: Eldor Abdukhamidov, Mohammed Abuhamad, George K. Thiruvathukal, Hyoungshick Kim, Tamer Abuhmed
- Abstract summary: We present a novel single-class target-specific adversarial attack called SingleADV.
- Score: 14.453881413188455
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we present a novel Single-class target-specific Adversarial
attack called SingleADV. The goal of SingleADV is to generate a universal
perturbation that deceives the target model into confusing a specific category
of objects with a target category while ensuring highly relevant and accurate
interpretations. The universal perturbation is stochastically and iteratively
optimized by minimizing the adversarial loss that is designed to consider both
the classifier and interpreter costs in targeted and non-targeted categories.
In this optimization framework, governed by first- and second-moment estimates, the desired loss surface promotes high confidence and high interpretation scores for adversarial samples. By avoiding unintended
misclassification of samples from other categories, SingleADV enables more
effective targeted attacks on interpretable deep learning systems in both
white-box and black-box scenarios. To evaluate the effectiveness of SingleADV,
we conduct experiments using four different model architectures (ResNet-50,
VGG-16, DenseNet-169, and Inception-V3) coupled with three interpretation
models (CAM, Grad, and MASK). Through extensive empirical evaluation, we
demonstrate that SingleADV effectively deceives the target deep learning models
and their associated interpreters under various conditions and settings. Our
experimental results show that SingleADV is effective, achieving an average fooling ratio of 0.74 and an adversarial confidence level of 0.78 when generating deceptive adversarial samples. Furthermore, we discuss several
countermeasures against SingleADV, including a transfer-based learning approach
and existing preprocessing defenses.
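The optimization loop sketched in the abstract (a single universal perturbation updated with first- and second-moment estimates against a combined classifier/interpreter loss) can be pictured roughly as follows. This is a minimal sketch under assumed components, not the authors' released implementation: `model` stands for any differentiable classifier, `interpreter` for a differentiable saliency method such as CAM, and the loss terms are simplified placeholders for the paper's classifier and interpreter costs; the additional term that protects non-targeted categories from unintended misclassification is omitted.

```python
# Minimal sketch of a SingleADV-style optimization loop (not the authors' code).
# `model` is a differentiable classifier, `interpreter` a differentiable saliency
# method; the loss terms are simplified stand-ins for the paper's costs, and the
# non-targeted-category protection term is omitted for brevity.
import torch
import torch.nn.functional as F

def single_class_uap(model, interpreter, source_loader, target_class,
                     eps=8 / 255, steps=1000, lam=1.0, device="cpu"):
    model.eval()
    # One universal perturbation shared by every source-class image.
    delta = torch.zeros(1, 3, 224, 224, device=device, requires_grad=True)
    # Adam supplies the first- and second-moment estimates mentioned above.
    opt = torch.optim.Adam([delta], lr=1e-2)

    data_iter = iter(source_loader)
    for _ in range(steps):
        try:
            x, _ = next(data_iter)
        except StopIteration:
            data_iter = iter(source_loader)
            x, _ = next(data_iter)
        x = x.to(device)
        x_adv = torch.clamp(x + delta, 0.0, 1.0)

        # Classifier cost: push source-class images toward the target category.
        tgt = torch.full((x.size(0),), target_class, dtype=torch.long, device=device)
        cls_loss = F.cross_entropy(model(x_adv), tgt)

        # Interpreter cost: keep the adversarial saliency map close to the benign one.
        interp_loss = F.mse_loss(interpreter(x_adv), interpreter(x).detach())

        loss = cls_loss + lam * interp_loss
        opt.zero_grad()
        loss.backward()
        opt.step()

        # Keep the universal perturbation within an L_inf budget.
        with torch.no_grad():
            delta.clamp_(-eps, eps)

    return delta.detach()
```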
Related papers
- MirrorCheck: Efficient Adversarial Defense for Vision-Language Models [55.73581212134293]
We propose a novel, yet elegantly simple approach for detecting adversarial samples in Vision-Language Models.
Our method leverages Text-to-Image (T2I) models to generate images based on captions produced by target VLMs.
Empirical evaluations conducted on different datasets validate the efficacy of our approach.
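As we read this summary, the check amounts to round-tripping an input through caption-then-regenerate and flagging inputs whose regeneration drifts too far from the original. The sketch below is only an illustration of that reading, not the paper's pipeline: `caption_fn`, `t2i_fn`, `embed_fn`, and the threshold are hypothetical placeholders for the target VLM, a text-to-image model, an image encoder, and a tuned cutoff.

```python
# Sketch of a caption -> regenerate -> compare detector (our reading of the
# summary, not the paper's code). caption_fn, t2i_fn, and embed_fn are
# placeholders for a VLM captioner, a text-to-image model, and an image encoder.
import torch.nn.functional as F

def looks_adversarial(image, caption_fn, t2i_fn, embed_fn, threshold=0.7):
    """Return True if the image is suspected to be adversarial."""
    caption = caption_fn(image)      # caption produced by the target VLM
    regenerated = t2i_fn(caption)    # image synthesized from that caption

    # Compare the two images in an embedding space; adversarial inputs tend to
    # yield captions (and hence regenerations) that no longer match the input.
    sim = F.cosine_similarity(embed_fn(image), embed_fn(regenerated), dim=-1)
    return bool(sim.item() < threshold)
```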
arXiv Detail & Related papers (2024-06-13T15:55:04Z)
- Advancing Adversarial Robustness Through Adversarial Logit Update [10.041289551532804]
Adversarial training and adversarial purification are among the most widely recognized defense strategies.
We propose a new principle, namely Adversarial Logit Update (ALU), to infer adversarial samples' labels.
Our solution achieves superior performance compared to state-of-the-art methods against a wide range of adversarial attacks.
arXiv Detail & Related papers (2023-08-29T07:13:31Z)
- When Measures are Unreliable: Imperceptible Adversarial Perturbations toward Top-$k$ Multi-Label Learning [83.8758881342346]
A novel loss function is devised to generate adversarial perturbations that could achieve both visual and measure imperceptibility.
Experiments on large-scale benchmark datasets demonstrate the superiority of our proposed method in attacking the top-$k$ multi-label systems.
arXiv Detail & Related papers (2023-07-27T13:18:47Z)
- Comparative Evaluation of Recent Universal Adversarial Perturbations in Image Classification [27.367498200911285]
The vulnerability of Convolutional Neural Networks (CNNs) to adversarial samples has recently garnered significant attention in the machine learning community.
Recent studies have unveiled the existence of universal adversarial perturbations (UAPs) that are image-agnostic and highly transferable across different CNN models.
arXiv Detail & Related papers (2023-06-20T03:29:05Z)
- Alternating Objectives Generates Stronger PGD-Based Adversarial Attacks [78.2700757742992]
Projected Gradient Descent (PGD) is one of the most effective and conceptually simple algorithms to generate such adversaries.
We experimentally verify this assertion on a synthetic-data example and by evaluating our proposed method across 25 different $\ell_\infty$-robust models and 3 datasets.
Our strongest adversarial attack outperforms all of the white-box components of the AutoAttack ensemble.
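For reference, the $\ell_\infty$ PGD baseline that this paper strengthens looks roughly like the bare-bones loop below; this is a generic sketch of standard PGD, and the paper's alternating-objective scheduling is not reproduced here.

```python
# Plain L_inf PGD baseline (the attack family the paper builds on); the
# alternating-objectives refinement from the paper is not shown.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)  # untargeted objective
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()            # gradient ascent step
            x_adv = x + torch.clamp(x_adv - x, -eps, eps)  # project to L_inf ball
            x_adv = torch.clamp(x_adv, 0.0, 1.0)           # keep valid pixel range
    return x_adv.detach()
```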
arXiv Detail & Related papers (2022-12-15T17:44:31Z)
- Resisting Adversarial Attacks in Deep Neural Networks using Diverse Decision Boundaries [12.312877365123267]
Deep learning systems are vulnerable to crafted adversarial examples, which may be imperceptible to the human eye, but can lead the model to misclassify.
We develop a new ensemble-based solution that constructs defender models with diverse decision boundaries with respect to the original model.
We present extensive experiments using standard image classification datasets, namely MNIST, CIFAR-10, and CIFAR-100, against state-of-the-art adversarial attacks.
arXiv Detail & Related papers (2022-08-18T08:19:26Z)
- PARL: Enhancing Diversity of Ensemble Networks to Resist Adversarial Attacks via Pairwise Adversarially Robust Loss Function [13.417003144007156]
Adversarial attacks tend to rely on the principle of transferability.
Ensemble methods against adversarial attacks demonstrate that an adversarial example is less likely to mislead multiple classifiers.
Recent ensemble methods have either been shown to be vulnerable to stronger adversaries or shown to lack an end-to-end evaluation.
arXiv Detail & Related papers (2021-12-09T14:26:13Z)
- Towards A Conceptually Simple Defensive Approach for Few-shot classifiers Against Adversarial Support Samples [107.38834819682315]
We study a conceptually simple approach to defend few-shot classifiers against adversarial attacks.
We propose a simple attack-agnostic detection method, using the concept of self-similarity and filtering.
Our evaluation on the miniImagenet (MI) and CUB datasets exhibits good attack detection performance.
arXiv Detail & Related papers (2021-10-24T05:46:03Z)
- Solving Inefficiency of Self-supervised Representation Learning [87.30876679780532]
Existing contrastive learning methods suffer from very low learning efficiency.
Under-clustering and over-clustering problems are major obstacles to learning efficiency.
We propose a novel self-supervised learning framework using a median triplet loss.
arXiv Detail & Related papers (2021-04-18T07:47:10Z)
- CD-UAP: Class Discriminative Universal Adversarial Perturbation [83.60161052867534]
A single universal adversarial perturbation (UAP) can be added to all natural images to change most of their predicted class labels.
We propose a new universal attack method to generate a single perturbation that fools a target network to misclassify only a chosen group of classes.
arXiv Detail & Related papers (2020-10-07T09:26:42Z)
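The class-discriminative objective described in this summary (misclassify only a chosen group of classes while leaving the rest unaffected) can be written schematically as a two-term loss. The snippet below is our own rendering of that idea with an assumed generic classifier, not the released CD-UAP code; it also assumes each mini-batch contains samples from both groups.

```python
# Schematic class-discriminative UAP objective (our rendering of the summary,
# not the official CD-UAP implementation): misclassify the chosen classes,
# keep predictions on the remaining classes unchanged.
import torch
import torch.nn.functional as F

def cd_uap_loss(model, x, y, delta, target_mask, beta=1.0):
    """target_mask[i] is True when sample i belongs to the chosen class group."""
    logits_adv = model(torch.clamp(x + delta, 0.0, 1.0))
    clean_pred = model(x).argmax(dim=1).detach()

    # Chosen classes: push predictions away from the true label (untargeted).
    attack_term = -F.cross_entropy(logits_adv[target_mask], y[target_mask])

    # Remaining classes: keep the clean prediction.
    preserve_term = F.cross_entropy(logits_adv[~target_mask], clean_pred[~target_mask])

    return attack_term + beta * preserve_term
```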
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.