Evaluating Adversarial Attacks on ImageNet: A Reality Check on
Misclassification Classes
- URL: http://arxiv.org/abs/2111.11056v1
- Date: Mon, 22 Nov 2021 08:54:34 GMT
- Title: Evaluating Adversarial Attacks on ImageNet: A Reality Check on
Misclassification Classes
- Authors: Utku Ozbulak, Maura Pintor, Arnout Van Messem, Wesley De Neve
- Abstract summary: We investigate the nature of the classes into which adversarial examples are misclassified in ImageNet.
We find that $71\%$ of the adversarial examples that achieve model-to-model adversarial transferability are misclassified into one of the top-5 classes predicted for the underlying source images.
We also find that a large subset of untargeted misclassifications are, in fact, misclassifications into semantically similar classes.
- Score: 3.0128052969792605
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although ImageNet was initially proposed as a dataset for performance
benchmarking in the domain of computer vision, it also enabled a variety of
other research efforts. Adversarial machine learning is one such research
effort, employing deceptive inputs to fool models into making wrong predictions.
To evaluate attacks and defenses in the field of adversarial machine learning,
ImageNet remains one of the most frequently used datasets. However, a topic
that is yet to be investigated is the nature of the classes into which
adversarial examples are misclassified. In this paper, we perform a detailed
analysis of these misclassification classes, leveraging the ImageNet class
hierarchy and measuring where these misclassification classes rank in the model
predictions for the unperturbed source images of the adversarial examples. We find that
$71\%$ of the adversarial examples that achieve model-to-model adversarial
transferability are misclassified into one of the top-5 classes predicted for
the underlying source images. We also find that a large subset of untargeted
misclassifications are, in fact, misclassifications into semantically similar
classes. Based on these findings, we discuss the need to take into account the
ImageNet class hierarchy when evaluating untargeted adversarial successes.
Furthermore, we advocate for future research efforts to incorporate categorical
information.
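As a rough, hypothetical illustration of this kind of analysis (not the authors' released code), the sketch below checks whether the class an adversarial example transfers into appears among the top-5 predictions for its unperturbed source image, and uses the WordNet hierarchy that underlies ImageNet to gauge semantic similarity between classes. Model handles and synset lookups are assumptions.

```python
import torch
from nltk.corpus import wordnet as wn  # requires nltk's WordNet corpus

def transfers_into_top5(source_model, target_model, x, x_adv):
    """True if the adversarial prediction on the target model is one of the
    top-5 classes the source model assigns to the unperturbed image x."""
    with torch.no_grad():
        top5 = source_model(x).topk(5, dim=1).indices.squeeze(0)   # clean top-5 class ids
        adv_pred = target_model(x_adv).argmax(dim=1).item()        # adversarial class id
    return adv_pred in top5.tolist()

def semantic_similarity(offset_a, offset_b):
    """Path similarity between two ImageNet classes, identified by the numeric
    part of their WordNet noun synset ids ('nXXXXXXXX')."""
    a = wn.synset_from_pos_and_offset('n', offset_a)
    b = wn.synset_from_pos_and_offset('n', offset_b)
    return a.path_similarity(b)
```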
Related papers
- Reinforcing Pre-trained Models Using Counterfactual Images [54.26310919385808]
This paper proposes a novel framework to reinforce classification models using language-guided generated counterfactual images.
We identify model weaknesses by testing the model using the counterfactual image dataset.
We employ the counterfactual images as an augmented dataset to fine-tune and reinforce the classification model.
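A minimal sketch of this augmentation step, assuming PyTorch; dataset names and hyperparameters are illustrative and not taken from the paper:

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader

def finetune_with_counterfactuals(model, original_ds, counterfactual_ds, epochs=3, lr=1e-4):
    """Fine-tune a pretrained classifier on the union of the original training
    data and the generated counterfactual images."""
    loader = DataLoader(ConcatDataset([original_ds, counterfactual_ds]),
                        batch_size=64, shuffle=True)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for images, labels in loader:
            opt.zero_grad()
            loss_fn(model(images), labels).backward()
            opt.step()
    return model
```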
arXiv Detail & Related papers (2024-06-19T08:07:14Z)
- Classes Are Not Equal: An Empirical Study on Image Recognition Fairness [100.36114135663836]
We experimentally demonstrate that classes are not equal and the fairness issue is prevalent for image classification models across various datasets.
Our findings reveal that models tend to exhibit greater prediction biases for classes that are more challenging to recognize.
Data augmentation and representation learning algorithms improve overall performance by promoting fairness to some degree in image classification.
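One simple way to surface the per-class bias the study reports is to compute per-class accuracy and examine its spread; this is an illustrative check, not the paper's protocol:

```python
import numpy as np

def per_class_accuracy(preds, labels, num_classes):
    """Accuracy computed separately for each ground-truth class."""
    preds, labels = np.asarray(preds), np.asarray(labels)
    return np.array([(preds[labels == c] == c).mean() if (labels == c).any() else np.nan
                     for c in range(num_classes)])

# The spread of per-class accuracy (e.g. max - min, or its standard deviation)
# quantifies how far the model is from treating classes equally.
```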
arXiv Detail & Related papers (2024-02-28T07:54:50Z)
- Counterfactual Image Generation for adversarially robust and interpretable Classifiers [1.3859669037499769]
We propose a unified framework leveraging image-to-image translation Generative Adversarial Networks (GANs) to produce counterfactual samples.
This is achieved by combining the classifier and discriminator into a single model that attributes real images to their respective classes and flags generated images as "fake".
We show how the model exhibits improved robustness to adversarial attacks, and we show how the discriminator's "fakeness" value serves as an uncertainty measure of the predictions.
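A schematic version of the combined classifier/discriminator idea, assuming PyTorch: a single head with K + 1 outputs whose extra logit marks generated images, so its softmax mass can serve as the "fakeness"/uncertainty score. The architecture details are assumptions, not the authors' exact model.

```python
import torch
import torch.nn as nn

class ClassifierDiscriminator(nn.Module):
    def __init__(self, backbone, feat_dim, num_classes):
        super().__init__()
        self.backbone = backbone                           # any feature extractor
        self.head = nn.Linear(feat_dim, num_classes + 1)   # last index = "fake"

    def forward(self, x):
        return self.head(self.backbone(x))

def fakeness(logits):
    """Probability mass on the extra 'fake' class, usable as an uncertainty score."""
    return torch.softmax(logits, dim=1)[:, -1]
```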
arXiv Detail & Related papers (2023-10-01T18:50:29Z)
- Adversarial Attacks on Image Classification Models: FGSM and Patch Attacks and their Impact [0.0]
This chapter introduces the concept of adversarial attacks on image classification models built on convolutional neural networks (CNNs).
CNNs are very popular deep-learning models which are used in image classification tasks.
Two very well-known adversarial attacks are discussed and their impact on the performance of image classifiers is analyzed.
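For reference, FGSM in its standard formulation (a generic PyTorch sketch, not the chapter's own code): perturb the input by epsilon in the direction of the sign of the loss gradient.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, epsilon=8 / 255):
    """Single-step untargeted FGSM on inputs in [0, 1]."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()   # step in the direction that increases the loss
    return x_adv.clamp(0, 1).detach()
```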
arXiv Detail & Related papers (2023-07-05T06:40:08Z)
- Fine-Grained ImageNet Classification in the Wild [0.0]
Robustness tests can uncover several vulnerabilities and biases which go unnoticed during the typical model evaluation stage.
In our work, we perform fine-grained classification on closely related categories, which are identified with the help of hierarchical knowledge.
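Because ImageNet classes are WordNet synsets, closely related categories can be gathered as leaf hyponyms of a shared parent node. A small sketch using NLTK's WordNet interface; the parent synset is an arbitrary example, not one chosen by the paper:

```python
from nltk.corpus import wordnet as wn  # requires nltk's WordNet corpus

def fine_grained_group(parent_name="dog.n.01"):
    """All leaf hyponyms under a parent synset, i.e. a set of closely related categories."""
    parent = wn.synset(parent_name)
    return [s for s in parent.closure(lambda s: s.hyponyms()) if not s.hyponyms()]
```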
arXiv Detail & Related papers (2023-03-04T12:25:07Z)
- Contrastive Learning for Fair Representations [50.95604482330149]
Trained classification models can unintentionally lead to biased representations and predictions.
Existing debiasing methods for classification models, such as adversarial training, are often expensive to train and difficult to optimise.
We propose a method for mitigating bias by incorporating contrastive learning, in which instances sharing the same class label are encouraged to have similar representations.
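A compact, generic supervised contrastive term that pulls same-class instances together, in the spirit of the approach described above (this follows the common SupCon formulation, not necessarily the paper's exact loss):

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(features, labels, temperature=0.1):
    """features: (N, d) embeddings; labels: (N,) integer class labels."""
    z = F.normalize(features, dim=1)                          # unit-norm embeddings
    sim = z @ z.t() / temperature                             # pairwise similarities
    mask_pos = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()
    mask_pos.fill_diagonal_(0)                                # positives exclude self
    logits = sim - torch.eye(len(z), device=z.device) * 1e9   # mask self in the denominator
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    pos_counts = mask_pos.sum(1).clamp(min=1)
    return -(mask_pos * log_prob).sum(1).div(pos_counts).mean()
```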
arXiv Detail & Related papers (2021-09-22T10:47:51Z)
- Rethinking Natural Adversarial Examples for Classification Models [43.87819913022369]
ImageNet-A is a famous dataset of natural adversarial examples.
We validated the hypothesis that background content contributes to these failures by reducing the background influence in ImageNet-A examples with object detection techniques.
Experiments showed that the object detection models with various classification models as backbones obtained much higher accuracy than their corresponding classification models.
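A rough sketch of such a pipeline: detect the main object, crop to its box to suppress background, then classify the crop. The model choices (torchvision's Faster R-CNN, a 224x224 classifier) are illustrative assumptions, not the paper's exact setup.

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.transforms.functional import resized_crop

detector = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

def classify_foreground(classifier, image):          # image: (3, H, W) tensor in [0, 1]
    with torch.no_grad():
        boxes = detector([image])[0]["boxes"]         # boxes are sorted by score
        h, w = image.shape[-2:]
        if len(boxes):
            x1, y1, x2, y2 = boxes[0].round().int().tolist()
        else:
            x1, y1, x2, y2 = 0, 0, w, h                # fall back to the full image
        crop = resized_crop(image, y1, x1, y2 - y1, x2 - x1, [224, 224]).unsqueeze(0)
        return classifier(crop).argmax(dim=1)          # input normalization omitted
```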
arXiv Detail & Related papers (2021-02-23T14:46:48Z)
- Instance Localization for Self-supervised Detection Pretraining [68.24102560821623]
We propose a new self-supervised pretext task, called instance localization.
We show that integration of bounding boxes into pretraining promotes better task alignment and architecture alignment for transfer learning.
Experimental results demonstrate that our approach yields state-of-the-art transfer learning results for object detection.
arXiv Detail & Related papers (2021-02-16T17:58:57Z)
- On the Transferability of Adversarial Attacks against Neural Text Classifier [121.6758865857686]
We investigate the transferability of adversarial examples for text classification models.
We propose a genetic algorithm to find an ensemble of models that can induce adversarial examples to fool almost all existing models.
We derive word replacement rules that can be used for model diagnostics from these adversarial examples.
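A schematic genetic search over binary masks that select subsets of models, with a hypothetical `fooling_rate` fitness hook; this illustrates the general idea rather than the authors' implementation:

```python
import numpy as np

def genetic_ensemble_search(num_models, fooling_rate, pop=20, gens=30, mut=0.1, rng=None):
    """fooling_rate(mask) -> float: fraction of held-out models fooled by
    adversarial examples crafted against the ensemble selected by `mask`."""
    rng = rng or np.random.default_rng(0)
    population = rng.integers(0, 2, size=(pop, num_models))       # each row: one ensemble
    for _ in range(gens):
        fitness = np.array([fooling_rate(mask) for mask in population])
        parents = population[np.argsort(fitness)[-pop // 2:]]      # keep the fittest half
        cuts = rng.integers(1, num_models, size=pop // 2)
        children = np.array([np.concatenate([parents[i % len(parents)][:c],
                                             parents[(i + 1) % len(parents)][c:]])
                             for i, c in enumerate(cuts)])         # one-point crossover
        children ^= (rng.random(children.shape) < mut).astype(children.dtype)  # mutation
        population = np.vstack([parents, children])
    return population[np.argmax([fooling_rate(m) for m in population])]
```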
arXiv Detail & Related papers (2020-11-17T10:45:05Z)
- Closing the Generalization Gap in One-Shot Object Detection [92.82028853413516]
We show that the key to strong few-shot detection models may not lie in sophisticated metric learning approaches, but instead in scaling the number of categories.
Future data annotation efforts should therefore focus on wider datasets and annotate a larger number of categories.
arXiv Detail & Related papers (2020-11-09T09:31:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.