Foiling Explanations in Deep Neural Networks
- URL: http://arxiv.org/abs/2211.14860v3
- Date: Sun, 13 Aug 2023 16:37:17 GMT
- Title: Foiling Explanations in Deep Neural Networks
- Authors: Snir Vitrack Tamam, Raz Lapid, Moshe Sipper
- Abstract summary: This paper uncovers a troubling property of explanation methods for image-based DNNs.
We demonstrate how explanations may be arbitrarily manipulated through the use of evolution strategies.
Our novel algorithm successfully manipulates an image in a manner imperceptible to the human eye.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks (DNNs) have greatly impacted numerous fields over the
past decade. Yet despite exhibiting superb performance over many problems,
their black-box nature still poses a significant challenge with respect to
explainability. Indeed, explainable artificial intelligence (XAI) is crucial in
several fields, wherein the answer alone -- sans a reasoning of how said answer
was derived -- is of little value. This paper uncovers a troubling property of
explanation methods for image-based DNNs: by making small visual changes to the
input image -- hardly influencing the network's output -- we demonstrate how
explanations may be arbitrarily manipulated through the use of evolution
strategies. Our novel algorithm, AttaXAI, a model-agnostic, adversarial attack
on XAI algorithms, only requires access to the output logits of a classifier
and to the explanation map; these weak assumptions render our approach highly
useful where real-world models and data are concerned. We compare our method's
performance on two benchmark datasets -- CIFAR100 and ImageNet -- using four
different pretrained deep-learning models: VGG16-CIFAR100, VGG16-ImageNet,
MobileNet-CIFAR100, and Inception-v3-ImageNet. We find that the XAI methods can
be manipulated without the use of gradients or other model internals. Our novel
algorithm is successfully able to manipulate an image in a manner imperceptible
to the human eye, such that the XAI method outputs a specific explanation map.
To our knowledge, this is the first such method in a black-box setting, and we
believe it has significant value where explainability is desired, required, or
legally mandatory.
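As a rough illustration of the black-box setup the abstract describes (query access only to the classifier's logits and to an explanation map, optimized with an evolution strategy), the sketch below evolves an imperceptible perturbation whose explanation approaches a chosen target map while the logits stay close to the original. The function names `model_logits` and `explain`, the fitness weighting, and the perturbation budget are illustrative assumptions, not the authors' AttaXAI implementation.

```python
import numpy as np

def attack_explanation(x, target_map, model_logits, explain,
                       pop_size=32, sigma=0.01, lr=0.005, steps=200):
    """Black-box sketch: evolve a small perturbation so that the explanation
    map moves toward `target_map` while the logits stay close to those of the
    original image. Only queries `model_logits` and `explain`, never gradients."""
    base_logits = model_logits(x)
    delta = np.zeros_like(x)

    def fitness(d):
        adv = np.clip(x + d, 0.0, 1.0)
        expl_loss = np.mean((explain(adv) - target_map) ** 2)        # match target explanation
        logit_loss = np.mean((model_logits(adv) - base_logits) ** 2)  # keep the prediction
        return expl_loss + 0.1 * logit_loss  # weighting is an assumption

    for _ in range(steps):
        noise = np.random.randn(pop_size, *x.shape)
        scores = np.array([fitness(delta + sigma * n) for n in noise])
        # NES-style update: step against the score-weighted noise directions.
        advantages = (scores - scores.mean()) / (scores.std() + 1e-8)
        weights = advantages.reshape(pop_size, *([1] * x.ndim))
        delta -= lr * (weights * noise).mean(axis=0) / sigma
        delta = np.clip(delta, -8 / 255, 8 / 255)  # "imperceptible" budget (assumed)
    return np.clip(x + delta, 0.0, 1.0)
```

In the paper's setting, `explain` would wrap whichever XAI method is being attacked (e.g., a saliency map returned alongside the logits), and success means the returned image keeps its original prediction while its explanation map matches the target.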
Related papers
- Enhancing Counterfactual Image Generation Using Mahalanobis Distance with Distribution Preferences in Feature Space [7.00851481261778]
In the realm of Artificial Intelligence (AI), the importance of Explainable Artificial Intelligence (XAI) is increasingly recognized.
One notable single-instance XAI approach is counterfactual explanation, which aids users in comprehending a model's decisions.
This paper introduces a novel method for computing feature importance within the feature space of a black-box model.
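For context on the Mahalanobis-distance idea in the title above, the snippet below shows the standard class-conditional Mahalanobis distance in a model's feature space: fit a mean and covariance on one class's training features, then score a query feature. It is only a generic illustration of the distance itself, not the paper's counterfactual-generation method; the feature arrays are assumed to come from some feature extractor of the black-box model.

```python
import numpy as np

def fit_class_gaussian(train_feats):
    """Fit the mean and (regularized) inverse covariance of one class's features.
    `train_feats` has shape (num_samples, feature_dim)."""
    mu = train_feats.mean(axis=0)
    cov = np.cov(train_feats, rowvar=False) + 1e-6 * np.eye(train_feats.shape[1])
    return mu, np.linalg.inv(cov)

def mahalanobis(feat, mu, cov_inv):
    """Mahalanobis distance of a single query feature vector to the class."""
    diff = feat - mu
    return float(np.sqrt(diff @ cov_inv @ diff))
```

A counterfactual search can then prefer feature-space moves that keep this distance to the target class small, i.e., that stay in high-density regions of that class's distribution.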
arXiv Detail & Related papers (2024-05-31T08:26:53Z)
- Solving the enigma: Deriving optimal explanations of deep networks [3.9584068556746246]
We propose a novel framework designed to enhance the explainability of deep networks.
Our framework integrates various explanations from established XAI methods and employs a non-explanation to construct an optimal explanation.
Our results suggest that optimal explanations based on specific criteria are derivable.
arXiv Detail & Related papers (2024-05-16T11:49:08Z)
- Advancing Post Hoc Case Based Explanation with Feature Highlighting [0.8287206589886881]
We propose two general algorithms which can isolate multiple clear feature parts in a test image, and then connect them to the explanatory cases found in the training data.
Results demonstrate that the proposed approach appropriately calibrates a user's feelings of 'correctness' for ambiguous classifications in real-world data.
arXiv Detail & Related papers (2023-11-06T16:34:48Z)
- MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments [72.6405488990753]
Self-supervised learning can be used to mitigate the data-hungry nature of Vision Transformer networks.
We propose a single-stage and standalone method, MOCA, which unifies both desired properties.
We achieve new state-of-the-art results on low-shot settings and strong experimental results in various evaluation protocols.
arXiv Detail & Related papers (2023-07-18T15:46:20Z)
- Towards Better Explanations for Object Detection [0.0]
This paper proposes D-CLOSE, a method for explaining the decisions of any object detection model.
Tests on the MS-COCO dataset with the YOLOX model show that our method outperforms D-RISE.
arXiv Detail & Related papers (2023-06-05T09:52:05Z)
- Visual correspondence-based explanations improve AI robustness and human-AI team accuracy [7.969008943697552]
We propose two novel architectures of self-interpretable image classifiers that first explain, and then predict.
Our models consistently improve accuracy (by 1 to 4 points) on out-of-distribution (OOD) datasets.
For the first time, we show that it is possible to achieve complementary human-AI team accuracy (i.e., higher than either the AI alone or the human alone) on ImageNet and CUB image classification tasks.
arXiv Detail & Related papers (2022-07-26T10:59:42Z)
- Adaptive Convolutional Dictionary Network for CT Metal Artifact Reduction [62.691996239590125]
We propose an adaptive convolutional dictionary network (ACDNet) for metal artifact reduction.
Our ACDNet can automatically learn the prior for artifact-free CT images via training data and adaptively adjust the representation kernels for each input CT image.
Our method inherits the clear interpretability of model-based methods and maintains the powerful representation ability of learning-based methods.
arXiv Detail & Related papers (2022-05-16T06:49:36Z)
- Image Restoration by Deep Projected GSURE [115.57142046076164]
Ill-posed inverse problems appear in many image processing applications, such as deblurring and super-resolution.
We propose a new image restoration framework based on minimizing a loss function that includes a "projected version" of the Generalized Stein Unbiased Risk Estimator (GSURE) and a parameterization of the latent image by a CNN.
arXiv Detail & Related papers (2021-02-04T08:52:46Z)
- What Do Deep Nets Learn? Class-wise Patterns Revealed in the Input Space [88.37185513453758]
We propose a method to visualize and understand the class-wise knowledge learned by deep neural networks (DNNs) under different settings.
Our method searches for a single predictive pattern in the pixel space to represent the knowledge learned by the model for each class.
In the adversarial setting, we show that adversarially trained models tend to learn more simplified shape patterns.
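One simple way to realize the pattern search described above (a simplified stand-in, not necessarily the authors' exact procedure) is to optimize a single additive pixel-space pattern so that the model assigns the target class to many different images carrying it:

```python
import torch
import torch.nn.functional as F

def find_class_pattern(model, loader, target_class, steps=500, lr=0.05, eps=0.1):
    """Optimize one additive pixel-space pattern that pushes `model` toward
    `target_class` on whatever image it is added to (simplified sketch)."""
    model.eval()
    pattern = torch.zeros(3, 224, 224, requires_grad=True)  # input size is an assumption
    opt = torch.optim.Adam([pattern], lr=lr)
    batches = iter(loader)
    for _ in range(steps):
        try:
            images, _ = next(batches)
        except StopIteration:
            batches = iter(loader)
            images, _ = next(batches)
        logits = model((images + pattern).clamp(0, 1))
        target = torch.full((images.size(0),), target_class, dtype=torch.long)
        loss = F.cross_entropy(logits, target)
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            pattern.clamp_(-eps, eps)  # keep the pattern visually small
    return pattern.detach()
```

The resulting per-class patterns can then be visualized; the adversarial-training observation above corresponds to such patterns looking more shape-like for robust models.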
arXiv Detail & Related papers (2021-01-18T06:38:41Z)
- This is not the Texture you are looking for! Introducing Novel Counterfactual Explanations for Non-Experts using Generative Adversarial Learning [59.17685450892182]
Counterfactual explanation systems try to enable counterfactual reasoning by modifying the input image.
We present a novel approach to generate such counterfactual image explanations based on adversarial image-to-image translation techniques.
Our results show that our approach performs significantly better regarding mental models, explanation satisfaction, trust, emotions, and self-efficacy than two state-of-the-art systems.
arXiv Detail & Related papers (2020-12-22T10:08:05Z)
- Adaptive Context-Aware Multi-Modal Network for Depth Completion [107.15344488719322]
We propose to adopt graph propagation to capture the observed spatial contexts.
We then apply an attention mechanism to the propagation, which encourages the network to model the contextual information adaptively.
Finally, we introduce a symmetric gated fusion strategy to exploit the extracted multi-modal features effectively.
Our model, named Adaptive Context-Aware Multi-Modal Network (ACMNet), achieves the state-of-the-art performance on two benchmarks.
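As a rough illustration of the symmetric gated fusion mentioned above (a generic sketch, not the ACMNet code), each branch's features can be re-weighted by a gate computed from both branches before being combined:

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Generic symmetric gated fusion of two feature maps (e.g., image and depth)."""
    def __init__(self, channels):
        super().__init__()
        self.gate_a = nn.Sequential(nn.Conv2d(2 * channels, channels, 1), nn.Sigmoid())
        self.gate_b = nn.Sequential(nn.Conv2d(2 * channels, channels, 1), nn.Sigmoid())

    def forward(self, feat_a, feat_b):
        joint = torch.cat([feat_a, feat_b], dim=1)
        # Each branch is gated by information from both, then summed symmetrically.
        return self.gate_a(joint) * feat_a + self.gate_b(joint) * feat_b
```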
arXiv Detail & Related papers (2020-08-25T06:00:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.