Black-box Attacks on Image Activity Prediction and its Natural Language Explanations
- URL: http://arxiv.org/abs/2310.00503v1
- Date: Sat, 30 Sep 2023 21:56:43 GMT
- Title: Black-box Attacks on Image Activity Prediction and its Natural Language Explanations
- Authors: Alina Elena Baia, Valentina Poggioni, Andrea Cavallaro
- Abstract summary: Explainable AI (XAI) methods aim to describe the decision process of deep neural networks.
Visual XAI methods have been shown to be vulnerable to white-box and gray-box adversarial attacks.
We show that we can create adversarial images that manipulate the explanations of an activity recognition model by having access only to its final output.
- Score: 27.301741710016223
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Explainable AI (XAI) methods aim to describe the decision process of deep
neural networks. Early XAI methods produced visual explanations, whereas more
recent techniques generate multimodal explanations that include textual
information and visual representations. Visual XAI methods have been shown to
be vulnerable to white-box and gray-box adversarial attacks, with an attacker
having full or partial knowledge of and access to the target system. As the
vulnerabilities of multimodal XAI models have not been examined, in this paper
we assess for the first time the robustness to black-box attacks of the natural
language explanations generated by a self-rationalizing image-based activity
recognition model. We generate unrestricted, spatially variant perturbations
that disrupt the association between the predictions and the corresponding
explanations to mislead the model into generating unfaithful explanations. We
show that we can create adversarial images that manipulate the explanations of
an activity recognition model by having access only to its final output.
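To make the threat model concrete, the sketch below shows one way a query-only attacker could search for such a perturbation: it repeatedly proposes a spatially variant noise field, keeps only candidates that preserve the predicted activity, and greedily minimises a crude textual similarity between the new and original explanations. This is a minimal illustration, not the authors' algorithm; `query_model`, the token-Jaccard proxy, and the (1+1) random search are assumptions made for the example.

```python
import numpy as np

# Hypothetical query-only interface to the target: returns the predicted activity
# label and its natural-language explanation. This final output is the only
# access the attacker is assumed to have.
def query_model(image: np.ndarray):
    raise NotImplementedError("wrap the deployed activity-recognition model here")

def explanation_similarity(a: str, b: str) -> float:
    """Crude token-set Jaccard similarity between two explanations."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(len(ta | tb), 1)

def black_box_explanation_attack(image: np.ndarray, iters: int = 500,
                                 sigma: float = 8.0, seed: int = 0) -> np.ndarray:
    """(1+1)-style random search for a spatially variant perturbation that keeps
    the predicted activity but drives the explanation away from the original."""
    rng = np.random.default_rng(seed)
    orig_label, orig_expl = query_model(image)
    pert = np.zeros(image.shape, dtype=np.float32)
    best_sim = 1.0
    for _ in range(iters):
        candidate = pert + rng.normal(0.0, sigma, size=image.shape).astype(np.float32)
        adv = np.clip(image.astype(np.float32) + candidate, 0, 255).astype(np.uint8)
        label, expl = query_model(adv)
        if label != orig_label:
            continue  # discard candidates that change the prediction
        sim = explanation_similarity(expl, orig_expl)
        if sim < best_sim:  # explanation drifted further from the original
            pert, best_sim = candidate, sim
    return np.clip(image.astype(np.float32) + pert, 0, 255).astype(np.uint8)
```

The paper's unrestricted, spatially variant perturbations are more sophisticated than plain Gaussian noise, but the loop above captures the black-box constraint: every decision is driven solely by the (label, explanation) pair returned by the model.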
Related papers
- MEGL: Multimodal Explanation-Guided Learning [23.54169888224728]
We propose a novel Multimodal Explanation-Guided Learning (MEGL) framework to enhance model interpretability and improve classification performance.
Our Saliency-Driven Textual Grounding (SDTG) approach integrates spatial information from visual explanations into textual rationales, providing spatially grounded and contextually rich explanations.
We validate MEGL on two new datasets, Object-ME and Action-ME, for image classification with multimodal explanations.
arXiv Detail & Related papers (2024-11-20T05:57:00Z)
- VALE: A Multimodal Visual and Language Explanation Framework for Image Classifiers using eXplainable AI and Language Models [0.0]
We propose a novel framework named VALE (Visual and Language Explanation).
VALE integrates explainable AI techniques with advanced language models to provide comprehensive explanations.
In this paper, we conduct a pilot study of the VALE framework for image classification tasks.
arXiv Detail & Related papers (2024-08-23T03:02:11Z)
- SA-Attack: Improving Adversarial Transferability of Vision-Language Pre-training Models via Self-Augmentation [56.622250514119294]
In contrast to white-box adversarial attacks, transfer attacks are more reflective of real-world scenarios.
We propose a self-augment-based transfer attack method, termed SA-Attack.
arXiv Detail & Related papers (2023-12-08T09:08:50Z)
- Advancing Post Hoc Case Based Explanation with Feature Highlighting [0.8287206589886881]
We propose two general algorithms which can isolate multiple clear feature parts in a test image, and then connect them to the explanatory cases found in the training data.
Results demonstrate that the proposed approach appropriately calibrates a user's feelings of 'correctness' for ambiguous classifications in real-world data.
arXiv Detail & Related papers (2023-11-06T16:34:48Z)
- Foiling Explanations in Deep Neural Networks [0.0]
This paper uncovers a troubling property of explanation methods for image-based DNNs.
We demonstrate how explanations may be arbitrarily manipulated through the use of evolution strategies.
Our algorithm successfully manipulates an image in a manner imperceptible to the human eye.
arXiv Detail & Related papers (2022-11-27T15:29:39Z)
- ProtoShotXAI: Using Prototypical Few-Shot Architecture for Explainable AI [4.629694186457133]
Unexplainable black-box models create scenarios where anomalies cause deleterious responses, thus creating unacceptable risks.
We present an approach, ProtoShotXAI, that uses a Prototypical few-shot network to explore the contrastive manifold between nonlinear features of different classes.
Our approach is the first locally interpretable XAI model that can be extended to, and demonstrated on, few-shot networks.
arXiv Detail & Related papers (2021-10-22T05:24:52Z)
- CX-ToM: Counterfactual Explanations with Theory-of-Mind for Enhancing Human Trust in Image Recognition Models [84.32751938563426]
We propose a new explainable AI (XAI) framework for explaining decisions made by a deep convolutional neural network (CNN).
In contrast to the current methods in XAI that generate explanations as a single shot response, we pose explanation as an iterative communication process.
Our framework generates a sequence of explanations in a dialog by mediating the differences between the minds of the machine and the human user.
arXiv Detail & Related papers (2021-09-03T09:46:20Z)
- Explainers in the Wild: Making Surrogate Explainers Robust to Distortions through Perception [77.34726150561087]
We propose a methodology to evaluate the effect of distortions in explanations by embedding perceptual distances.
We generate explanations for images in the ImageNet-C dataset and demonstrate how using perceptual distances in the surrogate explainer creates more coherent explanations for the distorted and reference images.
arXiv Detail & Related papers (2021-02-22T12:38:53Z)
- This is not the Texture you are looking for! Introducing Novel Counterfactual Explanations for Non-Experts using Generative Adversarial Learning [59.17685450892182]
Counterfactual explanation systems aim to enable counterfactual reasoning by modifying the input image.
We present a novel approach to generate such counterfactual image explanations based on adversarial image-to-image translation techniques.
Our approach leads to significantly better results regarding mental models, explanation satisfaction, trust, emotions, and self-efficacy than two state-of-the-art systems.
arXiv Detail & Related papers (2020-12-22T10:08:05Z)
- Proactive Pseudo-Intervention: Causally Informed Contrastive Learning For Interpretable Vision Models [103.64435911083432]
We present a novel contrastive learning strategy called Proactive Pseudo-Intervention (PPI).
PPI leverages proactive interventions to guard against image features with no causal relevance.
We also devise a novel causally informed salience mapping module to identify key image pixels to intervene on, and show that it greatly facilitates model interpretability.
arXiv Detail & Related papers (2020-12-06T20:30:26Z)
- Model extraction from counterfactual explanations [68.8204255655161]
We show how an adversary can leverage the information provided by counterfactual explanations to build high-fidelity and high-accuracy model extraction attacks.
Our attack enables the adversary to build a faithful copy of a target model by accessing its counterfactual explanations.
arXiv Detail & Related papers (2020-09-03T19:02:55Z)
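The extraction attack summarised in the last entry exploits the fact that every counterfactual explanation is, in effect, an extra labelled sample lying close to the target's decision boundary. The sketch below is a minimal illustration of that idea for a binary classifier; `target_predict`, `target_counterfactual`, and the logistic-regression surrogate are assumptions made for the example, not the attack as published.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical black-box endpoints exposed by the deployed (binary) model:
# a prediction oracle and a counterfactual-explanation oracle that returns a
# nearby input receiving the opposite decision.
def target_predict(x: np.ndarray) -> int:
    raise NotImplementedError("wrap the target model's prediction API here")

def target_counterfactual(x: np.ndarray) -> np.ndarray:
    raise NotImplementedError("wrap the target model's counterfactual API here")

def extract_surrogate(queries: np.ndarray) -> LogisticRegression:
    """Fit a surrogate on the query points plus the counterfactuals the target
    reveals; labelling each counterfactual with the flipped class is what makes
    explanation-aware extraction more query-efficient."""
    X, y = [], []
    for x in queries:
        label = target_predict(x)
        X.append(x)
        y.append(label)
        cf = target_counterfactual(x)   # explanation leaks a boundary point
        X.append(cf)
        y.append(1 - label)             # counterfactual gets the opposite label
    surrogate = LogisticRegression(max_iter=1000)
    surrogate.fit(np.asarray(X), np.asarray(y))
    return surrogate
```

Once fitted, the surrogate lets the adversary probe the decision boundary offline, without issuing further queries to the target.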