XAI Benchmark for Visual Explanation
- URL: http://arxiv.org/abs/2310.08537v2
- Date: Wed, 22 Nov 2023 01:35:45 GMT
- Title: XAI Benchmark for Visual Explanation
- Authors: Yifei Zhang, Siyi Gu, James Song, Bo Pan, Guangji Bai, Liang Zhao
- Abstract summary: We develop a benchmark for visual explanation, consisting of eight datasets with human explanation annotations.
We devise a visual explanation pipeline that includes data loading, explanation generation, and method evaluation.
Our proposed benchmarks facilitate a fair evaluation and comparison of visual explanation methods.
- Score: 15.687509357300847
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rise of deep learning has ushered in significant progress in computer vision (CV) tasks, yet the "black box" nature of these models often precludes interpretability. This challenge has spurred the development of Explainable Artificial Intelligence (XAI), which generates explanations of an AI model's decision-making process. An explanation should not only faithfully reflect the model's true reasoning process (i.e., faithfulness) but also align with human reasoning (i.e., alignment). Within XAI, visual explanations employ visual cues to elucidate the reasoning of machine learning models, particularly in image processing, by highlighting the image regions most important to a prediction. Despite the considerable body of research on visual explanations, standardized benchmarks for evaluating them remain seriously underdeveloped. In particular, to evaluate alignment, existing works usually only illustrate the visual explanations of a few images, or hire referees to rate explanation quality via ad-hoc questionnaires; neither approach yields a standardized, quantitative, and comprehensive evaluation. To address this issue, we develop a benchmark for visual explanation consisting of eight datasets with human explanation annotations from various domains, accommodating both post-hoc and intrinsic visual explanation methods. Additionally, we devise a visual explanation pipeline that includes data loading, explanation generation, and method evaluation. Our proposed benchmarks facilitate a fair evaluation and comparison of visual explanation methods. Building on our curated collection of datasets, we benchmarked eight existing visual explanation methods and conducted a thorough comparison across four selected datasets using six alignment-based and causality-based metrics. Our benchmark is accessible through our website: https://xaidataset.github.io.
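As a concrete (and purely illustrative) reading of the pipeline described in the abstract, the sketch below walks through its three stages, with Grad-CAM standing in for a generic post-hoc explanation method, an IoU score against the human annotation as one plausible alignment-based metric, and a deletion curve as one plausible causality-based metric. The function names, data handling, and metric choices are assumptions made for this sketch, not the benchmark's actual API.

```python
# Hypothetical sketch of the three-stage pipeline: data loading,
# explanation generation, method evaluation. All names are illustrative.
import torch
import torch.nn.functional as F
from torchvision import models

def grad_cam(model, image, target_layer, class_idx):
    """Generic Grad-CAM: weight the target layer's activations by the
    spatially averaged gradients of the class score."""
    acts, grads = [], []
    h1 = target_layer.register_forward_hook(lambda m, i, o: acts.append(o))
    h2 = target_layer.register_full_backward_hook(
        lambda m, gi, go: grads.append(go[0]))
    score = model(image.unsqueeze(0))[0, class_idx]
    model.zero_grad()
    score.backward()
    h1.remove(); h2.remove()
    weights = grads[0].mean(dim=(2, 3), keepdim=True)    # GAP over space
    cam = F.relu((weights * acts[0]).sum(dim=1))         # weighted channel sum
    cam = F.interpolate(cam.unsqueeze(0), size=image.shape[1:],
                        mode="bilinear", align_corners=False)[0, 0]
    return ((cam - cam.min()) / (cam.max() - cam.min() + 1e-8)).detach()

def iou_alignment(saliency, human_mask, threshold=0.5):
    """Alignment-style metric: IoU between the thresholded saliency map
    and the human explanation annotation."""
    pred, gt = saliency > threshold, human_mask.bool()
    return ((pred & gt).sum() / ((pred | gt).sum() + 1e-8)).item()

def deletion_score(model, image, saliency, class_idx, steps=10):
    """Causality-style metric: confidence should drop as the most salient
    pixels are progressively deleted (lower mean confidence is better)."""
    order = saliency.flatten().argsort(descending=True)
    x, confs = image.clone(), []
    chunk = order.numel() // steps
    with torch.no_grad():
        for i in range(steps):
            idx = order[i * chunk:(i + 1) * chunk]
            x.view(3, -1)[:, idx] = 0                    # delete salient pixels
            confs.append(F.softmax(model(x.unsqueeze(0)), dim=1)[0, class_idx])
    return torch.stack(confs).mean().item()

# Stage 1: data loading (random tensors stand in for a benchmark sample;
# weights=None keeps the sketch offline, a pretrained model would be used).
model = models.resnet50(weights=None).eval()
image = torch.rand(3, 224, 224)
human_mask = torch.zeros(224, 224)
human_mask[60:160, 60:160] = 1                           # human annotation

# Stage 2: explanation generation; Stage 3: method evaluation.
cam = grad_cam(model, image, model.layer4, class_idx=0)
print(f"IoU alignment: {iou_alignment(cam, human_mask):.3f}")
print(f"Deletion score: {deletion_score(model, image, cam, class_idx=0):.3f}")
```

On the real benchmark such scores would be averaged over each dataset and reported per explanation method, which is what enables the standardized, quantitative comparison the abstract calls for.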
Related papers
- Interpretable Image Classification via Non-parametric Part Prototype Learning [14.390730075612248]
Classifying images with an interpretable decision-making process is a long-standing problem in computer vision.
In recent years, Prototypical Part Networks have gained traction as an approach to self-explainable neural networks.
We present a framework for part-based interpretable image classification that learns a set of semantically distinctive object parts for each class.
arXiv Detail & Related papers (2025-03-13T10:46:53Z)
- HMGIE: Hierarchical and Multi-Grained Inconsistency Evaluation for Vision-Language Data Cleansing [54.970275599061594]
We design an adaptive evaluation framework called Hierarchical and Multi-Grained Inconsistency Evaluation (HMGIE).
HMGIE can provide multi-grained evaluations covering both accuracy and completeness for various image-caption pairs.
To verify the efficacy and flexibility of the proposed framework, we construct MVTID, an image-caption dataset with diverse types and granularities of inconsistencies.
arXiv Detail & Related papers (2024-12-07T15:47:49Z)
- MEGL: Multimodal Explanation-Guided Learning [23.54169888224728]
We propose a novel Multimodal Explanation-Guided Learning (MEGL) framework to enhance model interpretability and improve classification performance.
Our Saliency-Driven Textual Grounding (SDTG) approach integrates spatial information from visual explanations into textual rationales, providing spatially grounded and contextually rich explanations.
We validate MEGL on two new datasets, Object-ME and Action-ME, for image classification with multimodal explanations.
arXiv Detail & Related papers (2024-11-20T05:57:00Z)
- Intrinsic Subgraph Generation for Interpretable Graph based Visual Question Answering [27.193336817953142]
We introduce an interpretable approach for graph-based Visual Question Answering (VQA).
Our model is designed to intrinsically produce a subgraph during the question-answering process as its explanation.
We compare these generated subgraphs against established post-hoc explainability methods for graph neural networks, and perform a human evaluation.
arXiv Detail & Related papers (2024-03-26T12:29:18Z)
- Evaluating the Utility of Model Explanations for Model Development [54.23538543168767]
We evaluate whether explanations can improve human decision-making in practical scenarios of machine learning model development.
To our surprise, we did not find evidence of significant improvement on tasks when users were provided with any of the saliency maps.
These findings suggest caution regarding the usefulness of saliency-based explanations and their potential for misunderstanding.
arXiv Detail & Related papers (2023-12-10T23:13:23Z)
- Explaining Explainability: Towards Deeper Actionable Insights into Deep Learning through Second-order Explainability [70.60433013657693]
Second-order explainable AI (SOXAI) was recently proposed to extend explainable AI (XAI) from the instance level to the dataset level.
We demonstrate for the first time, via example classification and segmentation cases, that eliminating irrelevant concepts from the training set based on actionable insights from SOXAI can enhance a model's performance.
arXiv Detail & Related papers (2023-06-14T23:24:01Z)
- Dynamic Clue Bottlenecks: Towards Interpretable-by-Design Visual Question Answering [58.64831511644917]
We introduce an interpretable-by-design model that factors model decisions into intermediate human-legible explanations.
We show that our inherently interpretable system improves by 4.64% over a comparable black-box system on reasoning-focused questions.
arXiv Detail & Related papers (2023-05-24T08:33:15Z)
- Modeling Entities as Semantic Points for Visual Information Extraction in the Wild [55.91783742370978]
We propose an alternative approach to precisely and robustly extract key information from document images.
We explicitly model entities as semantic points, i.e., center points of entities are enriched with semantic information describing the attributes and relationships of different entities.
The proposed method can achieve significantly enhanced performance on entity labeling and linking, compared with previous state-of-the-art models.
arXiv Detail & Related papers (2023-03-23T08:21:16Z)
- Learning Confident Classifiers in the Presence of Label Noise [5.551384206194696]
This paper proposes a probabilistic model for noisy observations that allows us to build confident classification and segmentation models.
Our experiments show that our algorithm outperforms state-of-the-art solutions for the considered classification and segmentation problems.
arXiv Detail & Related papers (2023-01-02T04:27:25Z)
- Towards Unsupervised Visual Reasoning: Do Off-The-Shelf Features Know How to Reason? [30.16956370267339]
We introduce a protocol to evaluate visual representations for the task of Visual Question Answering.
In order to decouple visual feature extraction from reasoning, we design a specific attention-based reasoning module.
We compare two types of visual representations, densely extracted local features and object-centric ones, against the performance of a perfect image representation that uses ground truth.
arXiv Detail & Related papers (2022-12-20T14:36:45Z)
- REVEL Framework to measure Local Linear Explanations for black-box models: Deep Learning Image Classification case of study [12.49538398746092]
We propose a procedure called REVEL to evaluate different aspects of explanation quality in a theoretically coherent manner.
The experiments were carried out on four image datasets as benchmarks, where we show REVEL's descriptive and analytical power.
arXiv Detail & Related papers (2022-11-11T12:15:36Z)
- Understanding ME? Multimodal Evaluation for Fine-grained Visual Commonsense [98.70218717851665]
Due to limited evaluation data resources, it is unclear whether models really understand the visual scene and the underlying commonsense knowledge.
We present a Multimodal Evaluation (ME) pipeline to automatically generate question-answer pairs to test models' understanding of the visual scene, text, and related knowledge.
We then take a step further to show that training with the ME data boosts the model's performance in standard VCR evaluation.
arXiv Detail & Related papers (2022-11-10T21:44:33Z)
- Towards Automatic Parsing of Structured Visual Content through the Use of Synthetic Data [65.68384124394699]
We propose a synthetic dataset containing Structured Visual Content (SVCs) in the form of images and ground truths.
We show the usage of this dataset by an application that automatically extracts a graph representation from an SVC image.
Our dataset enables the development of strong models for the interpretation of SVCs while skipping the time-consuming dense data annotation.
arXiv Detail & Related papers (2022-04-29T14:44:52Z)
- ADVISE: ADaptive Feature Relevance and VISual Explanations for Convolutional Neural Networks [0.745554610293091]
We introduce ADVISE, a new explainability method that quantifies and leverages the relevance of each unit of the feature map to provide better visual explanations.
We extensively evaluate our idea in the image classification task using AlexNet, VGG16, ResNet50, and Xception pretrained on ImageNet.
Our experiments further show that ADVISE fulfils the sensitivity and implementation independence axioms while passing the sanity checks.
arXiv Detail & Related papers (2022-03-02T18:16:57Z)
- Cross-Domain Few-Shot Graph Classification [7.23389716633927]
We study the problem of few-shot graph classification across domains with nonequivalent feature spaces.
We propose an attention-based graph encoder that uses three congruent views of graphs, one contextual and two topological views.
We show that when coupled with metric-based meta-learning frameworks, the proposed encoder achieves the best average meta-test classification accuracy.
arXiv Detail & Related papers (2022-01-20T16:16:30Z)
- CX-ToM: Counterfactual Explanations with Theory-of-Mind for Enhancing Human Trust in Image Recognition Models [84.32751938563426]
We propose a new explainable AI (XAI) framework for explaining decisions made by a deep convolutional neural network (CNN).
In contrast to current XAI methods that generate explanations as a single-shot response, we pose explanation as an iterative communication process.
Our framework generates a sequence of explanations in a dialog by mediating the differences between the minds of the machine and the human user.
arXiv Detail & Related papers (2021-09-03T09:46:20Z)
- This is not the Texture you are looking for! Introducing Novel Counterfactual Explanations for Non-Experts using Generative Adversarial Learning [59.17685450892182]
Counterfactual explanation systems try to enable counterfactual reasoning by modifying the input image.
We present a novel approach to generate such counterfactual image explanations based on adversarial image-to-image translation techniques.
Our results show that our approach leads to significantly better results regarding mental models, explanation satisfaction, trust, emotions, and self-efficacy than two state-of-the-art systems.
arXiv Detail & Related papers (2020-12-22T10:08:05Z)
- Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding [71.2260967797055]
We propose a weakly-supervised approach for aspect-based sentiment analysis.
We learn <sentiment, aspect> joint topic embeddings in the word embedding space.
We then use neural models to generalize the word-level discriminative information.
arXiv Detail & Related papers (2020-10-13T21:33:24Z)
- A Revised Generative Evaluation of Visual Dialogue [80.17353102854405]
We propose a revised evaluation scheme for the VisDial dataset.
We measure consensus between answers generated by the model and a set of relevant answers.
We release these sets and code for the revised evaluation scheme as DenseVisDial.
arXiv Detail & Related papers (2020-04-20T13:26:45Z)
- SHOP-VRB: A Visual Reasoning Benchmark for Object Perception [26.422761228628698]
We present an approach and a benchmark for visual reasoning in robotics applications.
We focus on inferring object properties from visual and text data.
We propose a reasoning system based on symbolic program execution (a minimal sketch of this idea follows below).
arXiv Detail & Related papers (2020-04-06T13:46:54Z)
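To illustrate the symbolic-program-execution idea in the SHOP-VRB entry above, here is a minimal, hypothetical Python sketch in which a question is mapped to a short program whose primitive steps run over a symbolic scene. The scene, primitives, and programs here are illustrative assumptions, not SHOP-VRB's actual system.

```python
# Hypothetical sketch of reasoning via symbolic program execution.
# Everything below is illustrative, not SHOP-VRB's real implementation.
from dataclasses import dataclass

@dataclass
class Obj:
    color: str
    material: str
    movable: bool

# A hand-written symbolic scene; in practice this would be inferred from the image.
SCENE = [
    Obj("red", "metal", movable=True),
    Obj("red", "ceramic", movable=False),
    Obj("blue", "plastic", movable=True),
]

# Each primitive maps the current working state to a new one.
PRIMITIVES = {
    "filter_color": lambda objs, arg: [o for o in objs if o.color == arg],
    "filter_movable": lambda objs, _: [o for o in objs if o.movable],
    "query_material": lambda objs, _: [o.material for o in objs],
    "count": lambda objs, _: len(objs),
}

def execute(program, scene):
    """Run (op, arg) steps left to right; every intermediate state is
    inspectable, which is what makes the reasoning trace interpretable."""
    state = scene
    for op, arg in program:
        state = PRIMITIVES[op](state, arg)
    return state

# "How many red objects are there?"
print(execute([("filter_color", "red"), ("count", None)], SCENE))   # -> 2
# "What material is the movable red object?"
print(execute([("filter_color", "red"), ("filter_movable", None),
               ("query_material", None)], SCENE))                   # -> ['metal']
```

In a full system, the program would be predicted from the question by a learned parser and the scene inferred from the image; the execution step itself stays symbolic and therefore interpretable.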