CX-ToM: Counterfactual Explanations with Theory-of-Mind for Enhancing
Human Trust in Image Recognition Models
- URL: http://arxiv.org/abs/2109.01401v2
- Date: Mon, 6 Sep 2021 07:00:34 GMT
- Title: CX-ToM: Counterfactual Explanations with Theory-of-Mind for Enhancing
Human Trust in Image Recognition Models
- Authors: Arjun R. Akula, Keze Wang, Changsong Liu, Sari Saba-Sadiya, Hongjing
Lu, Sinisa Todorovic, Joyce Chai, and Song-Chun Zhu
- Abstract summary: We propose a new explainable AI (XAI) framework for explaining decisions made by a deep convolutional neural network (CNN).
In contrast to the current methods in XAI that generate explanations as a single-shot response, we pose explanation as an iterative communication process.
Our framework generates a sequence of explanations in a dialog by mediating the differences between the minds of the machine and the human user.
- Score: 84.32751938563426
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose CX-ToM, short for counterfactual explanations with theory-of-mind,
a new explainable AI (XAI) framework for explaining decisions made by a deep
convolutional neural network (CNN). In contrast to the current methods in XAI
that generate explanations as a single-shot response, we pose explanation as an
iterative communication process, i.e., a dialog, between the machine and the
human user. More concretely, our CX-ToM framework generates a sequence of
explanations in a dialog by mediating the differences between the minds of the
machine and the human user. To do this, we use Theory of Mind (ToM), which
helps us explicitly model the human's intention, the machine's mind as inferred
by the human, as well as the human's mind as inferred by the machine. Moreover,
most state-of-the-art XAI frameworks provide attention-based (or heat-map)
explanations. In our work, we show that these attention-based explanations are
not sufficient for increasing human trust in the underlying CNN model. In
CX-ToM, we instead use counterfactual explanations called fault-lines, which we
define as follows:
given an input image I for which a CNN classification model M predicts class
c_pred, a fault-line identifies the minimal semantic-level features (e.g.,
stripes on zebra, pointed ears of dog), referred to as explainable concepts,
that need to be added to or deleted from I in order to alter the classification
category of I by M to another specified class c_alt. We argue that, due to the
iterative, conceptual and counterfactual nature of CX-ToM explanations, our
framework is practical and more natural for both expert and non-expert users to
understand the internal workings of complex deep learning models. Extensive
quantitative and qualitative experiments verify our hypotheses, demonstrating
that our CX-ToM significantly outperforms the state-of-the-art explainable AI
models.
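The fault-line definition above can be made concrete with a small sketch. The Python snippet below is a hypothetical illustration, not the authors' implementation: it assumes a classifier callable `predict` and a dictionary `concept_edits` mapping explainable-concept names (e.g., "add_stripes", "remove_pointed_ears") to functions that apply the corresponding semantic edit to an image, and it searches edit subsets in order of increasing size so that the first subset flipping the prediction to `c_alt` is minimal by construction.

```python
from itertools import combinations
from typing import Any, Callable, Dict, List

# Hypothetical sketch of a fault-line search; NOT the CX-ToM implementation.
# Assumed interfaces:
#   predict(image) -> predicted class label (string)
#   concept_edits: {"add_stripes": fn, "remove_pointed_ears": fn, ...}, where
#     each fn applies one semantic-level (explainable-concept) edit to an image.
def find_fault_line(
    image: Any,
    predict: Callable[[Any], str],
    concept_edits: Dict[str, Callable[[Any], Any]],
    c_alt: str,
    max_edits: int = 3,
) -> List[str]:
    """Return a smallest set of explainable-concept edits that changes the
    model's prediction on `image` to the target class `c_alt`."""
    # Enumerate subsets of concept edits by increasing size, so the first
    # subset that flips the prediction is minimal by construction.
    for k in range(1, max_edits + 1):
        for subset in combinations(concept_edits, k):
            edited = image
            for name in subset:
                edited = concept_edits[name](edited)  # apply one semantic edit
            if predict(edited) == c_alt:
                return list(subset)  # minimal fault-line found
    return []  # no fault-line found within the edit budget
```

In CX-ToM, which fault-line is presented at each turn of the dialog is further guided by the ToM estimates of the user's beliefs; the exhaustive subset search above is only meant to make the idea of a "minimal set of explainable concepts to add or delete" tangible.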
Related papers
- Reasoning with trees: interpreting CNNs using hierarchies [3.6763102409647526]
We introduce a framework that uses hierarchical segmentation techniques for faithful and interpretable explanations of Convolutional Neural Networks (CNNs).
Our method constructs model-based hierarchical segmentations that maintain the model's reasoning fidelity.
Experiments show that our framework, xAiTrees, delivers highly interpretable and faithful model explanations.
arXiv Detail & Related papers (2024-06-19T06:45:19Z)
- Less is More: Discovering Concise Network Explanations [26.126343100127936]
We introduce Discovering Conceptual Network Explanations (DCNE), a new approach for generating human-comprehensible visual explanations.
Our method automatically finds visual explanations that are critical for discriminating between classes.
DCNE represents a step forward in making neural network decisions accessible and interpretable to humans.
arXiv Detail & Related papers (2024-05-24T06:10:23Z)
- Evaluating the Utility of Model Explanations for Model Development [54.23538543168767]
We evaluate whether explanations can improve human decision-making in practical scenarios of machine learning model development.
To our surprise, we did not find evidence of significant improvement on tasks when users were provided with any of the saliency maps.
These findings suggest caution regarding the usefulness of saliency-based explanations and their potential for being misunderstood.
arXiv Detail & Related papers (2023-12-10T23:13:23Z)
- The future of human-centric eXplainable Artificial Intelligence (XAI) is not post-hoc explanations [3.7673721058583123]
We propose a shift from post-hoc explainability to designing interpretable neural network architectures.
We identify five needs of human-centric XAI and propose two schemes for interpretable-by-design neural networks.
arXiv Detail & Related papers (2023-07-01T15:24:47Z)
- Explaining Explainability: Towards Deeper Actionable Insights into Deep Learning through Second-order Explainability [70.60433013657693]
Second-order explainable AI (SOXAI) was recently proposed to extend explainable AI (XAI) from the instance level to the dataset level.
We demonstrate for the first time, via example classification and segmentation cases, that eliminating irrelevant concepts from the training set based on actionable insights from SOXAI can enhance a model's performance.
arXiv Detail & Related papers (2023-06-14T23:24:01Z)
- Motif-guided Time Series Counterfactual Explanations [1.1510009152620664]
We propose a novel model that generates intuitive post-hoc counterfactual explanations.
We validated our model using five real-world time-series datasets from the UCR repository.
arXiv Detail & Related papers (2022-11-08T17:56:50Z)
- Learning Theory of Mind via Dynamic Traits Attribution [59.9781556714202]
We propose a new neural ToM architecture that learns to generate a latent trait vector of an actor from the past trajectories.
This trait vector then multiplicatively modulates the prediction mechanism via a fast-weights scheme in the prediction neural network.
We empirically show that the fast weights provide a good inductive bias to model the character traits of agents and hence improves mindreading ability.
arXiv Detail & Related papers (2022-04-17T11:21:18Z)
- Explanation as a process: user-centric construction of multi-level and multi-modal explanations [0.34410212782758043]
We present a process-based approach that combines multi-level and multi-modal explanations.
We use Inductive Logic Programming, an interpretable machine learning approach, to learn a comprehensible model.
arXiv Detail & Related papers (2021-10-07T19:26:21Z)
- This is not the Texture you are looking for! Introducing Novel Counterfactual Explanations for Non-Experts using Generative Adversarial Learning [59.17685450892182]
Counterfactual explanation systems try to enable counterfactual reasoning by modifying the input image.
We present a novel approach to generate such counterfactual image explanations based on adversarial image-to-image translation techniques.
Our results show that our approach leads to significantly better results regarding mental models, explanation satisfaction, trust, emotions, and self-efficacy than two state-of-the-art systems.
arXiv Detail & Related papers (2020-12-22T10:08:05Z)
- Explainability in Deep Reinforcement Learning [68.8204255655161]
We review recent works aimed at attaining Explainable Reinforcement Learning (XRL).
In critical situations where it is essential to justify and explain the agent's behaviour, better explainability and interpretability of RL models could help gain scientific insight into the inner workings of what is still considered a black box.
arXiv Detail & Related papers (2020-08-15T10:11:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.