REX: Reasoning-aware and Grounded Explanation
- URL: http://arxiv.org/abs/2203.06107v1
- Date: Fri, 11 Mar 2022 17:28:42 GMT
- Title: REX: Reasoning-aware and Grounded Explanation
- Authors: Shi Chen and Qi Zhao
- Abstract summary: We develop a new type of multi-modal explanation that explains decisions by traversing the reasoning process and grounding keywords in the images.
Second, we identify the critical need to tightly couple important components across the visual and textual modalities for explaining the decisions.
Third, we propose a novel explanation generation method that explicitly models the pairwise correspondence between words and regions of interest.
- Score: 30.392986232906107
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Effectiveness and interpretability are two essential properties for
trustworthy AI systems. Most recent studies in visual reasoning are dedicated
to improving the accuracy of predicted answers, and less attention is paid to
explaining the rationales behind the decisions. As a result, they commonly take
advantage of spurious biases instead of actually reasoning on the
visual-textual data, and have yet to develop the capability to explain their
decision making by considering key information from both modalities. This paper
aims to close the gap from three distinct perspectives: first, we define a new
type of multi-modal explanation that explains decisions by progressively
traversing the reasoning process and grounding keywords in the images. We
develop a functional program to sequentially execute different reasoning steps
and construct a new dataset with 1,040,830 multi-modal explanations. Second, we
identify the critical need to tightly couple important components across the
visual and textual modalities for explaining the decisions, and propose a novel
explanation generation method that explicitly models the pairwise
correspondence between words and regions of interest. It improves the visual
grounding capability by a considerable margin, resulting in enhanced
interpretability and reasoning performance. Finally, with our new data and
method, we perform extensive analyses to study the effectiveness of our
explanation under different settings, including multi-task learning and
transfer learning. Our code and data are available at
https://github.com/szzexpoi/rex.
 
      
Related papers
- Explain with Visual Keypoints Like a Real Mentor! A Benchmark for Multimodal Solution Explanation [19.4261670152456]
 We introduce a novel task of visual solution explanation, which requires generating explanations that incorporate newly introduced visual elements essential for understanding.
We propose MathExplain, a benchmark consisting of 997 math problems annotated with visual keypoints and corresponding explanatory text that references those elements.
Our empirical results show that while some closed-source models demonstrate promising capabilities on visual solution-explaining, current open-source general-purpose models perform inconsistently.
 arXiv (2025-04-04T06:03:13Z)
- MEGL: Multimodal Explanation-Guided Learning [23.54169888224728]
 We propose a novel Multimodal Explanation-Guided Learning (MEGL) framework to enhance model interpretability and improve classification performance.
Our Saliency-Driven Textual Grounding (SDTG) approach integrates spatial information from visual explanations into textual rationales, providing spatially grounded and contextually rich explanations.
We validate MEGL on two new datasets, Object-ME and Action-ME, for image classification with multimodal explanations.
 arXiv (2024-11-20T05:57:00Z)
- Explainability for Machine Learning Models: From Data Adaptability to
  User Perception [0.8702432681310401]
 This thesis explores the generation of local explanations for already deployed machine learning models.
It aims to identify optimal conditions for producing meaningful explanations considering both data and user requirements.
 arXiv (2024-02-16T18:44:37Z)
- Visual Commonsense based Heterogeneous Graph Contrastive Learning [79.22206720896664]
 We propose a heterogeneous graph contrastive learning method to better perform the visual reasoning task.
Our method is designed in a plug-and-play manner, so that it can be quickly and easily combined with a wide range of representative methods.
 arXiv (2023-11-11T12:01:18Z)
- See, Think, Confirm: Interactive Prompting Between Vision and Language
  Models for Knowledge-based Visual Reasoning [60.43585179885355]
 We propose a novel framework named Interactive Prompting Visual Reasoner (IPVR) for few-shot knowledge-based visual reasoning.
IPVR contains three stages: see, think, and confirm.
We conduct experiments on a range of knowledge-based visual reasoning datasets.
 arXiv (2023-01-12T18:59:50Z)
- Complementary Explanations for Effective In-Context Learning [77.83124315634386]
 Large language models (LLMs) have exhibited remarkable capabilities in learning from explanations in prompts.
This work aims to better understand the mechanisms by which explanations are used for in-context learning.
 arXiv (2022-11-25T04:40:47Z)
- Textual Explanations and Critiques in Recommendation Systems [8.406549970145846]
 This dissertation focuses on two fundamental challenges of addressing this need.
The first involves explanation generation in a scalable and data-driven manner.
The second challenge consists in making explanations actionable, and we refer to it as critiquing.
 arXiv (2022-05-15T11:59:23Z)
- Human Interpretation of Saliency-based Explanation Over Text [65.29015910991261]
 We study saliency-based explanations over textual data.
We find that people often misinterpret the explanations.
We propose a method to adjust saliencies based on model estimates of over- and under-perception.
 arXiv (2022-01-27T15:20:32Z)
- A First Look: Towards Explainable TextVQA Models via Visual and Textual
  Explanations [3.7638008383533856]
 We propose MTXNet, an end-to-end trainable multimodal architecture to generate multimodal explanations.
We show that training with multimodal explanations surpasses unimodal baselines by up to 7% in CIDEr scores and 2% in IoU.
We also describe a real-world e-commerce application for using the generated multimodal explanations.
 arXiv (2021-04-29T00:36:17Z)
- Contrastive Explanations for Model Interpretability [77.92370750072831]
 We propose a methodology to produce contrastive explanations for classification models.
Our method is based on projecting model representation to a latent space.
Our findings shed light on the ability of label-contrastive explanations to provide a more accurate and finer-grained interpretability of a model's decision.
 arXiv (2021-03-02T00:36:45Z)
- This is not the Texture you are looking for! Introducing Novel
  Counterfactual Explanations for Non-Experts using Generative Adversarial
  Learning [59.17685450892182]
 Counterfactual explanation systems try to enable counterfactual reasoning by modifying the input image.
We present a novel approach to generate such counterfactual image explanations based on adversarial image-to-image translation techniques.
Our results show that our approach leads to significantly better results regarding mental models, explanation satisfaction, trust, emotions, and self-efficacy than two state-of-the-art systems.
 arXiv (2020-12-22T10:08:05Z)
- Generating Hierarchical Explanations on Text Classification via Feature
  Interaction Detection [21.02924712220406]
 We build hierarchical explanations by detecting feature interactions.
Such explanations visualize how words and phrases are combined at different levels of the hierarchy.
Experiments show the effectiveness of the proposed method in providing explanations both faithful to models and interpretable to humans.
 arXiv (2020-04-04T20:56:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
       
     