Related papers: Evaluating Explanations for Reading Comprehension with Realistic Counterfactuals

Evaluating Explanations for Reading Comprehension with Realistic Counterfactuals

URL: http://arxiv.org/abs/2104.04515v1
Date: Fri, 9 Apr 2021 17:55:21 GMT
Title: Evaluating Explanations for Reading Comprehension with Realistic Counterfactuals
Authors: Xi Ye, Rohan Nair, Greg Durrett
Abstract summary: We propose a methodology to evaluate explanations for machine reading comprehension tasks. An explanation should allow us to understand the RC model's high-level behavior with respect to a set of realistic counterfactual input scenarios. Our analysis suggests that pairwise explanation techniques are better suited to RC than token-level attributions.
Score: 26.641834518599303
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Token-level attributions have been extensively studied to explain model predictions for a wide range of classification tasks in NLP (e.g., sentiment analysis), but such explanation techniques are less explored for machine reading comprehension (RC) tasks. Although the transformer-based models used here are identical to those used for classification, the underlying reasoning these models perform is very different and different types of explanations are required. We propose a methodology to evaluate explanations: an explanation should allow us to understand the RC model's high-level behavior with respect to a set of realistic counterfactual input scenarios. We define these counterfactuals for several RC settings, and by connecting explanation techniques' outputs to high-level model behavior, we can evaluate how useful different explanations really are. Our analysis suggests that pairwise explanation techniques are better suited to RC than token-level attributions, which are often unfaithful in the scenarios we consider. We additionally propose an improvement to an attention-based attribution technique, resulting in explanations which better reveal the model's behavior.

Related papers

From latent factors to language: a user study on LLM-generated explanations for an inherently interpretable matrix-based recommender system [8.280161440212504]
We investigate whether large language models (LLMs) can generate effective, user-facing explanations from a mathematically interpretable recommendation model.<n>We conduct a study with 326 participants who assessed the quality of the explanations across five key dimensions.<n>Our analysis reveals that all explanation types are generally well received, with moderate statistical differences between strategies.
arXiv Detail & Related papers (2025-09-23T13:30:03Z)
Selective Explanations [14.312717332216073]
A machine learning model is trained to predict feature attribution scores with only one inference. Despite their efficiency, amortized explainers can produce inaccurate predictions and misleading explanations. We propose selective explanations, a novel feature attribution method that detects when amortized explainers generate low-quality explanations.
arXiv Detail & Related papers (2024-05-29T23:08:31Z)
Explaining Text Similarity in Transformer Models [52.571158418102584]
Recent advances in explainable AI have made it possible to mitigate limitations by leveraging improved explanations for Transformers. We use BiLRP, an extension developed for computing second-order explanations in bilinear similarity models, to investigate which feature interactions drive similarity in NLP models. Our findings contribute to a deeper understanding of different semantic similarity tasks and models, highlighting how novel explainable AI methods enable in-depth analyses and corpus-level insights.
arXiv Detail & Related papers (2024-05-10T17:11:31Z)
Counterfactuals of Counterfactuals: a back-translation-inspired approach to analyse counterfactual editors [3.4253416336476246]
We focus on the analysis of counterfactual, contrastive explanations. We propose a new back translation-inspired evaluation methodology. We show that by iteratively feeding the counterfactual to the explainer we can obtain valuable insights into the behaviour of both the predictor and the explainer models.
arXiv Detail & Related papers (2023-05-26T16:04:28Z)
Learning with Explanation Constraints [91.23736536228485]
We provide a learning theoretic framework to analyze how explanations can improve the learning of our models. We demonstrate the benefits of our approach over a large array of synthetic and real-world experiments.
arXiv Detail & Related papers (2023-03-25T15:06:47Z)
Understanding Post-hoc Explainers: The Case of Anchors [6.681943980068051]
We present a theoretical analysis of a rule-based interpretability method that highlights a small set of words to explain a text's decision. After formalizing its algorithm and providing useful insights, we demonstrate mathematically that Anchors produces meaningful results.
arXiv Detail & Related papers (2023-03-15T17:56:34Z)
Streamlining models with explanations in the learning loop [0.0]
Several explainable AI methods allow a Machine Learning user to get insights on the classification process of a black-box model. We exploit this information to design a feature engineering phase, where we combine explanations with feature values.
arXiv Detail & Related papers (2023-02-15T16:08:32Z)
Explainability in Process Outcome Prediction: Guidelines to Obtain Interpretable and Faithful Models [77.34726150561087]
We define explainability through the interpretability of the explanations and the faithfulness of the explainability model in the field of process outcome prediction. This paper contributes a set of guidelines named X-MOP which allows selecting the appropriate model based on the event log specifications.
arXiv Detail & Related papers (2022-03-30T05:59:50Z)
Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction. We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss. Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z)
A Diagnostic Study of Explainability Techniques for Text Classification [52.879658637466605]
We develop a list of diagnostic properties for evaluating existing explainability techniques. We compare the saliency scores assigned by the explainability techniques with human annotations of salient input regions to find relations between a model's performance and the agreement of its rationales with human ones.
arXiv Detail & Related papers (2020-09-25T12:01:53Z)
Evaluations and Methods for Explanation through Robustness Analysis [117.7235152610957]
We establish a novel set of evaluation criteria for such feature based explanations by analysis. We obtain new explanations that are loosely necessary and sufficient for a prediction. We extend the explanation to extract the set of features that would move the current prediction to a target class.
arXiv Detail & Related papers (2020-05-31T05:52:05Z)

This list is automatically generated from the titles and abstracts of the papers in this site.