Evaluating Explanations for Reading Comprehension with Realistic
Counterfactuals
- URL: http://arxiv.org/abs/2104.04515v1
- Date: Fri, 9 Apr 2021 17:55:21 GMT
- Title: Evaluating Explanations for Reading Comprehension with Realistic
Counterfactuals
- Authors: Xi Ye, Rohan Nair, Greg Durrett
- Abstract summary: We propose a methodology to evaluate explanations for machine reading comprehension tasks.
An explanation should allow us to understand the RC model's high-level behavior with respect to a set of realistic counterfactual input scenarios.
Our analysis suggests that pairwise explanation techniques are better suited to RC than token-level attributions.
- Score: 26.641834518599303
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Token-level attributions have been extensively studied to explain model
predictions for a wide range of classification tasks in NLP (e.g., sentiment
analysis), but such explanation techniques are less explored for machine
reading comprehension (RC) tasks. Although the transformer-based models used
here are identical to those used for classification, the underlying reasoning
these models perform is very different and different types of explanations are
required. We propose a methodology to evaluate explanations: an explanation
should allow us to understand the RC model's high-level behavior with respect
to a set of realistic counterfactual input scenarios. We define these
counterfactuals for several RC settings, and by connecting explanation
techniques' outputs to high-level model behavior, we can evaluate how useful
different explanations really are. Our analysis suggests that pairwise
explanation techniques are better suited to RC than token-level attributions,
which are often unfaithful in the scenarios we consider. We additionally
propose an improvement to an attention-based attribution technique, resulting
in explanations which better reveal the model's behavior.
Related papers
- Selective Explanations [14.312717332216073]
A machine learning model is trained to predict feature attribution scores with only one inference.
Despite their efficiency, amortized explainers can produce inaccurate predictions and misleading explanations.
We propose selective explanations, a novel feature attribution method that detects when amortized explainers generate low-quality explanations.
arXiv Detail & Related papers (2024-05-29T23:08:31Z) - Explaining Text Similarity in Transformer Models [52.571158418102584]
Recent advances in explainable AI have made it possible to mitigate limitations by leveraging improved explanations for Transformers.
We use BiLRP, an extension developed for computing second-order explanations in bilinear similarity models, to investigate which feature interactions drive similarity in NLP models.
Our findings contribute to a deeper understanding of different semantic similarity tasks and models, highlighting how novel explainable AI methods enable in-depth analyses and corpus-level insights.
arXiv Detail & Related papers (2024-05-10T17:11:31Z) - Counterfactuals of Counterfactuals: a back-translation-inspired approach
to analyse counterfactual editors [3.4253416336476246]
We focus on the analysis of counterfactual, contrastive explanations.
We propose a new back translation-inspired evaluation methodology.
We show that by iteratively feeding the counterfactual to the explainer we can obtain valuable insights into the behaviour of both the predictor and the explainer models.
arXiv Detail & Related papers (2023-05-26T16:04:28Z) - Learning with Explanation Constraints [91.23736536228485]
We provide a learning theoretic framework to analyze how explanations can improve the learning of our models.
We demonstrate the benefits of our approach over a large array of synthetic and real-world experiments.
arXiv Detail & Related papers (2023-03-25T15:06:47Z) - Understanding Post-hoc Explainers: The Case of Anchors [6.681943980068051]
We present a theoretical analysis of a rule-based interpretability method that highlights a small set of words to explain a text's decision.
After formalizing its algorithm and providing useful insights, we demonstrate mathematically that Anchors produces meaningful results.
arXiv Detail & Related papers (2023-03-15T17:56:34Z) - Streamlining models with explanations in the learning loop [0.0]
Several explainable AI methods allow a Machine Learning user to get insights on the classification process of a black-box model.
We exploit this information to design a feature engineering phase, where we combine explanations with feature values.
arXiv Detail & Related papers (2023-02-15T16:08:32Z) - Explainability in Process Outcome Prediction: Guidelines to Obtain
Interpretable and Faithful Models [77.34726150561087]
We define explainability through the interpretability of the explanations and the faithfulness of the explainability model in the field of process outcome prediction.
This paper contributes a set of guidelines named X-MOP which allows selecting the appropriate model based on the event log specifications.
arXiv Detail & Related papers (2022-03-30T05:59:50Z) - Beyond Trivial Counterfactual Explanations with Diverse Valuable
Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z) - A Diagnostic Study of Explainability Techniques for Text Classification [52.879658637466605]
We develop a list of diagnostic properties for evaluating existing explainability techniques.
We compare the saliency scores assigned by the explainability techniques with human annotations of salient input regions to find relations between a model's performance and the agreement of its rationales with human ones.
arXiv Detail & Related papers (2020-09-25T12:01:53Z) - Evaluations and Methods for Explanation through Robustness Analysis [117.7235152610957]
We establish a novel set of evaluation criteria for such feature based explanations by analysis.
We obtain new explanations that are loosely necessary and sufficient for a prediction.
We extend the explanation to extract the set of features that would move the current prediction to a target class.
arXiv Detail & Related papers (2020-05-31T05:52:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.