On Sample Based Explanation Methods for NLP: Efficiency, Faithfulness, and Semantic Evaluation
- URL: http://arxiv.org/abs/2106.04753v1
- Date: Wed, 9 Jun 2021 00:49:56 GMT
- Title: On Sample Based Explanation Methods for NLP: Efficiency, Faithfulness, and Semantic Evaluation
- Authors: Wei Zhang, Ziming Huang, Yada Zhu, Guangnan Ye, Xiaodong Cui, Fan
Zhang
- Abstract summary: We can improve the interpretability of explanations by allowing arbitrary text sequences as the explanation unit.
We propose a semantic-based evaluation metric that can better align with humans' judgment of explanations.
- Score: 23.72825603188359
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent advances in natural language processing, the scale of state-of-the-art models and datasets is typically extensive, which challenges the application of sample-based explanation methods in many aspects, such as explanation interpretability, efficiency, and faithfulness. In this work, we improve the interpretability of explanations, for the first time, by allowing arbitrary text sequences to serve as the explanation unit. On top of this, we implement a Hessian-free method with a model-faithfulness guarantee. Finally, to compare our method with others, we propose a semantic-based evaluation metric that aligns better with humans' judgment of explanations than the widely adopted diagnostic or re-training measures. Empirical results on multiple real datasets demonstrate the proposed method's superior performance over popular explanation techniques such as Influence Functions and TracIn under semantic evaluation.
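The Hessian-free, gradient-based view of sample influence contrasted with above can be illustrated with a TracIn-style score: the influence of a training example on a test example is approximated by the dot product of their loss gradients at a checkpoint. The sketch below is not the paper's exact algorithm; the toy model and the function names `flat_loss_grad` and `tracin_influence` are illustrative placeholders for a single-checkpoint approximation.

```python
# A minimal, single-checkpoint sketch of Hessian-free, TracIn-style influence:
# the influence of a training example on a test example is approximated by the
# dot product of their loss gradients. This is NOT the paper's exact algorithm;
# the toy model and function names below are illustrative only.
import torch
import torch.nn as nn


def flat_loss_grad(model: nn.Module, loss: torch.Tensor) -> torch.Tensor:
    """Flatten the gradient of `loss` with respect to all trainable parameters."""
    params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss, params)
    return torch.cat([g.reshape(-1) for g in grads])


def tracin_influence(model, loss_fn, train_example, test_example) -> torch.Tensor:
    """Gradient-dot-product influence of one training example on one test example."""
    x_tr, y_tr = train_example
    x_te, y_te = test_example
    g_train = flat_loss_grad(model, loss_fn(model(x_tr), y_tr))
    g_test = flat_loss_grad(model, loss_fn(model(x_te), y_te))
    return g_train @ g_test  # positive: helpful to the test prediction; negative: harmful


if __name__ == "__main__":
    # Toy classifier standing in for an NLP model (assumed setup).
    model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
    loss_fn = nn.CrossEntropyLoss()
    train_example = (torch.randn(1, 16), torch.tensor([1]))
    test_example = (torch.randn(1, 16), torch.tensor([0]))
    print(tracin_influence(model, loss_fn, train_example, test_example).item())
```

The semantic-based evaluation metric is only described at a high level in the abstract; one plausible, assumed instantiation scores a produced explanation against a human-written reference by cosine similarity of sentence embeddings. The encoder name below is just a common public checkpoint, not necessarily the one used in the paper.

```python
# A hedged sketch of semantic evaluation for explanations: cosine similarity
# between sentence embeddings of a produced explanation and a human reference.
# This is an assumed instantiation, not the paper's exact metric.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumption: any sentence encoder works


def semantic_score(explanation: str, reference: str) -> float:
    emb = encoder.encode([explanation, reference], convert_to_tensor=True)
    return util.cos_sim(emb[0], emb[1]).item()


print(semantic_score(
    "the review praises the cast's performances",
    "positive sentiment driven by praise for the acting",
))
```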
Related papers
- Exploring the Trade-off Between Model Performance and Explanation Plausibility of Text Classifiers Using Human Rationales [3.242050660144211]
Saliency post-hoc explainability methods are important tools for understanding increasingly complex NLP models.
We present a methodology for incorporating rationales, which are text annotations explaining human decisions, into text classification models.
arXiv Detail & Related papers (2024-04-03T22:39:33Z)
- Revisiting Demonstration Selection Strategies in In-Context Learning [66.11652803887284]
Large language models (LLMs) have shown an impressive ability to perform a wide range of tasks using in-context learning (ICL).
In this work, we first revisit the factors contributing to the variance in ICL performance from both data and model aspects, and find that the choice of demonstration is both data- and model-dependent.
We propose a data- and model-dependent demonstration selection method, TopK + ConE, based on the assumption that the performance of a demonstration positively correlates with its contribution to the model's understanding of the test samples.
arXiv Detail & Related papers (2024-01-22T16:25:27Z)
- Ensemble Interpretation: A Unified Method for Interpretable Machine Learning [1.276129213205911]
A novel interpretable methodology, ensemble interpretation, is presented in this paper.
Experiment results show that the ensemble interpretation is more stable and more consistent with human experience and cognition.
As an application, we use the ensemble interpretation for feature selection, and then the generalization performance of the corresponding learning model is significantly improved.
arXiv Detail & Related papers (2023-12-11T09:51:24Z)
- Counterfactuals of Counterfactuals: a back-translation-inspired approach to analyse counterfactual editors [3.4253416336476246]
We focus on the analysis of counterfactual, contrastive explanations.
We propose a new back translation-inspired evaluation methodology.
We show that by iteratively feeding the counterfactual to the explainer we can obtain valuable insights into the behaviour of both the predictor and the explainer models.
arXiv Detail & Related papers (2023-05-26T16:04:28Z)
- MACE: An Efficient Model-Agnostic Framework for Counterfactual Explanation [132.77005365032468]
We propose a novel framework, Model-Agnostic Counterfactual Explanation (MACE).
In our MACE approach, we propose a novel RL-based method for finding good counterfactual examples and a gradient-less descent method for improving proximity.
Experiments on public datasets validate the method's effectiveness, showing better validity, sparsity, and proximity.
arXiv Detail & Related papers (2022-05-31T04:57:06Z)
- A Multilingual Perspective Towards the Evaluation of Attribution Methods in Natural Language Inference [28.949004915740776]
We present a multilingual approach for evaluating attribution methods for the Natural Language Inference (NLI) task in terms of faithfulness and plausibility.
First, we introduce a novel cross-lingual strategy to measure faithfulness based on word alignments, which eliminates the drawbacks of erasure-based evaluations.
We then perform a comprehensive evaluation of attribution methods, considering different output mechanisms and aggregation methods.
arXiv Detail & Related papers (2022-04-11T22:11:05Z)
- Discriminative Attribution from Counterfactuals [64.94009515033984]
We present a method for neural network interpretability by combining feature attribution with counterfactual explanations.
We show that this method can be used to quantitatively evaluate the performance of feature attribution methods in an objective manner.
arXiv Detail & Related papers (2021-09-28T00:53:34Z)
- Obtaining Better Static Word Embeddings Using Contextual Embedding Models [53.86080627007695]
Our proposed distillation method is a simple extension of CBOW-based training.
As a side-effect, our approach also allows a fair comparison of both contextual and static embeddings.
arXiv Detail & Related papers (2021-06-08T12:59:32Z)
- A Diagnostic Study of Explainability Techniques for Text Classification [52.879658637466605]
We develop a list of diagnostic properties for evaluating existing explainability techniques.
We compare the saliency scores assigned by the explainability techniques with human annotations of salient input regions to find relations between a model's performance and the agreement of its rationales with human ones.
arXiv Detail & Related papers (2020-09-25T12:01:53Z)
- Evaluating Explainable AI: Which Algorithmic Explanations Help Users Predict Model Behavior? [97.77183117452235]
We carry out human subject tests to isolate the effect of algorithmic explanations on model interpretability.
Clear evidence of method effectiveness is found in very few cases.
Our results provide the first reliable and comprehensive estimates of how explanations influence simulatability.
arXiv Detail & Related papers (2020-05-04T20:35:17Z)
- The Explanation Game: Towards Prediction Explainability through Sparse Communication [6.497816402045099]
We provide a unified perspective of explainability as a problem between an explainer and a layperson.
We use this framework to compare several prior approaches for extracting explanations.
We propose new embedded methods for explainability through the use of selective, sparse attention (see the sparse-attention sketch after this list).
arXiv Detail & Related papers (2020-04-28T22:27:19Z)
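The selective, sparse attention mentioned in the last entry can be pictured with sparsemax (Martins & Astudillo, 2016), one standard way to obtain attention weights with exact zeros. The minimal sketch below operates over the last dimension only and is not necessarily the transformation used in that paper.

```python
# A minimal sparsemax sketch: Euclidean projection of attention scores onto the
# probability simplex, which zeroes out low-scoring tokens while keeping the
# weights summing to 1. Illustrative only; operates over the last dimension.
import torch


def sparsemax(scores: torch.Tensor) -> torch.Tensor:
    z, _ = torch.sort(scores, dim=-1, descending=True)
    cumsum = z.cumsum(dim=-1)
    k = torch.arange(1, scores.size(-1) + 1, device=scores.device, dtype=scores.dtype)
    support = 1 + k * z > cumsum             # sorted entries that remain nonzero
    k_z = support.sum(dim=-1, keepdim=True)  # support size per row
    tau = (cumsum.gather(-1, k_z - 1) - 1) / k_z.to(scores.dtype)
    return torch.clamp(scores - tau, min=0.0)


if __name__ == "__main__":
    attn_scores = torch.tensor([[2.0, 1.5, 0.1, -1.0]])
    print(sparsemax(attn_scores))  # e.g. tensor([[0.75, 0.25, 0.00, 0.00]])
```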