A Fine-grained Interpretability Evaluation Benchmark for Neural NLP
- URL: http://arxiv.org/abs/2205.11097v1
- Date: Mon, 23 May 2022 07:37:04 GMT
- Title: A Fine-grained Interpretability Evaluation Benchmark for Neural NLP
- Authors: Lijie Wang, Yaozong Shen, Shuyuan Peng, Shuai Zhang, Xinyan Xiao, Hao
Liu, Hongxuan Tang, Ying Chen, Hua Wu, Haifeng Wang
- Abstract summary: This benchmark covers three representative NLP tasks: sentiment analysis, textual similarity and reading comprehension.
We provide token-level rationales that are carefully annotated to be sufficient, compact and comprehensive.
We conduct experiments on three typical models with three saliency methods, and unveil their strengths and weaknesses in terms of interpretability.
- Score: 44.08113828762984
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: While there is increasing concern about the interpretability of neural
models, the evaluation of interpretability remains an open problem, due to the
lack of proper evaluation datasets and metrics. In this paper, we present a
novel benchmark to evaluate the interpretability of both neural models and
saliency methods. This benchmark covers three representative NLP tasks:
sentiment analysis, textual similarity and reading comprehension, each provided
with both English and Chinese annotated data. In order to precisely evaluate
the interpretability, we provide token-level rationales that are carefully
annotated to be sufficient, compact and comprehensive. We also design a new
metric, i.e., the consistency between the rationales before and after
perturbations, to uniformly evaluate the interpretability of models and
saliency methods on different tasks. Based on this benchmark, we conduct
experiments on three typical models with three saliency methods, and unveil
their strengths and weaknesses in terms of interpretability. We will release this
benchmark at \url{https://xyz} and hope it can facilitate research on building
trustworthy systems.
Related papers
- InterpretCC: Intrinsic User-Centric Interpretability through Global Mixture of Experts [31.738009841932374]
Interpretability for neural networks involves a trade-off among three key requirements.
We present InterpretCC, a family of interpretable-by-design neural networks that guarantee human-centric interpretability.
arXiv Detail & Related papers (2024-02-05T11:55:50Z)
- How Well Do Text Embedding Models Understand Syntax? [50.440590035493074]
The ability of text embedding models to generalize across a wide range of syntactic contexts remains under-explored.
Our findings reveal that existing text embedding models have not sufficiently addressed these syntactic understanding challenges.
We propose strategies to augment the generalization ability of text embedding models in diverse syntactic scenarios.
arXiv Detail & Related papers (2023-11-14T08:51:00Z)
- Neural Causal Models for Counterfactual Identification and Estimation [62.30444687707919]
We study the evaluation of counterfactual statements through neural models.
First, we show that neural causal models (NCMs) are expressive enough.
Second, we develop an algorithm for simultaneously identifying and estimating counterfactual distributions.
arXiv Detail & Related papers (2022-09-30T18:29:09Z)
- InterpretTime: a new approach for the systematic evaluation of neural-network interpretability in time series classification [0.0]
We present a novel approach to evaluate the performance of interpretability methods for time series classification.
We propose a new strategy to assess the similarity between domain experts' interpretations and machine-derived interpretations of the data.
arXiv Detail & Related papers (2022-02-11T14:55:56Z)
- Beyond the Tip of the Iceberg: Assessing Coherence of Text Classifiers [0.05857406612420462]
Large-scale, pre-trained language models achieve human-level and superhuman accuracy on existing language understanding tasks.
We propose evaluating systems through a novel measure of prediction coherence.
arXiv Detail & Related papers (2021-09-10T15:04:23Z)
- On the Faithfulness Measurements for Model Interpretations [100.2730234575114]
Post-hoc interpretations aim to uncover how natural language processing (NLP) models make predictions.
To tackle these issues, we start with three criteria: the removal-based criterion, the sensitivity of interpretations, and the stability of interpretations.
Motivated by the desiderata behind these faithfulness notions, we introduce a new class of interpretation methods that adopt techniques from the adversarial domain.
arXiv Detail & Related papers (2021-04-18T09:19:44Z)
- Evaluating Saliency Methods for Neural Language Models [9.309351023703018]
Saliency methods are widely used to interpret neural network predictions.
Different variants of saliency methods disagree even on the interpretations of the same prediction made by the same model.
We conduct a comprehensive and quantitative evaluation of saliency methods on a fundamental category of NLP models: neural language models.
arXiv Detail & Related papers (2021-04-12T21:19:48Z)
- Interpretable Deep Learning: Interpretations, Interpretability, Trustworthiness, and Beyond [49.93153180169685]
We introduce and clarify two basic concepts, interpretations and interpretability, that are often confused.
We elaborate on the design of several recent interpretation algorithms from different perspectives by proposing a new taxonomy.
We summarize the existing work on evaluating models' interpretability using "trustworthy" interpretation algorithms.
arXiv Detail & Related papers (2021-03-19T08:40:30Z)
- GO FIGURE: A Meta Evaluation of Factuality in Summarization [131.1087461486504]
We introduce GO FIGURE, a meta-evaluation framework for evaluating factuality evaluation metrics.
Our benchmark analysis on ten factuality metrics reveals that our framework provides a robust and efficient evaluation.
It also reveals that while QA metrics generally improve over standard metrics that measure factuality across domains, performance is highly dependent on the way in which questions are generated.
arXiv Detail & Related papers (2020-10-24T08:30:20Z)