On the Faithfulness Measurements for Model Interpretations
- URL: http://arxiv.org/abs/2104.08782v1
- Date: Sun, 18 Apr 2021 09:19:44 GMT
- Title: On the Faithfulness Measurements for Model Interpretations
- Authors: Fan Yin, Zhouxing Shi, Cho-Jui Hsieh, Kai-Wei Chang
- Abstract summary: Post-hoc interpretations aim to uncover how natural language processing (NLP) models make predictions, yet how to define and quantitatively measure their faithfulness remains an open problem.
To tackle this, we start with three criteria: the removal-based criterion, the sensitivity of interpretations, and the stability of interpretations.
Motivated by the desiderata behind these faithfulness notions, we introduce a new class of interpretation methods that adopt techniques from the adversarial robustness domain.
- Score: 100.2730234575114
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent years have witnessed the emergence of a variety of post-hoc
interpretations that aim to uncover how natural language processing (NLP)
models make predictions. Despite the surge of new interpretations, it remains
an open problem how to define and quantitatively measure the faithfulness of
interpretations, i.e., to what extent they conform to the reasoning process
behind the model. To tackle these issues, we start with three criteria: the
removal-based criterion, the sensitivity of interpretations, and the stability
of interpretations, which quantify different notions of faithfulness, and
propose novel paradigms to systematically evaluate interpretations in NLP. Our
results show that the performance of interpretations under different criteria
of faithfulness could vary substantially. Motivated by the desiderata of these
faithfulness notions, we introduce a new class of interpretation methods that
adopt techniques from the adversarial robustness domain. Empirical results show
that our proposed methods achieve top performance under all three criteria.
Along with experiments and analysis on both the text classification and the
dependency parsing tasks, we come to a more comprehensive understanding of the
diverse set of interpretations.
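To make the removal-based criterion concrete, the sketch below shows one common way such a criterion is operationalized: delete the tokens an interpretation ranks as most important and measure the drop in the model's predicted-class probability. This is a minimal illustration under stated assumptions, not the paper's exact protocol; predict_proba, tokens, and attributions are hypothetical placeholders.

```python
from typing import Callable, List, Sequence

def removal_based_score(
    predict_proba: Callable[[List[str]], float],  # P(original predicted class | tokens)
    tokens: List[str],
    attributions: Sequence[float],  # one importance score per token (hypothetical)
    k: int = 5,
) -> float:
    """Average drop in predicted-class probability after deleting the
    top-i attributed tokens, for i = 1..k (an AOPC-style variant).
    Under a removal-based criterion, a more faithful interpretation
    should produce a larger average drop."""
    base = predict_proba(tokens)
    # Token indices ordered from most to least important.
    order = sorted(range(len(tokens)), key=lambda i: -attributions[i])
    drops = []
    for i in range(1, min(k, len(tokens)) + 1):
        removed = set(order[:i])
        kept = [t for j, t in enumerate(tokens) if j not in removed]
        drops.append(base - predict_proba(kept))
    return sum(drops) / len(drops)
```

Sensitivity and stability can be probed in the same spirit: perturb the input slightly and check how much the prediction, and the interpretation itself, change.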
Related papers
- Ensemble Interpretation: A Unified Method for Interpretable Machine Learning [1.276129213205911]
A novel interpretable methodology, ensemble interpretation, is presented in this paper.
Experiment results show that the ensemble interpretation is more stable and more consistent with human experience and cognition.
As an application, we use ensemble interpretation for feature selection, which significantly improves the generalization performance of the corresponding learning model.
arXiv Detail & Related papers (2023-12-11T09:51:24Z)
- Interpreting Pretrained Language Models via Concept Bottlenecks [55.47515772358389]
Pretrained language models (PLMs) have made significant strides in various natural language processing tasks.
The lack of interpretability due to their "black-box" nature poses challenges for responsible implementation.
We propose a novel approach to interpreting PLMs by employing high-level, meaningful concepts that are easily understandable for humans.
arXiv Detail & Related papers (2023-11-08T20:41:18Z)
- Natural Language Decompositions of Implicit Content Enable Better Text Representations [56.85319224208865]
We introduce a method for the analysis of text that takes implicitly communicated content explicitly into account.
We use a large language model to produce sets of propositions that are inferentially related to the observed text.
Our results suggest that modeling the meanings behind observed language, rather than the literal text alone, is a valuable direction for NLP.
arXiv Detail & Related papers (2023-05-23T23:45:20Z)
- FICNN: A Framework for the Interpretation of Deep Convolutional Neural Networks [0.0]
This paper proposes a framework for studying interpretation methods designed for CNN models trained on visual data.
Our framework highlights that only a small fraction of the suggested factors, and combinations thereof, have actually been studied.
arXiv Detail & Related papers (2023-05-17T10:59:55Z)
- A Fine-grained Interpretability Evaluation Benchmark for Neural NLP [44.08113828762984]
This benchmark covers three representative NLP tasks: sentiment analysis, textual similarity and reading comprehension.
We provide token-level rationales that are carefully annotated to be sufficient, compact and comprehensive.
We conduct experiments on three typical models with three saliency methods, and unveil their strengths and weaknesses in terms of interpretability.
arXiv Detail & Related papers (2022-05-23T07:37:04Z)
- Evaluating Saliency Methods for Neural Language Models [9.309351023703018]
Saliency methods are widely used to interpret neural network predictions.
Different variants of saliency methods disagree even on the interpretations of the same prediction made by the same model; a minimal agreement probe is sketched after this list.
We conduct a comprehensive and quantitative evaluation of saliency methods on a fundamental category of NLP models: neural language models.
arXiv Detail & Related papers (2021-04-12T21:19:48Z)
- Interpretable Deep Learning: Interpretations, Interpretability, Trustworthiness, and Beyond [49.93153180169685]
We introduce and clarify two basic concepts, interpretations and interpretability, which are often confused.
We elaborate on the design of several recent interpretation algorithms from different perspectives by proposing a new taxonomy.
We summarize the existing work in evaluating models' interpretability using "trustworthy" interpretation algorithms.
arXiv Detail & Related papers (2021-03-19T08:40:30Z)
- Are Interpretations Fairly Evaluated? A Definition Driven Pipeline for Post-Hoc Interpretability [54.85658598523915]
We propose to establish a concrete definition of interpretation before evaluating the faithfulness of an interpretation.
We find that although interpretation methods perform differently under a certain evaluation metric, such a difference may not result from interpretation quality or faithfulness.
arXiv Detail & Related papers (2020-09-16T06:38:03Z)
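A recurring theme above is that different saliency variants can disagree on the same prediction, and that faithful interpretations should be stable. Both reduce to comparing two attribution vectors over the same tokens; one simple probe is rank correlation, sketched below as an illustrative assumption, not a method from any of the listed papers.

```python
from scipy.stats import spearmanr

def interpretation_agreement(attr_a, attr_b) -> float:
    """Spearman rank correlation between two attribution vectors for the
    same tokens: 1.0 means identical importance rankings, values near 0
    mean unrelated rankings. With attr_b recomputed on a slightly
    perturbed input, the same measure doubles as a crude stability probe."""
    rho, _ = spearmanr(attr_a, attr_b)
    return rho

# Toy example: hypothetical gradient saliency vs. attention weights.
print(interpretation_agreement([0.9, 0.1, 0.4], [0.8, 0.2, 0.3]))  # 1.0
```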