Towards a Unified Framework for Evaluating Explanations
- URL: http://arxiv.org/abs/2405.14016v2
- Date: Sun, 14 Jul 2024 01:11:22 GMT
- Title: Towards a Unified Framework for Evaluating Explanations
- Authors: Juan D. Pinto, Luc Paquette
- Abstract summary: We argue that explanations serve as mediators between models and stakeholders, whether for intrinsically interpretable models or opaque black-box models.
We illustrate these criteria, as well as specific evaluation methods, using examples from an ongoing study of an interpretable neural network for predicting a particular learner behavior.
- Score: 0.6138671548064356
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The challenge of creating interpretable models has been taken up by two main research communities: ML researchers primarily focused on lower-level explainability methods that suit the needs of engineers, and HCI researchers who have more heavily emphasized user-centered approaches often based on participatory design methods. This paper reviews how these communities have evaluated interpretability, identifying overlaps and semantic misalignments. We propose moving towards a unified framework of evaluation criteria and lay the groundwork for such a framework by articulating the relationships between existing criteria. We argue that explanations serve as mediators between models and stakeholders, whether for intrinsically interpretable models or opaque black-box models analyzed via post-hoc techniques. We further argue that useful explanations require both faithfulness and intelligibility. Explanation plausibility is a prerequisite for intelligibility, while stability is a prerequisite for explanation faithfulness. We illustrate these criteria, as well as specific evaluation methods, using examples from an ongoing study of an interpretable neural network for predicting a particular learner behavior.
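The stability criterion can be made concrete with a small sketch. The snippet below is a minimal illustration, not the authors' evaluation pipeline: it assumes a generic scikit-learn classifier and a toy coefficient-times-input attribution, and it scores stability as the agreement between the explanation of an instance and the explanations of slightly perturbed copies of it.
```python
# Minimal sketch (not the authors' pipeline) of one way to operationalize the
# stability criterion: explanations should agree under small input perturbations.
# The classifier and the coefficient-times-input attribution are stand-ins.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

def attribute(x):
    """Toy attribution: coefficient * input (exact for a linear model)."""
    return model.coef_[0] * x

def stability(x, eps=0.01, trials=20):
    """Mean cosine similarity between the explanation of x and explanations of
    slightly perturbed copies of x; values near 1 indicate a stable explanation."""
    base = attribute(x)
    sims = []
    for _ in range(trials):
        a = attribute(x + rng.normal(scale=eps, size=x.shape))
        sims.append(np.dot(base, a) / (np.linalg.norm(base) * np.linalg.norm(a)))
    return float(np.mean(sims))

print("stability of explanation for the first instance:", stability(X[0]))
```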
Related papers
- On the Fairness, Diversity and Reliability of Text-to-Image Generative Models [49.60774626839712]
Multimodal generative models have sparked critical discussions on their fairness, reliability, and potential for misuse.
We propose an evaluation framework designed to assess model reliability through their responses to perturbations in the embedding space.
Our method lays the groundwork for detecting unreliable, bias-injected models and for retrieving bias provenance.
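As a rough illustration of that evaluation idea, and explicitly not the paper's pipeline, the sketch below perturbs a prompt embedding with Gaussian noise and scores reliability as the agreement between outputs for the clean and perturbed embeddings; `embed` and `generate` are hypothetical numpy stand-ins for a real text-to-image model.
```python
# Generic sketch of perturbation-based reliability scoring. `embed` and
# `generate` are hypothetical stand-ins so the example runs end to end.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 8))  # stand-in "generator" weights

def embed(prompt: str) -> np.ndarray:
    """Hypothetical prompt embedding (hash-seeded stand-in)."""
    seed = abs(hash(prompt)) % (2**32)
    return np.random.default_rng(seed).normal(size=16)

def generate(z: np.ndarray) -> np.ndarray:
    """Hypothetical generator output summarized as a feature vector."""
    return np.tanh(z @ W)

def reliability(prompt: str, sigma=0.05, trials=50) -> float:
    """Mean cosine similarity between outputs for the clean and perturbed
    embeddings; values near 1 indicate stable behavior under perturbation."""
    z = embed(prompt)
    base = generate(z)
    sims = []
    for _ in range(trials):
        out = generate(z + rng.normal(scale=sigma, size=z.shape))
        sims.append(np.dot(base, out) / (np.linalg.norm(base) * np.linalg.norm(out)))
    return float(np.mean(sims))

print(reliability("a photo of a doctor"))
```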
arXiv Detail & Related papers (2024-11-21T09:46:55Z)
- Interpreting Inflammation Prediction Model via Tag-based Cohort Explanation [5.356481722174994]
We propose a novel framework for identifying cohorts within a dataset based on local feature importance scores.
We evaluate our framework on a food-based inflammation prediction model and demonstrate that it can generate reliable explanations that match domain knowledge.
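A minimal sketch of the general idea of cohort discovery from local importance scores is given below; it is not the authors' implementation, and it substitutes a linear model with coefficient-times-input importances for the paper's inflammation model and scores.
```python
# Sketch of cohort discovery from local feature-importance scores (illustrative,
# not the paper's method): compute per-instance importances, cluster them, and
# tag each cohort with its most influential features.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=600, n_features=8, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Local importance per instance: coefficient * input (exact for a linear model).
local_importance = model.coef_[0] * X          # shape: (n_samples, n_features)

# Cluster instances by their importance profiles to form cohorts.
cohorts = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(local_importance)

# Tag each cohort with its most influential features.
for c in np.unique(cohorts):
    mean_imp = np.abs(local_importance[cohorts == c]).mean(axis=0)
    top = np.argsort(-mean_imp)[:2]
    print(f"cohort {c}: {np.sum(cohorts == c)} instances, top features {top.tolist()}")
```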
arXiv Detail & Related papers (2024-10-17T23:22:59Z)
- Exposing Assumptions in AI Benchmarks through Cognitive Modelling [0.0]
Cultural AI benchmarks often rely on implicit assumptions about measured constructs, leading to vague formulations with poor validity and unclear interrelations.
We propose exposing these assumptions using explicit cognitive models formulated as Structural Equation Models.
arXiv Detail & Related papers (2024-09-25T11:55:02Z)
- Benchmarks as Microscopes: A Call for Model Metrology [76.64402390208576]
Modern language models (LMs) pose a new challenge in capability assessment.
To be confident in our metrics, we need a new discipline of model metrology.
arXiv Detail & Related papers (2024-07-22T17:52:12Z)
- Relational Concept Bottleneck Models [13.311396882130033]
Concept Bottleneck Models (CBMs) are not designed to solve relational problems.
R-CBMs can represent both standard CBMs and relational GNNs.
In particular, we show that R-CBMs support the generation of concept-based explanations.
arXiv Detail & Related papers (2023-08-23T08:25:33Z)
- Are Neural Topic Models Broken? [81.15470302729638]
We study the relationship between automated and human evaluation of topic models.
We find that neural topic models fare worse in both respects compared to an established classical method.
arXiv Detail & Related papers (2022-10-28T14:38:50Z)
- Logical Satisfiability of Counterfactuals for Faithful Explanations in NLI [60.142926537264714]
We introduce the methodology of Faithfulness-through-Counterfactuals.
It generates a counterfactual hypothesis based on the logical predicates expressed in the explanation.
It then evaluates if the model's prediction on the counterfactual is consistent with that expressed logic.
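The consistency check can be illustrated with a toy sketch. Everything here is hypothetical: `nli_predict` stands in for a real NLI model, and the counterfactual edit is a hard-coded substitution rather than the paper's predicate-based generation.
```python
# Toy sketch of a faithfulness-through-counterfactuals style consistency check.
# `nli_predict` is a hypothetical stand-in for a real NLI model.
def nli_predict(premise: str, hypothesis: str) -> str:
    """Hypothetical NLI model: returns 'entailment', 'contradiction', or 'neutral'."""
    return "entailment" if "animal" in hypothesis else "neutral"

premise = "A dog is running through the park."
hypothesis = "An animal is running through the park."
# Explanation (as a logical predicate): is_a(dog, animal) -> entailment.

# Counterfactual: swap in a category the premise does not support; the logic
# expressed in the explanation then implies the label should change.
counterfactual_hypothesis = hypothesis.replace("animal", "vehicle")
expected_label = "neutral"  # what the explanation's logic implies

predicted = nli_predict(premise, counterfactual_hypothesis)
print("faithful" if predicted == expected_label else "unfaithful",
      f"(model predicted {predicted!r}, explanation implies {expected_label!r})")
```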
arXiv Detail & Related papers (2022-05-25T03:40:59Z)
- A Survey on the Robustness of Feature Importance and Counterfactual Explanations [12.599872913953238]
We present a survey of works that analyse the robustness of two classes of local explanations: feature importance scores and counterfactual explanations.
The survey aims to unify existing definitions of robustness, introduces a taxonomy to classify different robustness approaches, and discusses some interesting results.
arXiv Detail & Related papers (2021-10-30T22:48:04Z)
- On the Faithfulness Measurements for Model Interpretations [100.2730234575114]
Post-hoc interpretations aim to uncover how natural language processing (NLP) models make predictions.
To assess whether such interpretations faithfully reflect the model, we start from three criteria: the removal-based criterion, the sensitivity of interpretations, and the stability of interpretations.
Motivated by the desideratum of these faithfulness notions, we introduce a new class of interpretation methods that adopt techniques from the adversarial domain.
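A hedged sketch of a removal-based check is shown below; it is not the metric defined in that paper, and it uses a stand-in classifier and toy attribution so the example runs end to end. "Removal" here means replacing features with their training mean.
```python
# Sketch of removal-based faithfulness checks in the comprehensiveness /
# sufficiency style (illustrative; not the paper's metric).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=400, n_features=12, random_state=1)
model = LogisticRegression(max_iter=1000).fit(X, y)
baseline = X.mean(axis=0)  # "removed" features take the training mean

def prob(v):
    return model.predict_proba(v.reshape(1, -1))[0, 1]

def top_k(x, k):
    importance = np.abs(model.coef_[0] * x)    # toy attribution
    return np.argsort(-importance)[:k]

def comprehensiveness(x, k=4):
    """Probability drop when the k most important features are removed."""
    idx = top_k(x, k)
    x_removed = x.copy()
    x_removed[idx] = baseline[idx]
    return prob(x) - prob(x_removed)

def sufficiency(x, k=4):
    """Probability drop when only the k most important features are kept."""
    idx = top_k(x, k)
    x_only = baseline.copy()
    x_only[idx] = x[idx]
    return prob(x) - prob(x_only)

print(comprehensiveness(X[0]), sufficiency(X[0]))
```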
arXiv Detail & Related papers (2021-04-18T09:19:44Z)
- Evaluations and Methods for Explanation through Robustness Analysis [117.7235152610957]
We establish a novel set of evaluation criteria for feature-based explanations through robustness analysis.
We obtain new explanations that are loosely necessary and sufficient for a prediction.
We extend the explanation to extract the set of features that would move the current prediction to a target class.
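The following sketch (not the paper's algorithm) illustrates the last idea: greedily replace features with the target class's mean values, ranked by how much each replacement pushes the logit toward the target, until the model predicts the target class; the replaced indices form the extracted set.
```python
# Greedy sketch of extracting a feature set whose change moves the prediction
# to a chosen target class (illustrative; not the paper's algorithm).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=10, random_state=2)
model = LogisticRegression(max_iter=1000).fit(X, y)

def features_to_target(x, target):
    """Replace features with the target class's mean until the model predicts
    the target class; returns the changed indices and whether the flip succeeded."""
    target_mean = X[y == target].mean(axis=0)
    sign = 1.0 if target == 1 else -1.0
    # Rank features by how much replacing them pushes the logit toward the target.
    push = sign * model.coef_[0] * (target_mean - x)
    x_mod, chosen = x.copy(), []
    for idx in np.argsort(-push):
        if model.predict(x_mod.reshape(1, -1))[0] == target:
            break
        x_mod[idx] = target_mean[idx]
        chosen.append(int(idx))
    reached = bool(model.predict(x_mod.reshape(1, -1))[0] == target)
    return chosen, reached

x0 = X[y == 0][0]
print(features_to_target(x0, target=1))
```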
arXiv Detail & Related papers (2020-05-31T05:52:05Z)
- Benchmarking Machine Reading Comprehension: A Psychological Perspective [45.85089157315507]
Machine reading comprehension (MRC) has received considerable attention as a benchmark for natural language understanding.
The conventional task design of MRC lacks explainability beyond the model interpretation.
This paper provides a theoretical basis for the design of MRC datasets based on psychology as well as psychometrics.
arXiv Detail & Related papers (2020-04-04T11:45:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.