Related papers: Towards a Unified Framework for Evaluating Explanations

URL: http://arxiv.org/abs/2405.14016v2
Date: Sun, 14 Jul 2024 01:11:22 GMT
Title: Towards a Unified Framework for Evaluating Explanations
Authors: Juan D. Pinto, Luc Paquette
Abstract summary: We argue that explanations serve as mediators between models and stakeholders, whether for intrinsically interpretable models or opaque black-box models. We illustrate these criteria, as well as specific evaluation methods, using examples from an ongoing study of an interpretable neural network for predicting a particular learner behavior.
Score: 0.6138671548064356
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The challenge of creating interpretable models has been taken up by two main research communities: ML researchers primarily focused on lower-level explainability methods that suit the needs of engineers, and HCI researchers who have more heavily emphasized user-centered approaches often based on participatory design methods. This paper reviews how these communities have evaluated interpretability, identifying overlaps and semantic misalignments. We propose moving towards a unified framework of evaluation criteria and lay the groundwork for such a framework by articulating the relationships between existing criteria. We argue that explanations serve as mediators between models and stakeholders, whether for intrinsically interpretable models or opaque black-box models analyzed via post-hoc techniques. We further argue that useful explanations require both faithfulness and intelligibility. Explanation plausibility is a prerequisite for intelligibility, while stability is a prerequisite for explanation faithfulness. We illustrate these criteria, as well as specific evaluation methods, using examples from an ongoing study of an interpretable neural network for predicting a particular learner behavior.
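The abstract names stability as a prerequisite for explanation faithfulness. As a rough illustration only, and not the authors' evaluation method, the following sketch scores the stability of an explanation function by comparing the attribution for an input with attributions for slightly perturbed copies of it. The linear model, the gradient-times-input attribution, the cosine-similarity measure, and every name here (`linear_attribution`, `stability_score`, `noise_scale`) are assumptions made for this example.

```python
import math
import random


def linear_attribution(weights, x):
    """Gradient-times-input attribution for a linear model f(x) = sum(w_i * x_i).

    Hypothetical stand-in for whatever explanation method is being evaluated.
    """
    return [w * xi for w, xi in zip(weights, x)]


def cosine(a, b):
    """Cosine similarity between two attribution vectors (0.0 if either is all zeros)."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm > 0 else 0.0


def stability_score(explain, x, noise_scale=0.01, n_samples=100, seed=0):
    """Mean cosine similarity between explain(x) and explain(x + small Gaussian noise).

    Values near 1 mean that similar inputs receive similar explanations.
    """
    rng = random.Random(seed)
    base = explain(x)
    sims = []
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, noise_scale) for xi in x]
        sims.append(cosine(base, explain(noisy)))
    return sum(sims) / len(sims)


# Assumed toy model and input, chosen only for illustration.
weights = [0.5, -1.0, 2.0]
x = [1.0, 2.0, 3.0]
score = stability_score(lambda v: linear_attribution(weights, v), x)
print(f"stability score: {score:.4f}")
```

A high score here says only that the explanation changes little under small input noise; it says nothing on its own about whether the explanation reflects the model's actual reasoning, which is why stability is framed as a prerequisite rather than a substitute for faithfulness.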

Related papers

Causality can systematically address the monsters under the bench(marks) [64.36592889550431]
Benchmarks are plagued by various biases, artifacts, or leakage. Models may behave unreliably due to poorly explored failure modes. Causality offers an ideal framework to systematically address these challenges.
arXiv Detail & Related papers (2025-02-07T17:01:37Z)
On the Reasoning Capacity of AI Models and How to Quantify It [0.0]
Large Language Models (LLMs) have intensified the debate surrounding the fundamental nature of their reasoning capabilities. While achieving high performance on benchmarks such as GPQA and MMLU, these models exhibit limitations in more complex reasoning tasks. We propose a novel phenomenological approach that goes beyond traditional accuracy metrics to probe the underlying mechanisms of model behavior.
arXiv Detail & Related papers (2025-01-23T16:58:18Z)
On the Fairness, Diversity and Reliability of Text-to-Image Generative Models [49.60774626839712]
Multimodal generative models have sparked critical discussions on their fairness, reliability, and potential for misuse. We propose an evaluation framework designed to assess model reliability through their responses to perturbations in the embedding space. Our method lays the groundwork for detecting unreliable, bias-injected models and retrieval of bias provenance.
arXiv Detail & Related papers (2024-11-21T09:46:55Z)
Interpreting Inflammation Prediction Model via Tag-based Cohort Explanation [5.356481722174994]
We propose a novel framework for identifying cohorts within a dataset based on local feature importance scores. We evaluate our framework on a food-based inflammation prediction model and demonstrate that the framework can generate reliable explanations that match domain knowledge.
arXiv Detail & Related papers (2024-10-17T23:22:59Z)
Exposing Assumptions in AI Benchmarks through Cognitive Modelling [0.0]
Cultural AI benchmarks often rely on implicit assumptions about measured constructs, leading to vague formulations with poor validity and unclear interrelations. We propose exposing these assumptions using explicit cognitive models formulated as Structural Equation Models.
arXiv Detail & Related papers (2024-09-25T11:55:02Z)
Benchmarks as Microscopes: A Call for Model Metrology [76.64402390208576]
Modern language models (LMs) pose a new challenge in capability assessment. To be confident in our metrics, we need a new discipline of model metrology.
arXiv Detail & Related papers (2024-07-22T17:52:12Z)
Relational Concept Bottleneck Models [13.311396882130033]
Concept Bottleneck Models (CBMs) are not designed to solve relational problems. R-CBMs are capable of both representing standard CBMs and relational GNNs. In particular, we show that R-CBMs support the generation of concept-based explanations.
arXiv Detail & Related papers (2023-08-23T08:25:33Z)
Are Neural Topic Models Broken? [81.15470302729638]
We study the relationship between automated and human evaluation of topic models. We find that neural topic models fare worse in both respects compared to an established classical method.
arXiv Detail & Related papers (2022-10-28T14:38:50Z)
Logical Satisfiability of Counterfactuals for Faithful Explanations in NLI [60.142926537264714]
We introduce the methodology of Faithfulness-through-Counterfactuals. It generates a counterfactual hypothesis based on the logical predicates expressed in the explanation. It then evaluates if the model's prediction on the counterfactual is consistent with that expressed logic.
arXiv Detail & Related papers (2022-05-25T03:40:59Z)
A Survey on the Robustness of Feature Importance and Counterfactual Explanations [12.599872913953238]
We present a survey of the works that analysed the robustness of two classes of local explanations. The survey aims to unify existing definitions of robustness, introduces a taxonomy to classify different robustness approaches, and discusses some interesting results.
arXiv Detail & Related papers (2021-10-30T22:48:04Z)
When Stability meets Sufficiency: Informative Explanations that do not Overwhelm [15.897648942908747]
We consider features-based attribution methods that highlight what should be minimally sufficient to justify the classification of an input. While minimal sufficiency is an attractive property akin to comprehensibility, the resulting explanations are often too sparse for a human to understand and evaluate the local behavior of the model. We propose a novel method called Path-Sufficient Explanations Method (PSEM) that outputs a sequence of stable and sufficient explanations for a given input.
arXiv Detail & Related papers (2021-09-13T16:06:10Z)
On the Faithfulness Measurements for Model Interpretations [100.2730234575114]
Post-hoc interpretations aim to uncover how natural language processing (NLP) models make predictions. To tackle these issues, we start with three criteria: the removal-based criterion, the sensitivity of interpretations, and the stability of interpretations. Motivated by the desideratum of these faithfulness notions, we introduce a new class of interpretation methods that adopt techniques from the adversarial domain.
arXiv Detail & Related papers (2021-04-18T09:19:44Z)
Evaluations and Methods for Explanation through Robustness Analysis [117.7235152610957]
We establish a novel set of evaluation criteria for such feature-based explanations by robustness analysis. We obtain new explanations that are loosely necessary and sufficient for a prediction. We extend the explanation to extract the set of features that would move the current prediction to a target class.
arXiv Detail & Related papers (2020-05-31T05:52:05Z)
Benchmarking Machine Reading Comprehension: A Psychological Perspective [45.85089157315507]
Machine reading comprehension (MRC) has received considerable attention as a benchmark for natural language understanding. The conventional task design of MRC lacks explainability beyond the model interpretation. This paper provides a theoretical basis for the design of MRC datasets based on psychology as well as psychometrics.
arXiv Detail & Related papers (2020-04-04T11:45:27Z)