On the Objective Evaluation of Post Hoc Explainers
- URL: http://arxiv.org/abs/2106.08376v1
- Date: Tue, 15 Jun 2021 19:06:51 GMT
- Title: On the Objective Evaluation of Post Hoc Explainers
- Authors: Zachariah Carmichael, Walter J. Scheirer
- Abstract summary: Modern trends in machine learning research have led to algorithms that are increasingly intricate to the degree that they are considered to be black boxes.
In an effort to reduce the opacity of decisions, methods have been proposed to construe the inner workings of such models in a human-comprehensible manner.
We propose a framework for the evaluation of post hoc explainers on ground truth that is directly derived from the additive structure of a model.
- Score: 10.981508361941335
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many applications of data-driven models demand transparency of decisions,
especially in health care, criminal justice, and other high-stakes
environments. Modern trends in machine learning research have led to algorithms
that are increasingly intricate to the degree that they are considered to be
black boxes. In an effort to reduce the opacity of decisions, methods have been
proposed to construe the inner workings of such models in a
human-comprehensible manner. These post hoc techniques are described as being
universal explainers - capable of faithfully augmenting decisions with
algorithmic insight. Unfortunately, there is little agreement about what
constitutes a "good" explanation. Moreover, current methods of explanation
evaluation are derived from either subjective or proxy means. In this work, we
propose a framework for the evaluation of post hoc explainers on ground truth
that is directly derived from the additive structure of a model. We demonstrate
the efficacy of the framework in understanding explainers by evaluating popular
explainers on thousands of synthetic and several real-world tasks. The
framework unveils that explanations may be accurate but misattribute the
importance of individual features.
Related papers
- Toward Understanding the Disagreement Problem in Neural Network Feature Attribution [0.8057006406834466]
neural networks have demonstrated their remarkable ability to discern intricate patterns and relationships from raw data.
Understanding the inner workings of these black box models remains challenging, yet crucial for high-stake decisions.
Our work addresses this confusion by investigating the explanations' fundamental and distributional behavior.
arXiv Detail & Related papers (2024-04-17T12:45:59Z) - Evaluating the Utility of Model Explanations for Model Development [54.23538543168767]
We evaluate whether explanations can improve human decision-making in practical scenarios of machine learning model development.
To our surprise, we did not find evidence of significant improvement on tasks when users were provided with any of the saliency maps.
These findings suggest caution regarding the usefulness and potential for misunderstanding in saliency-based explanations.
arXiv Detail & Related papers (2023-12-10T23:13:23Z) - How Well Do Feature-Additive Explainers Explain Feature-Additive
Predictors? [12.993027779814478]
We ask the question: can popular feature-additive explainers (e.g., LIME, SHAP, SHAPR, MAPLE, and PDP) explain feature-additive predictors?
Herein, we evaluate such explainers on ground truth that is analytically derived from the additive structure of a model.
Our results suggest that all explainers eventually fail to correctly attribute the importance of features, especially when a decision-making process involves feature interactions.
arXiv Detail & Related papers (2023-10-27T21:16:28Z) - Explainability for Large Language Models: A Survey [59.67574757137078]
Large language models (LLMs) have demonstrated impressive capabilities in natural language processing.
This paper introduces a taxonomy of explainability techniques and provides a structured overview of methods for explaining Transformer-based language models.
arXiv Detail & Related papers (2023-09-02T22:14:26Z) - Explaining Explainability: Towards Deeper Actionable Insights into Deep
Learning through Second-order Explainability [70.60433013657693]
Second-order explainable AI (SOXAI) was recently proposed to extend explainable AI (XAI) from the instance level to the dataset level.
We demonstrate for the first time, via example classification and segmentation cases, that eliminating irrelevant concepts from the training set based on actionable insights from SOXAI can enhance a model's performance.
arXiv Detail & Related papers (2023-06-14T23:24:01Z) - Helpful, Misleading or Confusing: How Humans Perceive Fundamental
Building Blocks of Artificial Intelligence Explanations [11.667611038005552]
We take a step back from sophisticated predictive algorithms and look into explainability of simple decision-making models.
We aim to assess how people perceive comprehensibility of their different representations.
This allows us to capture how diverse stakeholders judge intelligibility of fundamental concepts that more elaborate artificial intelligence explanations are built from.
arXiv Detail & Related papers (2023-03-02T03:15:35Z) - ExSum: From Local Explanations to Model Understanding [6.23934576145261]
Interpretability methods are developed to understand the working mechanisms of black-box models.
Fulfilling this goal requires both that the explanations generated by these methods are correct and that people can easily and reliably understand them.
We introduce explanation summary (ExSum), a mathematical framework for quantifying model understanding.
arXiv Detail & Related papers (2022-04-30T02:07:20Z) - Interpretable Deep Learning: Interpretations, Interpretability,
Trustworthiness, and Beyond [49.93153180169685]
We introduce and clarify two basic concepts-interpretations and interpretability-that people usually get confused.
We elaborate the design of several recent interpretation algorithms, from different perspectives, through proposing a new taxonomy.
We summarize the existing work in evaluating models' interpretability using "trustworthy" interpretation algorithms.
arXiv Detail & Related papers (2021-03-19T08:40:30Z) - Beyond Trivial Counterfactual Explanations with Diverse Valuable
Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z) - A Diagnostic Study of Explainability Techniques for Text Classification [52.879658637466605]
We develop a list of diagnostic properties for evaluating existing explainability techniques.
We compare the saliency scores assigned by the explainability techniques with human annotations of salient input regions to find relations between a model's performance and the agreement of its rationales with human ones.
arXiv Detail & Related papers (2020-09-25T12:01:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.