The Solvability of Interpretability Evaluation Metrics
- URL: http://arxiv.org/abs/2205.08696v1
- Date: Wed, 18 May 2022 02:52:03 GMT
- Title: The Solvability of Interpretability Evaluation Metrics
- Authors: Yilun Zhou, Julie Shah
- Abstract summary: Feature attribution methods are often evaluated on metrics such as comprehensiveness and sufficiency.
In this paper, we highlight an intriguing property of these metrics: their solvability.
We present a series of investigations showing that this beam search explainer is generally comparable or favorable to current choices.
- Score: 7.3709604810699085
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Feature attribution methods are popular for explaining neural network
predictions, and they are often evaluated on metrics such as comprehensiveness
and sufficiency, which are motivated by the principle that more important
features -- as judged by the explanation -- should have larger impacts on model
prediction. In this paper, we highlight an intriguing property of these
metrics: their solvability. Concretely, we can define the problem of optimizing
an explanation for a metric and solve it using beam search. This brings up the
obvious question: given such solvability, why do we still develop other
explainers and then evaluate them on the metric? We present a series of
investigations showing that this beam search explainer is generally comparable
or favorable to current choices such as LIME and SHAP, suggest rethinking the
goals of model interpretability, and identify several directions towards better
evaluations of new method proposals.
Related papers
- A Critical Assessment of Interpretable and Explainable Machine Learning for Intrusion Detection [0.0]
We study the use of overly complex and opaque ML models, unaccounted data imbalances and correlated features, inconsistent influential features across different explanation methods, and the implausible utility of explanations.
Specifically, we advise avoiding complex opaque models such as Deep Neural Networks and instead using interpretable ML models such as Decision Trees.
We find that feature-based model explanations are most often inconsistent across different settings.
arXiv Detail & Related papers (2024-07-04T15:35:42Z) - Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode.
We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
arXiv Detail & Related papers (2024-06-05T16:35:30Z) - Evaluating the Utility of Model Explanations for Model Development [54.23538543168767]
We evaluate whether explanations can improve human decision-making in practical scenarios of machine learning model development.
To our surprise, we did not find evidence of significant improvement on tasks when users were provided with any of the saliency maps.
These findings suggest caution regarding the usefulness and potential for misunderstanding in saliency-based explanations.
arXiv Detail & Related papers (2023-12-10T23:13:23Z) - Goodhart's Law Applies to NLP's Explanation Benchmarks [57.26445915212884]
We critically examine two sets of metrics: the ERASER metrics (comprehensiveness and sufficiency) and the EVAL-X metrics.
We show that we can inflate a model's comprehensiveness and sufficiency scores dramatically without altering its predictions or explanations on in-distribution test inputs.
Our results raise doubts about the ability of current metrics to guide explainability research, underscoring the need for a broader reassessment of what precisely these metrics are intended to capture.
arXiv Detail & Related papers (2023-08-28T03:03:03Z) - Detection Accuracy for Evaluating Compositional Explanations of Units [5.220940151628734]
Two examples of methods that use this approach are Network Dissection and Compositional explanations.
While intuitively, logical forms are more informative than atomic concepts, it is not clear how to quantify this improvement.
We propose to use as evaluation metric the Detection Accuracy, which measures units' consistency of detection of their assigned explanations.
arXiv Detail & Related papers (2021-09-16T08:47:34Z) - Search Methods for Sufficient, Socially-Aligned Feature Importance
Explanations with In-Distribution Counterfactuals [72.00815192668193]
Feature importance (FI) estimates are a popular form of explanation, and they are commonly created and evaluated by computing the change in model confidence caused by removing certain input features at test time.
We study several under-explored dimensions of FI-based explanations, providing conceptual and empirical improvements for this form of explanation.
arXiv Detail & Related papers (2021-06-01T20:36:48Z) - Evaluation of Similarity-based Explanations [36.10585276728203]
We investigated relevance metrics that can provide reasonable explanations to users.
Our experiments revealed that the cosine similarity of the gradients of the loss performs best.
Some metrics perform poorly in our tests and analyzed the reasons of their failure.
arXiv Detail & Related papers (2020-06-08T12:39:46Z) - Evaluations and Methods for Explanation through Robustness Analysis [117.7235152610957]
We establish a novel set of evaluation criteria for such feature based explanations by analysis.
We obtain new explanations that are loosely necessary and sufficient for a prediction.
We extend the explanation to extract the set of features that would move the current prediction to a target class.
arXiv Detail & Related papers (2020-05-31T05:52:05Z) - Invariant Rationalization [84.1861516092232]
A typical rationalization criterion, i.e. maximum mutual information (MMI), finds the rationale that maximizes the prediction performance based only on the rationale.
We introduce a game-theoretic invariant rationalization criterion where the rationales are constrained to enable the same predictor to be optimal across different environments.
We show both theoretically and empirically that the proposed rationales can rule out spurious correlations, generalize better to different test scenarios, and align better with human judgments.
arXiv Detail & Related papers (2020-03-22T00:50:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.