What Makes a Good Explanation?: A Harmonized View of Properties of Explanations
- URL: http://arxiv.org/abs/2211.05667v3
- Date: Fri, 12 Jul 2024 15:34:29 GMT
- Title: What Makes a Good Explanation?: A Harmonized View of Properties of Explanations
- Authors: Zixi Chen, Varshini Subhash, Marton Havasi, Weiwei Pan, Finale Doshi-Velez
- Abstract summary: Interpretability provides a means for humans to verify aspects of machine learning (ML) models.
Different contexts require explanations with different properties.
There is a lack of standardization when it comes to properties of explanations.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Interpretability provides a means for humans to verify aspects of machine learning (ML) models and empower human+ML teaming in situations where the task cannot be fully automated. Different contexts require explanations with different properties. For example, the kind of explanation required to determine if an early cardiac arrest warning system is ready to be integrated into a care setting is very different from the type of explanation required for a loan applicant to help determine the actions they might need to take to make their application successful. Unfortunately, there is a lack of standardization when it comes to properties of explanations: different papers may use the same term to mean different quantities, and different terms to mean the same quantity. This lack of a standardized terminology and categorization of the properties of ML explanations prevents us from both rigorously comparing interpretable machine learning methods and identifying what properties are needed in what contexts. In this work, we survey properties defined in interpretable machine learning papers, synthesize them based on what they actually measure, and describe the trade-offs between different formulations of these properties. In doing so, we enable more informed selection of task-appropriate formulations of explanation properties as well as standardization for future work in interpretable machine learning.
Related papers
- Complementary Explanations for Effective In-Context Learning [77.83124315634386]
Large language models (LLMs) have exhibited remarkable capabilities in learning from explanations in prompts.
This work aims to better understand the mechanisms by which explanations are used for in-context learning.
arXiv Detail & Related papers (2022-11-25T04:40:47Z)
- Feature Necessity & Relevancy in ML Classifier Explanations [5.232306238197686]
Given a machine learning (ML) model and a prediction, explanations can be defined as sets of features which are sufficient for the prediction.
It is also critical to understand whether sensitive features can occur in some explanation, or whether a non-interesting feature must occur in all explanations.
arXiv Detail & Related papers (2022-10-27T12:12:45Z)
- Studying the explanations for the automated prediction of bug and non-bug issues using LIME and SHAP [7.792303263390021]
We want to understand if machine learning models provide explanations for the classification that are reasonable to us as humans.
We also want to know if the prediction quality is correlated with the quality of explanations.
arXiv Detail & Related papers (2022-09-15T21:45:46Z)
- The Need for Interpretable Features: Motivation and Taxonomy [69.07189753428553]
We claim that the term "interpretable feature" is not specific nor detailed enough to capture the full extent to which features impact the usefulness of machine learning explanations.
In this paper, we motivate and discuss three key lessons: 1) more attention should be given to what we refer to as the interpretable feature space, or the state of features that are useful to domain experts taking real-world actions.
arXiv Detail & Related papers (2022-02-23T19:19:14Z)
- Human Interpretation of Saliency-based Explanation Over Text [65.29015910991261]
We study saliency-based explanations over textual data.
We find that people often mis-interpret the explanations.
We propose a method to adjust saliencies based on model estimates of over- and under-perception.
arXiv Detail & Related papers (2022-01-27T15:20:32Z)
- Properties from Mechanisms: An Equivariance Perspective on Identifiable Representation Learning [79.4957965474334]
A key goal of unsupervised representation learning is "inverting" a data-generating process to recover its latent properties.
This paper asks, "Can we instead identify latent properties by leveraging knowledge of the mechanisms that govern their evolution?"
We provide a complete characterization of the sources of non-identifiability as we vary knowledge about a set of possible mechanisms.
arXiv Detail & Related papers (2021-10-29T14:04:08Z)
- Contrastive Explanations for Model Interpretability [77.92370750072831]
We propose a methodology to produce contrastive explanations for classification models.
Our method is based on projecting model representation to a latent space.
Our findings shed light on the ability of label-contrastive explanations to provide a more accurate and finer-grained interpretability of a model's decision.
arXiv Detail & Related papers (2021-03-02T00:36:45Z)
- On Interpretability and Similarity in Concept-Based Machine Learning [2.3986080077861787]
We discuss how notions from cooperative game theory can be used to assess the contribution of individual attributes in classification and clustering processes in concept-based machine learning.
To address the third question posed in the paper, we present some ideas on how to reduce the number of attributes using similarities in large contexts.
arXiv Detail & Related papers (2021-02-25T07:57:28Z)
- Dependency Decomposition and a Reject Option for Explainable Models [4.94950858749529]
Recent deep learning models perform extremely well in various inference tasks.
Recent advances offer methods to visualize features and describe the attribution of the input.
We present the first analysis of dependencies regarding the probability distribution over the desired image classification outputs.
arXiv Detail & Related papers (2020-12-11T17:39:33Z)
- Towards Unifying Feature Attribution and Counterfactual Explanations: Different Means to the Same End [17.226134854746267]
We present a method to generate feature attribution explanations from a set of counterfactual examples.
We show how counterfactual examples can be used to evaluate the goodness of an attribution-based explanation in terms of its necessity and sufficiency.
arXiv Detail & Related papers (2020-11-10T05:41:43Z)
- An Information-Theoretic Approach to Personalized Explainable Machine Learning [92.53970625312665]
We propose a simple probabilistic model for the predictions and user knowledge.
We quantify the effect of an explanation by the conditional mutual information between the explanation and prediction.
arXiv Detail & Related papers (2020-03-01T13:06:29Z)
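As a rough illustration of the information-theoretic criterion described above (a minimal sketch, not the paper's actual implementation), the effect of an explanation E on a prediction Y given user knowledge U could be estimated as the conditional mutual information I(E; Y | U), computed here from paired discrete samples by plug-in counting:

```python
from collections import Counter
from math import log2

def conditional_mutual_information(es, ys, us):
    """Plug-in estimate of I(E; Y | U) in bits from paired discrete samples."""
    n = len(es)
    c_eyu = Counter(zip(es, ys, us))   # joint counts of (e, y, u)
    c_eu = Counter(zip(es, us))        # marginal counts of (e, u)
    c_yu = Counter(zip(ys, us))        # marginal counts of (y, u)
    c_u = Counter(us)                  # marginal counts of u
    cmi = 0.0
    for (e, y, u), c in c_eyu.items():
        # p(e,y,u) * log2( p(e,y,u) p(u) / (p(e,u) p(y,u)) );
        # the 1/n factors cancel inside the log.
        cmi += (c / n) * log2(c * c_u[u] / (c_eu[(e, u)] * c_yu[(y, u)]))
    return cmi

# If the explanation fully determines the prediction, the CMI equals
# the entropy of the prediction (1 bit for a uniform binary outcome):
print(conditional_mutual_information([0, 1, 0, 1], [0, 1, 0, 1], [0, 0, 0, 0]))
```

Note that the variable names and the discrete plug-in estimator here are assumptions for illustration; the paper's probabilistic model of predictions and user knowledge may use a different estimation procedure.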
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.