Sound Explanation for Trustworthy Machine Learning
- URL: http://arxiv.org/abs/2306.06134v1
- Date: Thu, 8 Jun 2023 19:58:30 GMT
- Title: Sound Explanation for Trustworthy Machine Learning
- Authors: Kai Jia, Pasapol Saowakon, Limor Appelbaum, Martin Rinard
- Abstract summary: We argue against the practice of interpreting black-box models via attributing scores to input components.
We then formalize the concept of sound explanation, which has been informally adopted in prior work.
We present the application of feature selection as a sound explanation for cancer prediction models to cultivate trust among clinicians.
- Score: 11.779125616468194
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We take a formal approach to the explainability problem of machine learning systems. We argue against the practice of interpreting black-box models by attributing scores to input components, due to the inherently conflicting goals of attribution-based interpretation. We prove that no attribution algorithm simultaneously satisfies specificity, additivity, completeness, and baseline invariance. We then formalize the concept of sound explanation, which has been informally adopted in prior work. A sound explanation entails providing sufficient information to causally explain the predictions made by a system. Finally, we present the application of feature selection as a sound explanation for cancer prediction models to cultivate trust among clinicians.
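The abstract describes the feature-selection application only at a high level. The snippet below is a minimal sketch of the general idea, not the authors' pipeline; it uses scikit-learn's bundled breast-cancer dataset and an arbitrary choice of k=5 features. Because the classifier is trained only on the selected features, reporting those feature values for a given case is, by construction, sufficient to causally account for the model's prediction.

```python
# Minimal sketch (not the authors' implementation): feature selection as a
# "sound explanation" -- the classifier sees only the selected features, so
# those feature values suffice, by construction, to account for its output.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

K = 5  # hypothetical number of features to keep, chosen only for illustration
selector = SelectKBest(mutual_info_classif, k=K).fit(X_tr, y_tr)
kept = X_tr.columns[selector.get_support()]

# Train the classifier on the selected features only.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_tr[kept], y_tr)
print("selected features:", list(kept))
print("held-out accuracy:", clf.score(X_te[kept], y_te))

# For any individual case, the values of the kept features constitute the
# explanation: the prediction is a function of these values and nothing else.
sample = X_te[kept].iloc[[0]]
print("explanation (feature values):", sample.to_dict("records")[0])
print("prediction:", int(clf.predict(sample)[0]))
```

Any selection criterion could replace mutual information here; the point is only that, once the model is restricted to the selected features, the explanation is sound in the paper's sense of providing information sufficient to causally explain the prediction.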
Related papers
- Selective Explanations [14.312717332216073]
Amortized explainers train a machine learning model to predict feature attribution scores in a single inference pass.
Despite their efficiency, amortized explainers can produce inaccurate predictions and misleading explanations.
We propose selective explanations, a novel feature attribution method that detects when amortized explainers generate low-quality explanations.
arXiv Detail & Related papers (2024-05-29T23:08:31Z)
- Estimation of Concept Explanations Should be Uncertainty Aware [39.598213804572396]
We study a specific kind called Concept Explanations, where the goal is to interpret a model using human-understandable concepts.
Although popular for their easy interpretation, concept explanations are known to be noisy.
We propose an uncertainty-aware Bayesian estimation method to address these issues, which readily improved the quality of explanations.
arXiv Detail & Related papers (2023-12-13T11:17:27Z)
- Explaining Hate Speech Classification with Model Agnostic Methods [0.9990687944474738]
The research goal of this paper is to bridge the gap between hate speech prediction and the explanations generated by the system to support its decision.
This is achieved by first predicting the classification of a text and then providing a post-hoc, model-agnostic, surrogate interpretability approach.
arXiv Detail & Related papers (2023-05-30T19:52:56Z)
- Logical Satisfiability of Counterfactuals for Faithful Explanations in NLI [60.142926537264714]
We introduce the methodology of Faithfulness-through-Counterfactuals.
It generates a counterfactual hypothesis based on the logical predicates expressed in the explanation.
It then evaluates whether the model's prediction on the counterfactual is consistent with the expressed logic.
arXiv Detail & Related papers (2022-05-25T03:40:59Z)
- A Human-Centered Interpretability Framework Based on Weight of Evidence [26.94750208505883]
We take a human-centered approach to interpretable machine learning.
We propose a list of design principles for machine-generated explanations meaningful to humans.
We show that this method can be adapted to handle high-dimensional, multi-class settings.
arXiv Detail & Related papers (2021-04-27T16:13:35Z)
- Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z)
- Contrastive Explanations for Model Interpretability [77.92370750072831]
We propose a methodology to produce contrastive explanations for classification models.
Our method is based on projecting model representation to a latent space.
Our findings shed light on the ability of label-contrastive explanations to provide a more accurate and finer-grained interpretability of a model's decision.
arXiv Detail & Related papers (2021-03-02T00:36:45Z)
- This is not the Texture you are looking for! Introducing Novel Counterfactual Explanations for Non-Experts using Generative Adversarial Learning [59.17685450892182]
Counterfactual explanation systems try to enable counterfactual reasoning by modifying the input image.
We present a novel approach to generate such counterfactual image explanations based on adversarial image-to-image translation techniques.
Our results show that our approach leads to significantly better results regarding mental models, explanation satisfaction, trust, emotions, and self-efficacy than two state-of-the-art systems.
arXiv Detail & Related papers (2020-12-22T10:08:05Z)
- The Struggles of Feature-Based Explanations: Shapley Values vs. Minimal Sufficient Subsets [61.66584140190247]
We show that feature-based explanations pose problems even for explaining trivial models.
We show that two popular classes of explainers, Shapley explainers and minimal sufficient subsets explainers, target fundamentally different types of ground-truth explanations.
arXiv Detail & Related papers (2020-09-23T09:45:23Z)
- Explaining predictive models with mixed features using Shapley values and conditional inference trees [1.8065361710947976]
Shapley values stand out as a sound method to explain predictions from any type of machine learning model (a brute-force sketch of exact Shapley attribution appears after this list).
We propose a method to explain mixed dependent features by modeling the dependence structure of the features using conditional inference trees.
arXiv Detail & Related papers (2020-07-02T11:25:45Z)
- Evaluations and Methods for Explanation through Robustness Analysis [117.7235152610957]
We establish a novel set of evaluation criteria for such feature-based explanations via robustness analysis.
We obtain new explanations that are loosely necessary and sufficient for a prediction.
We extend the explanation to extract the set of features that would move the current prediction to a target class.
arXiv Detail & Related papers (2020-05-31T05:52:05Z)
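Several of the related papers above build on Shapley-value attribution, and completeness and additivity (two of the four properties the main abstract proves cannot all hold) are among the axioms usually credited to Shapley values. The sketch below is illustrative only and is not taken from any of the papers: it computes exact Shapley values by brute-force subset enumeration against a fixed baseline, which is feasible only for a handful of features (libraries such as `shap` approximate this for real models).

```python
# Illustrative only: exact Shapley attribution for a tiny model, enumerating all
# feature subsets and replacing "missing" features with a fixed baseline value.
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley attribution of predict(x) - predict(baseline)."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their true value; the rest take the baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for S in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[i] += weight * (value(set(S) | {i}) - value(set(S)))
    return phi

# Toy model with a feature interaction: f(z) = 2*z0 + z1*z2
f = lambda z: 2 * z[0] + z[1] * z[2]
x, baseline = [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]
phi = shapley_values(f, x, baseline)
print(phi)                           # approximately [2.0, 0.5, 0.5]
print(sum(phi), f(x) - f(baseline))  # completeness: attributions sum to the gap
```

The final line checks the completeness property (the attributions sum to the gap between the prediction at x and at the baseline); per the main abstract, no attribution algorithm can satisfy this together with specificity, additivity, and baseline invariance.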
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.