Dependency Decomposition and a Reject Option for Explainable Models
- URL: http://arxiv.org/abs/2012.06523v1
- Date: Fri, 11 Dec 2020 17:39:33 GMT
- Title: Dependency Decomposition and a Reject Option for Explainable Models
- Authors: Jan Kronenberger and Anselm Haselhoff
- Abstract summary: Recent deep learning models perform extremely well in various inference tasks.
Recent advances offer methods to visualize features, describe the attribution of the input (e.g. heatmaps), provide textual explanations or reduce dimensionality.
We present the first analysis of dependencies between the probability distribution over the desired image classification outputs and the explaining variables.
- Score: 4.94950858749529
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deploying machine learning models in safety-related domains (e.g.
autonomous driving, medical diagnosis) demands approaches that are explainable,
robust against adversarial attacks and aware of the model uncertainty. Recent
deep learning models perform extremely well in various inference tasks, but
their black-box nature leads to weaknesses regarding the three requirements
mentioned above. Recent advances offer methods to visualize features, describe
attribution of the input (e.g. heatmaps), provide textual explanations or
reduce dimensionality. However, are explanations for classification tasks
dependent on or independent of each other? For instance, is the shape of an
object dependent on the color? What is the effect of using the predicted class
for generating explanations and vice versa? In the context of explainable deep
learning models, we present the first analysis of dependencies regarding the
probability distribution over the desired image classification outputs and the
explaining variables (e.g. attributes, texts, heatmaps). To this end, we
perform an Explanation Dependency Decomposition (EDD). We analyze the
implications of the different dependencies and propose two ways of generating
the explanation. Finally, we use the explanation to verify (accept or reject)
the prediction.
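To make the idea concrete, here is a minimal toy sketch of a class-conditioned explanation followed by an explanation-based accept/reject check. It is an illustration only, not the authors' EDD implementation: the random linear heads, the attribute-style explaining variable, the class-then-explanation generation order and the agreement threshold are all assumptions introduced here.

```python
# Illustrative sketch only (not the paper's code): factorize the joint over a
# class y and an explaining variable e (here a small attribute vector) as
# p(y|x) * p(e|x,y), then use a verifier p(y|e) as a reject option.
import numpy as np

rng = np.random.default_rng(0)
N_CLASSES, N_ATTRS, N_FEATURES = 3, 4, 8

# Toy stand-ins for the model heads, realised as random linear maps.
W_cls = rng.normal(size=(N_FEATURES, N_CLASSES))            # class head p(y|x)
W_att = rng.normal(size=(N_CLASSES, N_FEATURES, N_ATTRS))   # class-conditioned attribute head p(e|x,y)
W_ver = rng.normal(size=(N_ATTRS, N_CLASSES))               # attributes -> class verifier p(y|e)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def predict(x):
    """Black-box class posterior p(y|x)."""
    return softmax(x @ W_cls)

def explain(x, y):
    """Explanation generated conditioned on the predicted class, p(e|x,y);
    one of the two generation orders the abstract refers to."""
    return 1.0 / (1.0 + np.exp(-(x @ W_att[y])))   # attribute probabilities

def verify(attrs, y, threshold=0.5):
    """Reject option: accept the prediction only if the class inferred from the
    explanation alone agrees with y and is sufficiently confident."""
    p_y_given_e = softmax(attrs @ W_ver)
    return bool(p_y_given_e.argmax() == y and p_y_given_e[y] >= threshold)

x = rng.normal(size=N_FEATURES)
y_hat = int(predict(x).argmax())
attrs = explain(x, y_hat)
print("predicted class:", y_hat, "| accepted:", verify(attrs, y_hat))
```

The reverse order discussed in the abstract (generate the explanation independently of the class and derive the class from it) would simply swap the roles of the class head and the verifier.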
Related papers
- Explaining the Model and Feature Dependencies by Decomposition of the
Shapley Value [3.0655581300025996]
Shapley values have become one of the go-to methods to explain complex models to end-users.
One downside is that they always require outputs of the model when some features are missing.
This however introduces a non-trivial choice: do we condition on the unknown features or not?
We propose a new algorithmic approach to combine both explanations, removing the burden of choice and enhancing the explanatory power of Shapley values.
arXiv Detail & Related papers (2023-06-19T12:20:23Z) - Are Data-driven Explanations Robust against Out-of-distribution Data? [18.760475318852375]
We propose an end-to-end model-agnostic learning framework, Distributionally Robust Explanations (DRE).
The key idea is to fully utilize the inter-distribution information to provide supervisory signals for learning explanations without human annotation.
Our results demonstrate that the proposed method significantly improves the model's performance in terms of explanation and prediction robustness against distributional shifts.
arXiv Detail & Related papers (2023-03-29T02:02:08Z) - ELUDE: Generating interpretable explanations via a decomposition into
labelled and unlabelled features [23.384134043048807]
We develop an explanation framework that decomposes a model's prediction into two parts: one attributable to labelled, semantically meaningful features and a remainder attributable to unlabelled features.
By identifying the latter, we are able to analyze the "unexplained" portion of the model.
We show that the set of unlabelled features can generalize to multiple models trained with the same feature space.
arXiv Detail & Related papers (2022-06-15T17:36:55Z) - Human Interpretation of Saliency-based Explanation Over Text [65.29015910991261]
We study saliency-based explanations over textual data.
We find that people often misinterpret the explanations.
We propose a method to adjust saliencies based on model estimates of over- and under-perception.
arXiv Detail & Related papers (2022-01-27T15:20:32Z) - Beyond Trivial Counterfactual Explanations with Diverse Valuable
Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z) - Contrastive Explanations for Model Interpretability [77.92370750072831]
We propose a methodology to produce contrastive explanations for classification models.
Our method is based on projecting model representation to a latent space.
Our findings shed light on the ability of label-contrastive explanations to provide a more accurate and finer-grained interpretability of a model's decision.
arXiv Detail & Related papers (2021-03-02T00:36:45Z) - Leakage-Adjusted Simulatability: Can Models Generate Non-Trivial
Explanations of Their Behavior in Natural Language? [86.60613602337246]
We introduce a leakage-adjusted simulatability (LAS) metric for evaluating NL explanations.
LAS measures how well explanations help an observer predict a model's output, while controlling for how explanations can directly leak the output.
We frame explanation generation as a multi-agent game and optimize explanations for simulatability while penalizing label leakage.
arXiv Detail & Related papers (2020-10-08T16:59:07Z) - The Struggles of Feature-Based Explanations: Shapley Values vs. Minimal
Sufficient Subsets [61.66584140190247]
We show that feature-based explanations pose problems even for explaining trivial models.
We show that two popular classes of explainers, Shapley explainers and minimal sufficient subsets explainers, target fundamentally different types of ground-truth explanations.
arXiv Detail & Related papers (2020-09-23T09:45:23Z) - Deducing neighborhoods of classes from a fitted model [68.8204255655161]
In this article a new kind of interpretable machine learning method is presented.
It can help to understand the partitioning of the feature space into predicted classes in a classification model using quantile shifts.
Real data points (or specific points of interest) are used, and the change in the prediction after slightly raising or lowering specific features is observed.
arXiv Detail & Related papers (2020-09-11T16:35:53Z) - Counterfactual explanation of machine learning survival models [5.482532589225552]
It is shown that the counterfactual explanation problem can be reduced to a standard convex optimization problem with linear constraints.
For other black-box models, it is proposed to apply the well-known Particle Swarm Optimization algorithm.
arXiv Detail & Related papers (2020-06-26T19:46:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.