ELUDE: Generating interpretable explanations via a decomposition into
labelled and unlabelled features
- URL: http://arxiv.org/abs/2206.07690v2
- Date: Thu, 16 Jun 2022 21:43:36 GMT
- Authors: Vikram V. Ramaswamy, Sunnie S. Y. Kim, Nicole Meister, Ruth Fong, Olga
Russakovsky
- Abstract summary: We develop an explanation framework that decomposes a model's prediction into two parts.
By identifying the latter, we are able to analyze the "unexplained" portion of the model.
We show that the set of unlabelled features can generalize to multiple models trained with the same feature space.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning models have achieved remarkable success in different areas of
machine learning over the past decade; however, the size and complexity of
these models make them difficult to understand. In an effort to make them more
interpretable, several recent works focus on explaining parts of a deep neural
network through human-interpretable, semantic attributes. However, it may be
impossible to completely explain complex models using only semantic attributes.
In this work, we propose to augment these attributes with a small set of
uninterpretable features. Specifically, we develop a novel explanation
framework ELUDE (Explanation via Labelled and Unlabelled DEcomposition) that
decomposes a model's prediction into two parts: one that is explainable through
a linear combination of the semantic attributes, and another that is dependent
on the set of uninterpretable features. By identifying the latter, we are able
to analyze the "unexplained" portion of the model, obtaining insights into the
information used by the model. We show that the set of unlabelled features can
generalize to multiple models trained with the same feature space and compare
our work to two popular attribute-oriented methods, Interpretable Basis
Decomposition and Concept Bottleneck, and discuss the additional insights ELUDE
provides.
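The decomposition the abstract describes can be sketched numerically. The following is a minimal illustration, not the authors' implementation: synthetic stand-ins for the model's feature space, labelled attributes, and predictions; a least-squares fit gives the part explainable by semantic attributes, and a small rank-r projection of the feature space stands in for ELUDE's learned unlabelled features that capture the residual.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: model logits y for n examples, a matrix A of
# k labelled semantic attributes, and the model's feature space Z.
n, k, d = 200, 5, 20
Z = rng.normal(size=(n, d))                    # model feature space
A = Z[:, :k] + 0.1 * rng.normal(size=(n, k))   # labelled attributes
y = Z @ rng.normal(size=d)                     # model prediction (logit)

# Part 1: explain as much as possible with a linear combination
# of the labelled attributes (least squares).
w, *_ = np.linalg.lstsq(A, y, rcond=None)
explained = A @ w
residual = y - explained                       # the "unexplained" portion

# Part 2: capture the residual with a small set of unlabelled features;
# here a rank-r projection of the feature space fitted to the residual
# stands in for the features ELUDE learns.
r = 3
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
basis = Vt[:r].T                               # r unlabelled directions
v, *_ = np.linalg.lstsq(Z @ basis, residual, rcond=None)

reconstruction = explained + (Z @ basis) @ v
frac = 1 - residual.var() / y.var()
print("fraction of variance explained by attributes:", frac)
```

Because the unlabelled directions live in the shared feature space Z rather than in any one head, the same basis could in principle be reused for other models trained on Z, which is the generalization property the abstract claims.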
Related papers
- Explaining Text Similarity in Transformer Models [52.571158418102584]
Recent advances in explainable AI make it possible to mitigate such limitations by leveraging improved explanation methods for Transformers.
We use BiLRP, an extension developed for computing second-order explanations in bilinear similarity models, to investigate which feature interactions drive similarity in NLP models.
Our findings contribute to a deeper understanding of different semantic similarity tasks and models, highlighting how novel explainable AI methods enable in-depth analyses and corpus-level insights.
arXiv Detail & Related papers (2024-05-10T17:11:31Z)
- CNN-based explanation ensembling for dataset, representation and explanations evaluation [1.1060425537315088]
We explore the potential of ensembling explanations generated by deep classification models using a convolutional model.
Through experimentation and analysis, we investigate the implications of combining explanations to uncover more coherent and reliable patterns in the model's behavior.
arXiv Detail & Related papers (2024-04-16T08:39:29Z)
- Explaining the Model and Feature Dependencies by Decomposition of the Shapley Value [3.0655581300025996]
Shapley values have become one of the go-to methods for explaining complex models to end-users.
One downside is that they require the model's outputs when some features are missing.
This however introduces a non-trivial choice: do we condition on the unknown features or not?
We propose a new algorithmic approach to combine both explanations, removing the burden of choice and enhancing the explanatory power of Shapley values.
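The choice this entry refers to can be made concrete with a small sketch. The example below (a hypothetical toy model and data, not the paper's algorithm) computes exact Shapley values using the "marginal" value function, i.e. the side of the choice that does not condition on the unknown features: missing features are drawn from their unconditioned empirical distribution.

```python
import itertools
import math
import numpy as np

# Hypothetical toy model over two strongly correlated features; the
# correlation is exactly what makes "condition or not" a real choice.
rng = np.random.default_rng(1)
x0 = rng.normal(size=1000)
x1 = x0 + 0.1 * rng.normal(size=1000)
X = np.stack([x0, x1], axis=1)

def f(X):
    return 2 * X[:, 0] + X[:, 1]          # the model to explain

x = np.array([1.0, 1.0])                  # instance to explain

def value_marginal(S):
    # Features in S are fixed to x; the rest are sampled from their
    # marginal (unconditioned) empirical distribution.
    Xs = X.copy()
    for j in S:
        Xs[:, j] = x[j]
    return f(Xs).mean()

def shapley(value, n):
    # Exact Shapley values by enumerating all coalitions (fine for small n).
    phi = np.zeros(n)
    for j in range(n):
        others = [i for i in range(n) if i != j]
        for r in range(n):
            for S in itertools.combinations(others, r):
                S = set(S)
                weight = (math.factorial(len(S)) *
                          math.factorial(n - len(S) - 1) / math.factorial(n))
                phi[j] += weight * (value(S | {j}) - value(S))
    return phi

phi = shapley(value_marginal, 2)
print("marginal Shapley values:", phi)
```

By the efficiency property, the attributions sum to f(x) minus the average prediction; a conditional value function would distribute credit differently across the correlated features, and the paper's contribution is an approach that combines both views.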
arXiv Detail & Related papers (2023-06-19T12:20:23Z)
- Explaining Explainability: Towards Deeper Actionable Insights into Deep Learning through Second-order Explainability [70.60433013657693]
Second-order explainable AI (SOXAI) was recently proposed to extend explainable AI (XAI) from the instance level to the dataset level.
We demonstrate for the first time, via example classification and segmentation cases, that eliminating irrelevant concepts from the training set based on actionable insights from SOXAI can enhance a model's performance.
arXiv Detail & Related papers (2023-06-14T23:24:01Z)
- Learning with Explanation Constraints [91.23736536228485]
We provide a learning theoretic framework to analyze how explanations can improve the learning of our models.
We demonstrate the benefits of our approach over a large array of synthetic and real-world experiments.
arXiv Detail & Related papers (2023-03-25T15:06:47Z)
- ExSum: From Local Explanations to Model Understanding [6.23934576145261]
Interpretability methods are developed to understand the working mechanisms of black-box models.
Fulfilling this goal requires both that the explanations generated by these methods are correct and that people can easily and reliably understand them.
We introduce explanation summary (ExSum), a mathematical framework for quantifying model understanding.
arXiv Detail & Related papers (2022-04-30T02:07:20Z)
- Contrastive Explanations for Model Interpretability [77.92370750072831]
We propose a methodology to produce contrastive explanations for classification models.
Our method is based on projecting model representation to a latent space.
Our findings shed light on the ability of label-contrastive explanations to provide a more accurate and finer-grained interpretability of a model's decision.
arXiv Detail & Related papers (2021-03-02T00:36:45Z)
- A Framework to Learn with Interpretation [2.3741312212138896]
We present a novel framework to jointly learn a predictive model and its associated interpretation model.
We seek a small dictionary of high-level attribute functions that take as inputs the outputs of selected hidden layers.
A detailed pipeline to visualize the learnt features is also developed.
arXiv Detail & Related papers (2020-10-19T09:26:28Z)
- Human-interpretable model explainability on high-dimensional data [8.574682463936007]
We introduce a framework for human-interpretable explainability on high-dimensional data, consisting of two modules.
First, we apply a semantically meaningful latent representation, both to reduce the raw dimensionality of the data, and to ensure its human interpretability.
Second, we adapt the Shapley paradigm for model-agnostic explainability to operate on these latent features. This leads to interpretable model explanations that are both theoretically controlled and computationally tractable.
arXiv Detail & Related papers (2020-10-14T20:06:28Z)
- The Struggles of Feature-Based Explanations: Shapley Values vs. Minimal Sufficient Subsets [61.66584140190247]
We show that feature-based explanations pose problems even for explaining trivial models.
We show that two popular classes of explainers, Shapley explainers and minimal sufficient subsets explainers, target fundamentally different types of ground-truth explanations.
arXiv Detail & Related papers (2020-09-23T09:45:23Z)
- Deducing neighborhoods of classes from a fitted model [68.8204255655161]
This article presents a new kind of interpretable machine learning method.
It helps to understand how a classification model partitions the feature space into predicted classes, using quantile shifts.
Basically, real data points (or specific points of interest) are used and the changes of the prediction after slightly raising or decreasing specific features are observed.
arXiv Detail & Related papers (2020-09-11T16:35:53Z)
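The quantile-shift probing described in the last entry can be sketched in a few lines. The model and data below are hypothetical stand-ins, not the paper's implementation: take a real data point, raise or lower one feature by a small quantile step, and observe whether the prediction changes.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 3))   # hypothetical dataset

def f(X):
    # toy classifier standing in for a fitted model
    return (X[:, 0] + 0.5 * X[:, 1] ** 2 > 0.5).astype(int)

x = X[0].copy()                 # a real data point of interest
step = 0.05                     # shift by five percentiles

for j in range(X.shape[1]):
    q = (X[:, j] <= x[j]).mean()              # empirical quantile of x[j]
    up, down = x.copy(), x.copy()
    up[j] = np.quantile(X[:, j], min(q + step, 1.0))
    down[j] = np.quantile(X[:, j], max(q - step, 0.0))
    print(f"feature {j}: f(down)={f(down[None])[0]} "
          f"f(x)={f(x[None])[0]} f(up)={f(up[None])[0]}")
```

Features whose small quantile shifts flip the predicted class mark nearby class boundaries, which is how the method deduces the neighborhoods of classes around real points.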
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.