Human-interpretable model explainability on high-dimensional data
- URL: http://arxiv.org/abs/2010.07384v2
- Date: Mon, 20 Dec 2021 17:53:43 GMT
- Title: Human-interpretable model explainability on high-dimensional data
- Authors: Damien de Mijolla, Christopher Frye, Markus Kunesch, John Mansir, Ilya Feige
- Abstract summary: We introduce a framework for human-interpretable explainability on high-dimensional data, consisting of two modules.
First, we apply a semantically meaningful latent representation, both to reduce the raw dimensionality of the data, and to ensure its human interpretability.
Second, we adapt the Shapley paradigm for model-agnostic explainability to operate on these latent features. This leads to interpretable model explanations that are both theoretically controlled and computationally tractable.
- Score: 8.574682463936007
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The importance of explainability in machine learning continues to grow, as
both neural-network architectures and the data they model become increasingly
complex. Unique challenges arise when a model's input features become high
dimensional: on one hand, principled model-agnostic approaches to
explainability become too computationally expensive; on the other, more
efficient explainability algorithms lack natural interpretations for general
users. In this work, we introduce a framework for human-interpretable
explainability on high-dimensional data, consisting of two modules. First, we
apply a semantically meaningful latent representation, both to reduce the raw
dimensionality of the data, and to ensure its human interpretability. These
latent features can be learnt, e.g. explicitly as disentangled representations
or implicitly through image-to-image translation, or they can be based on any
computable quantities the user chooses. Second, we adapt the Shapley paradigm
for model-agnostic explainability to operate on these latent features. This
leads to interpretable model explanations that are both theoretically
controlled and computationally tractable. We benchmark our approach on
synthetic data and demonstrate its effectiveness on several
image-classification tasks.
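As a rough illustration of the second module, the sketch below estimates Shapley values over latent coordinates by sampling feature orderings. `encode`, `decode`, and `model` are toy stand-ins rather than the paper's learnt representation and classifier, and replacing off-coalition latents with a baseline vector is just one possible choice of intervention.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: a 4-dim "latent" encoder/decoder and a scalar-output model.
# In the paper's setting these would be a learnt semantic representation and a
# trained classifier; here they are toy functions so the sketch runs end to end.
def encode(x):            # data -> latent features
    return x.copy()

def decode(z):            # latent features -> data
    return z.copy()

def model(x):             # black-box model output (e.g. a class probability)
    return float(1.0 / (1.0 + np.exp(-(2.0 * x[0] - x[1] + 0.5 * x[3]))))

def latent_shapley(x, baseline, n_perm=2000):
    """Monte Carlo Shapley values over latent coordinates.

    Coordinates outside the current coalition are replaced by the baseline
    latent vector before decoding -- one simple choice of 'off' behaviour.
    """
    z, z0 = encode(x), encode(baseline)
    d = len(z)
    phi = np.zeros(d)
    for _ in range(n_perm):
        order = rng.permutation(d)
        z_cur = z0.copy()
        prev = model(decode(z_cur))
        for j in order:
            z_cur[j] = z[j]                  # add latent feature j to the coalition
            cur = model(decode(z_cur))
            phi[j] += cur - prev             # its marginal contribution
            prev = cur
    return phi / n_perm

x = np.array([1.0, -0.5, 0.3, 2.0])
baseline = np.zeros(4)
phi = latent_shapley(x, baseline)
print("latent attributions:", np.round(phi, 3))
print("efficiency check:", np.round(phi.sum(), 3),
      "vs", np.round(model(decode(encode(x))) - model(decode(encode(baseline))), 3))
```

Each permutation's marginal contributions telescope, so the attributions sum exactly to the change in model output between the baseline and the instance (the Shapley efficiency property).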
Related papers
- Learning Discrete Concepts in Latent Hierarchical Models [73.01229236386148]
Learning concepts from natural high-dimensional data holds potential in building human-aligned and interpretable machine learning models.
We formalize concepts as discrete latent causal variables that are related via a hierarchical causal model.
We substantiate our theoretical claims with synthetic data experiments.
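For intuition, the toy snippet below samples from a hierarchical model of this general shape: a discrete root concept governs discrete child concepts, which in turn emit observed features. The probabilities and dimensions are invented, and the sketch does not touch the paper's identifiability analysis.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy hierarchy: a root concept controls the distributions of two child
# concepts, which in turn emit continuous features. This only illustrates the
# generative structure described in the abstract, not the paper's theory.
P_ROOT = np.array([0.5, 0.5])                       # P(root)
P_CHILD = np.array([[[0.8, 0.2], [0.3, 0.7]],       # P(child_0 | root)
                    [[0.6, 0.4], [0.1, 0.9]]])      # P(child_1 | root)
EMIT_MEAN = np.array([[0.0, 3.0], [-2.0, 2.0]])     # mean of feature_c given child_c

def sample(n):
    root = rng.choice(2, size=n, p=P_ROOT)
    data = np.zeros((n, 2))
    for c in range(2):
        child = np.array([rng.choice(2, p=P_CHILD[c, r]) for r in root])
        data[:, c] = EMIT_MEAN[c, child] + rng.normal(scale=0.5, size=n)
    return root, data

root, x = sample(5)
print("root concepts:", root)
print("observed features:\n", np.round(x, 2))
```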
arXiv Detail & Related papers (2024-06-01T18:01:03Z)
- Explaining Text Similarity in Transformer Models [52.571158418102584]
Recent advances in explainable AI have made it possible to mitigate limitations by leveraging improved explanations for Transformers.
We use BiLRP, an extension developed for computing second-order explanations in bilinear similarity models, to investigate which feature interactions drive similarity in NLP models.
Our findings contribute to a deeper understanding of different semantic similarity tasks and models, highlighting how novel explainable AI methods enable in-depth analyses and corpus-level insights.
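As a simplified stand-in for this kind of second-order explanation, the sketch below decomposes a dot-product similarity between mean-pooled, made-up linear embeddings into token-pair contributions; BiLRP extends this style of pairwise decomposition to deep Siamese models via relevance propagation.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy "embedding table" for a tiny vocabulary (invented for illustration).
vocab = ["cheap", "affordable", "flight", "train", "ticket"]
E = rng.normal(size=(len(vocab), 8))

def embed(tokens):
    return np.array([E[vocab.index(t)] for t in tokens])

def pair_contributions(tokens_a, tokens_b):
    """Decompose a dot-product sentence similarity into token-pair terms.

    With mean-pooled linear embeddings, sim(a, b) = sum_ij R_ij exactly,
    where R_ij couples token i of sentence a with token j of sentence b.
    """
    A, B = embed(tokens_a), embed(tokens_b)
    R = (A @ B.T) / (len(tokens_a) * len(tokens_b))   # token-pair relevance matrix
    return R

a = ["cheap", "flight", "ticket"]
b = ["affordable", "train", "ticket"]
R = pair_contributions(a, b)
sim = embed(a).mean(axis=0) @ embed(b).mean(axis=0)
print("similarity:", round(float(sim), 3), "sum of pair terms:", round(float(R.sum()), 3))
i, j = np.unravel_index(np.abs(R).argmax(), R.shape)
print("strongest interaction:", a[i], "<->", b[j])
```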
arXiv Detail & Related papers (2024-05-10T17:11:31Z)
- Exploring the Trade-off Between Model Performance and Explanation Plausibility of Text Classifiers Using Human Rationales [3.242050660144211]
Saliency post-hoc explainability methods are important tools for understanding increasingly complex NLP models.
We present a methodology for incorporating rationales, which are text annotations explaining human decisions, into text classification models.
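One common way to incorporate rationales (not necessarily this paper's exact method) is to add a penalty that discourages the model from relying on tokens humans did not mark. The toy logistic-regression sketch below shows how such a penalty trades off against the usual classification loss; the data, vocabulary, and rationale mask are invented.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy setup: documents as bag-of-words count vectors, binary labels, and a
# 0/1 "rationale" mask marking which vocabulary words humans pointed to.
V = 6
X = rng.integers(0, 3, size=(32, V)).astype(float)
w_true = np.array([2.0, -1.5, 0.0, 0.0, 0.0, 0.0])
y = (X @ w_true + rng.normal(scale=0.1, size=32) > 0).astype(float)
rationale = np.array([1, 1, 0, 0, 0, 0], dtype=float)  # humans marked words 0 and 1

def train(lam, lr=0.1, steps=500):
    w = np.zeros(V)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        grad_ce = X.T @ (p - y) / len(y)               # cross-entropy gradient
        grad_plaus = 2 * (1 - rationale) * w           # push non-rationale weights to 0
        w -= lr * (grad_ce + lam * grad_plaus)
    acc = ((1.0 / (1.0 + np.exp(-(X @ w))) > 0.5) == y).mean()
    return w, acc

for lam in (0.0, 0.1, 1.0):
    w, acc = train(lam)
    print(f"lambda={lam}: acc={acc:.2f}, "
          f"off-rationale weight norm={np.linalg.norm(w * (1 - rationale)):.3f}")
```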
arXiv Detail & Related papers (2024-04-03T22:39:33Z)
- Diffexplainer: Towards Cross-modal Global Explanations with Diffusion Models [51.21351775178525]
DiffExplainer is a novel framework that, leveraging language-vision models, enables multimodal global explainability.
It employs diffusion models conditioned on optimized text prompts, synthesizing images that maximize class outputs.
The analysis of generated visual descriptions allows for automatic identification of biases and spurious features.
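The sketch below mimics only the outer optimization loop implied by this description: search for a prompt embedding whose generations maximize a class score. The `generate` and `classifier_score` functions are toy stand-ins, not real diffusion or vision-language APIs.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy stand-ins: 'generate' maps a prompt embedding to an "image" feature
# vector, 'classifier_score' returns the target-class logit. The point is only
# the outer loop: find a prompt whose generations maximise the class output.
W_GEN = rng.normal(size=(16, 32))
W_CLS = rng.normal(size=32)

def generate(prompt_emb):
    return np.tanh(prompt_emb @ W_GEN) + 0.05 * rng.normal(size=32)

def classifier_score(img):
    return float(img @ W_CLS)

def optimise_prompt(steps=200, pop=16, sigma=0.3):
    prompt = np.zeros(16)
    for _ in range(steps):
        noise = rng.normal(scale=sigma, size=(pop, 16))
        scores = np.array([classifier_score(generate(prompt + n)) for n in noise])
        # Simple evolution-strategies update toward higher-scoring prompts.
        prompt += 0.05 * (noise.T @ (scores - scores.mean())) / (pop * sigma)
    return prompt

prompt = optimise_prompt()
print("class score of optimised prompt:", round(classifier_score(generate(prompt)), 2))
```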
arXiv Detail & Related papers (2024-04-03T10:11:22Z)
- Explaining Explainability: Towards Deeper Actionable Insights into Deep Learning through Second-order Explainability [70.60433013657693]
Second-order explainable AI (SOXAI) was recently proposed to extend explainable AI (XAI) from the instance level to the dataset level.
We demonstrate for the first time, via example classification and segmentation cases, that eliminating irrelevant concepts from the training set based on actionable insights from SOXAI can enhance a model's performance.
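A toy version of that workflow, with raw features standing in for concepts: aggregate attributions over the whole training set, flag a spurious "watermark" feature, scrub it, and retrain. This only illustrates the dataset-level loop, not SOXAI itself.

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy dataset: feature 0 is the true signal, feature 3 is a spurious "watermark"
# correlated with the label in training but not at test time.
def make_data(n, spurious_corr):
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, 4))
    X[:, 0] += 2.0 * y                                   # genuine signal
    mask = rng.random(n) < spurious_corr
    X[mask, 3] = 3.0 * y[mask]                           # watermark, train-only
    return X, y

def train_logreg(X, y, lr=0.1, steps=800):
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(X @ w)))
        w -= lr * X.T @ (p - y) / len(y)
    return w

Xtr, ytr = make_data(400, spurious_corr=0.9)
Xte, yte = make_data(400, spurious_corr=0.0)
w = train_logreg(Xtr, ytr)

# Dataset-level insight: mean |input x weight| attribution per feature.
attr = np.abs(Xtr * w).mean(axis=0)
print("dataset-level attributions:", np.round(attr, 2))

# Suppose inspection reveals feature 3 is a watermark: remove it and retrain.
keep = [0, 1, 2]
w2 = train_logreg(Xtr[:, keep], ytr)
acc = lambda w, X, y: ((1 / (1 + np.exp(-(X @ w))) > 0.5) == y).mean()
print("test acc before/after curation:",
      round(acc(w, Xte, yte), 2), "/", round(acc(w2, Xte[:, keep], yte), 2))
```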
arXiv Detail & Related papers (2023-06-14T23:24:01Z)
- A simple probabilistic neural network for machine understanding [0.0]
We discuss probabilistic neural networks with a fixed internal representation as models for machine understanding.
We derive the internal representation by requiring that it satisfies the principles of maximal relevance and of maximal ignorance about how different features are combined.
We argue that learning machines with this architecture enjoy a number of interesting properties, like the continuity of the representation with respect to changes in parameters and data.
arXiv Detail & Related papers (2022-10-24T13:00:15Z)
- ELUDE: Generating interpretable explanations via a decomposition into labelled and unlabelled features [23.384134043048807]
We develop an explanation framework that decomposes a model's prediction into a part captured by labelled, semantically meaningful attributes and a remainder captured by unlabelled features.
By identifying the latter, we are able to analyze the "unexplained" portion of the model.
We show that the set of unlabelled features can generalize to multiple models trained with the same feature space.
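A minimal numerical sketch of the decomposition idea, on invented data: regress the model's logit on labelled attributes and treat the residual as the unexplained portion. ELUDE goes further by representing that residual with unlabelled latent features.

```python
import numpy as np

rng = np.random.default_rng(6)

# Toy stand-in for the decomposition: explain a model's logit as a linear
# function of human-labelled attributes plus a residual "unexplained" part.
n = 500
attributes = rng.normal(size=(n, 5))            # e.g. "has stripes", "is furry", ...
latent = rng.normal(size=n)                     # information no attribute captures
logits = attributes @ np.array([1.5, -0.8, 0.0, 0.3, 0.0]) + 1.2 * latent

coef, *_ = np.linalg.lstsq(attributes, logits, rcond=None)
explained = attributes @ coef
residual = logits - explained

var = lambda v: float(np.var(v))
print("fraction of logit variance explained by labelled attributes:",
      round(var(explained) / var(logits), 2))
print("unexplained (residual) fraction:", round(var(residual) / var(logits), 2))
```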
arXiv Detail & Related papers (2022-06-15T17:36:55Z)
- Model-agnostic multi-objective approach for the evolutionary discovery of mathematical models [55.41644538483948]
In modern data science, it is often more interesting to understand the properties of a model and which of its parts could be replaced to obtain better results.
We use multi-objective evolutionary optimization for composite data-driven model learning to obtain the algorithm's desired properties.
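The toy loop below conveys the general recipe rather than this paper's algorithm: evolve candidate models under two objectives, here fit error and number of terms, and keep the Pareto-nondominated set so the accuracy/complexity trade-off can be inspected. The target function and the monomial search space are invented.

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy target: y = 3x^2 - x with noise. Candidate models are subsets of the
# monomials {1, x, x^2, x^3, x^4}; objectives are (fit error, number of terms).
x = np.linspace(-2, 2, 60)
y = 3 * x**2 - x + rng.normal(scale=0.3, size=x.size)
basis = np.vstack([x**k for k in range(5)]).T        # columns: 1, x, ..., x^4

def evaluate(mask):
    cols = basis[:, mask]
    coef, *_ = np.linalg.lstsq(cols, y, rcond=None)
    err = float(np.mean((cols @ coef - y) ** 2))
    return err, int(mask.sum())

def dominated(a, b):                                  # does b dominate a?
    return all(bi <= ai for ai, bi in zip(a, b)) and any(bi < ai for ai, bi in zip(a, b))

population = [rng.random(5) < 0.5 for _ in range(12)]
population = [m for m in population if m.any()] or [np.array([True] + [False] * 4)]
for _ in range(40):
    parent = population[rng.integers(len(population))]
    child = parent.copy()
    child[rng.integers(5)] ^= True                    # flip one term in or out
    if child.any():
        population.append(child)
    scores = [evaluate(m) for m in population]
    population = [m for m, s in zip(population, scores)
                  if not any(dominated(s, t) for t in scores if t != s)]

for m in population:                                  # Pareto front of candidate models
    err, size = evaluate(m)
    print("terms:", [f"x^{k}" for k in np.where(m)[0]], "n_terms:", size, "mse:", round(err, 3))
```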
arXiv Detail & Related papers (2021-07-07T11:17:09Z)
- The Definitions of Interpretability and Learning of Interpretable Models [42.22982369082474]
We propose a mathematical definition for the human-interpretable model.
If a prediction model is interpretable by a human recognition system, it is defined as a completely human-interpretable model.
arXiv Detail & Related papers (2021-05-29T01:44:12Z)
- Model Learning with Personalized Interpretability Estimation (ML-PIE) [2.862606936691229]
High-stakes applications require AI-generated models to be interpretable.
Current algorithms for synthesizing potentially interpretable models rely on objectives or regularization terms that are fixed in advance rather than tailored to the individual user.
We propose an approach for the synthesis of models that are tailored to the user.
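One way such tailoring could work, sketched with invented features and a simulated user: learn a per-user interpretability score from pairwise "which is easier to read?" feedback and use it to rank candidate models. The feature set and update rule are illustrative assumptions, not ML-PIE's actual estimator.

```python
import numpy as np

rng = np.random.default_rng(8)

# Candidate models are described by simple features (#terms, depth, #nonlinear
# ops); a logistic model of pairwise user feedback becomes a personalised
# interpretability score that could sit alongside error during model search.
def features(model_desc):
    return np.array([model_desc["terms"], model_desc["depth"], model_desc["nonlin"]], float)

candidates = [{"terms": t, "depth": d, "nonlin": nl}
              for t in (2, 5, 9) for d in (1, 3) for nl in (0, 2)]

# Simulated user: dislikes depth most, then nonlinear ops, then length.
user_pref = lambda m: -(0.3 * m["terms"] + 2.0 * m["depth"] + 1.0 * m["nonlin"])

w = np.zeros(3)
for _ in range(300):                                  # pairwise feedback rounds
    a, b = rng.choice(len(candidates), size=2, replace=False)
    label = 1.0 if user_pref(candidates[a]) > user_pref(candidates[b]) else 0.0
    diff = features(candidates[a]) - features(candidates[b])
    p = 1 / (1 + np.exp(-(w @ diff)))                 # Bradley-Terry style update
    w += 0.05 * (label - p) * diff

scores = [float(w @ features(m)) for m in candidates]
print("learned weights:", np.round(w, 2))
print("candidate judged most interpretable for this user:", candidates[int(np.argmax(scores))])
```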
arXiv Detail & Related papers (2021-04-13T09:47:48Z)
- Generative Counterfactuals for Neural Networks via Attribute-Informed Perturbation [51.29486247405601]
We design a framework to generate counterfactuals for raw data instances with the proposed Attribute-Informed Perturbation (AIP).
By utilizing generative models conditioned with different attributes, counterfactuals with desired labels can be obtained effectively and efficiently.
Experimental results on real-world texts and images demonstrate the effectiveness, sample quality as well as efficiency of our designed framework.
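A toy rendering of that recipe with linear stand-ins for the generator and classifier (all names, shapes, and weights are invented): hold the instance's latent code fixed and nudge the attribute vector until the prediction flips.

```python
import numpy as np

rng = np.random.default_rng(9)

# Toy stand-ins for an attribute-conditioned generator and a classifier (both
# linear, so the gradient is analytic). The loop mirrors the high-level recipe:
# keep the latent code fixed and adjust the interpretable attributes until the
# classifier assigns the opposite label, then decode the counterfactual.
W_Z = rng.normal(size=(6, 10))      # generator weights: latent part
W_A = rng.normal(size=(3, 10))      # generator weights: attribute part
w_f = rng.normal(size=10)           # classifier weights

generator = lambda z, a: z @ W_Z + a @ W_A
classifier = lambda x: float(1 / (1 + np.exp(-(x @ w_f))))

z = rng.normal(size=6)              # latent code of the original instance
a = np.zeros(3)                     # original attribute values
target = 0.0 if classifier(generator(z, a)) > 0.5 else 1.0   # flip the label

for step in range(200):
    p = classifier(generator(z, a))
    if (p > 0.5) == bool(target):
        break
    # Gradient of the target-class log-likelihood w.r.t. the attributes.
    a += 0.1 * (target - p) * (W_A @ w_f)

print("steps:", step, "attribute shift:", np.round(a, 2),
      "counterfactual score:", round(classifier(generator(z, a)), 2))
```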
arXiv Detail & Related papers (2021-01-18T08:37:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.