Human-interpretable model explainability on high-dimensional data
- URL: http://arxiv.org/abs/2010.07384v2
- Date: Mon, 20 Dec 2021 17:53:43 GMT
- Title: Human-interpretable model explainability on high-dimensional data
- Authors: Damien de Mijolla, Christopher Frye, Markus Kunesch, John Mansir, Ilya Feige
- Abstract summary: We introduce a framework for human-interpretable explainability on high-dimensional data, consisting of two modules.
First, we apply a semantically meaningful latent representation, both to reduce the raw dimensionality of the data, and to ensure its human interpretability.
Second, we adapt the Shapley paradigm for model-agnostic explainability to operate on these latent features. This leads to interpretable model explanations that are both theoretically controlled and computationally tractable.
- Score: 8.574682463936007
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The importance of explainability in machine learning continues to grow, as
both neural-network architectures and the data they model become increasingly
complex. Unique challenges arise when a model's input features become high
dimensional: on one hand, principled model-agnostic approaches to
explainability become too computationally expensive; on the other, more
efficient explainability algorithms lack natural interpretations for general
users. In this work, we introduce a framework for human-interpretable
explainability on high-dimensional data, consisting of two modules. First, we
apply a semantically meaningful latent representation, both to reduce the raw
dimensionality of the data, and to ensure its human interpretability. These
latent features can be learnt, e.g. explicitly as disentangled representations
or implicitly through image-to-image translation, or they can be based on any
computable quantities the user chooses. Second, we adapt the Shapley paradigm
for model-agnostic explainability to operate on these latent features. This
leads to interpretable model explanations that are both theoretically
controlled and computationally tractable. We benchmark our approach on
synthetic data and demonstrate its effectiveness on several
image-classification tasks.
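As a rough illustration of the second module, the sketch below estimates Shapley values over latent coordinates by sampling feature orderings. `encode`, `decode`, and `model` are toy stand-ins rather than the paper's learnt representation and classifier, and replacing off-coalition latents with a baseline vector is just one possible choice of intervention.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: a 4-dim "latent" encoder/decoder and a scalar-output model.
# In the paper's setting these would be a learnt semantic representation and a
# trained classifier; here they are toy functions so the sketch runs end to end.
def encode(x):            # data -> latent features
    return x.copy()

def decode(z):            # latent features -> data
    return z.copy()

def model(x):             # black-box model output (e.g. a class probability)
    return float(1.0 / (1.0 + np.exp(-(2.0 * x[0] - x[1] + 0.5 * x[3]))))

def latent_shapley(x, baseline, n_perm=2000):
    """Monte Carlo Shapley values over latent coordinates.

    Coordinates outside the current coalition are replaced by the baseline
    latent vector before decoding -- one simple choice of 'off' behaviour.
    """
    z, z0 = encode(x), encode(baseline)
    d = len(z)
    phi = np.zeros(d)
    for _ in range(n_perm):
        order = rng.permutation(d)
        z_cur = z0.copy()
        prev = model(decode(z_cur))
        for j in order:
            z_cur[j] = z[j]                  # add latent feature j to the coalition
            cur = model(decode(z_cur))
            phi[j] += cur - prev             # its marginal contribution
            prev = cur
    return phi / n_perm

x = np.array([1.0, -0.5, 0.3, 2.0])
baseline = np.zeros(4)
phi = latent_shapley(x, baseline)
print("latent attributions:", np.round(phi, 3))
print("efficiency check:", np.round(phi.sum(), 3),
      "vs", np.round(model(decode(encode(x))) - model(decode(encode(baseline))), 3))
```

Each permutation's marginal contributions telescope, so the attributions sum exactly to the change in model output between the baseline and the instance (the Shapley efficiency property).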
Related papers
- Learning Discrete Concepts in Latent Hierarchical Models [73.01229236386148]
Learning concepts from natural high-dimensional data holds potential in building human-aligned and interpretable machine learning models.
We formalize concepts as discrete latent causal variables that are related via a hierarchical causal model.
We substantiate our theoretical claims with synthetic data experiments.
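For intuition, the toy snippet below samples from a hierarchical model of this general shape: a discrete root concept governs discrete child concepts, which in turn emit observed features. The probabilities and dimensions are invented, and the sketch does not touch the paper's identifiability analysis.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy hierarchy: a root concept controls the distributions of two child
# concepts, which in turn emit continuous features. This only illustrates the
# generative structure described in the abstract, not the paper's theory.
P_ROOT = np.array([0.5, 0.5])                       # P(root)
P_CHILD = np.array([[[0.8, 0.2], [0.3, 0.7]],       # P(child_0 | root)
                    [[0.6, 0.4], [0.1, 0.9]]])      # P(child_1 | root)
EMIT_MEAN = np.array([[0.0, 3.0], [-2.0, 2.0]])     # mean of feature_c given child_c

def sample(n):
    root = rng.choice(2, size=n, p=P_ROOT)
    data = np.zeros((n, 2))
    for c in range(2):
        child = np.array([rng.choice(2, p=P_CHILD[c, r]) for r in root])
        data[:, c] = EMIT_MEAN[c, child] + rng.normal(scale=0.5, size=n)
    return root, data

root, x = sample(5)
print("root concepts:", root)
print("observed features:\n", np.round(x, 2))
```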
arXiv Detail & Related papers (2024-06-01T18:01:03Z)
- Explaining Text Similarity in Transformer Models [52.571158418102584]
Recent advances in explainable AI have made it possible to mitigate limitations by leveraging improved explanations for Transformers.
We use BiLRP, an extension developed for computing second-order explanations in bilinear similarity models, to investigate which feature interactions drive similarity in NLP models.
Our findings contribute to a deeper understanding of different semantic similarity tasks and models, highlighting how novel explainable AI methods enable in-depth analyses and corpus-level insights.
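As a simplified stand-in for this kind of second-order explanation, the sketch below decomposes a dot-product similarity between mean-pooled, made-up linear embeddings into token-pair contributions; BiLRP extends this style of pairwise decomposition to deep Siamese models via relevance propagation.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy "embedding table" for a tiny vocabulary (invented for illustration).
vocab = ["cheap", "affordable", "flight", "train", "ticket"]
E = rng.normal(size=(len(vocab), 8))

def embed(tokens):
    return np.array([E[vocab.index(t)] for t in tokens])

def pair_contributions(tokens_a, tokens_b):
    """Decompose a dot-product sentence similarity into token-pair terms.

    With mean-pooled linear embeddings, sim(a, b) = sum_ij R_ij exactly,
    where R_ij couples token i of sentence a with token j of sentence b.
    """
    A, B = embed(tokens_a), embed(tokens_b)
    R = (A @ B.T) / (len(tokens_a) * len(tokens_b))   # token-pair relevance matrix
    return R

a = ["cheap", "flight", "ticket"]
b = ["affordable", "train", "ticket"]
R = pair_contributions(a, b)
sim = embed(a).mean(axis=0) @ embed(b).mean(axis=0)
print("similarity:", round(float(sim), 3), "sum of pair terms:", round(float(R.sum()), 3))
i, j = np.unravel_index(np.abs(R).argmax(), R.shape)
print("strongest interaction:", a[i], "<->", b[j])
```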
arXiv Detail & Related papers (2024-05-10T17:11:31Z)
- Exploring the Trade-off Between Model Performance and Explanation Plausibility of Text Classifiers Using Human Rationales [3.242050660144211]
Saliency post-hoc explainability methods are important tools for understanding increasingly complex NLP models.
We present a methodology for incorporating rationales, which are text annotations explaining human decisions, into text classification models.
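One common way to incorporate rationales (not necessarily this paper's exact method) is to add a penalty that discourages the model from relying on tokens humans did not mark. The toy logistic-regression sketch below shows how such a penalty trades off against the usual classification loss; the data, vocabulary, and rationale mask are invented.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy setup: documents as bag-of-words count vectors, binary labels, and a
# 0/1 "rationale" mask marking which vocabulary words humans pointed to.
V = 6
X = rng.integers(0, 3, size=(32, V)).astype(float)
w_true = np.array([2.0, -1.5, 0.0, 0.0, 0.0, 0.0])
y = (X @ w_true + rng.normal(scale=0.1, size=32) > 0).astype(float)
rationale = np.array([1, 1, 0, 0, 0, 0], dtype=float)  # humans marked words 0 and 1

def train(lam, lr=0.1, steps=500):
    w = np.zeros(V)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        grad_ce = X.T @ (p - y) / len(y)               # cross-entropy gradient
        grad_plaus = 2 * (1 - rationale) * w           # push non-rationale weights to 0
        w -= lr * (grad_ce + lam * grad_plaus)
    acc = ((1.0 / (1.0 + np.exp(-(X @ w))) > 0.5) == y).mean()
    return w, acc

for lam in (0.0, 0.1, 1.0):
    w, acc = train(lam)
    print(f"lambda={lam}: acc={acc:.2f}, "
          f"off-rationale weight norm={np.linalg.norm(w * (1 - rationale)):.3f}")
```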
arXiv Detail & Related papers (2024-04-03T22:39:33Z)
- Diffexplainer: Towards Cross-modal Global Explanations with Diffusion Models [51.21351775178525]
DiffExplainer is a novel framework that, leveraging language-vision models, enables multimodal global explainability.
It employs diffusion models conditioned on optimized text prompts, synthesizing images that maximize class outputs.
The analysis of generated visual descriptions allows for automatic identification of biases and spurious features.
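The sketch below mimics only the outer optimization loop implied by this description: search for a prompt embedding whose generations maximize a class score. The `generate` and `classifier_score` functions are toy stand-ins, not real diffusion or vision-language APIs.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy stand-ins: 'generate' maps a prompt embedding to an "image" feature
# vector, 'classifier_score' returns the target-class logit. The point is only
# the outer loop: find a prompt whose generations maximise the class output.
W_GEN = rng.normal(size=(16, 32))
W_CLS = rng.normal(size=32)

def generate(prompt_emb):
    return np.tanh(prompt_emb @ W_GEN) + 0.05 * rng.normal(size=32)

def classifier_score(img):
    return float(img @ W_CLS)

def optimise_prompt(steps=200, pop=16, sigma=0.3):
    prompt = np.zeros(16)
    for _ in range(steps):
        noise = rng.normal(scale=sigma, size=(pop, 16))
        scores = np.array([classifier_score(generate(prompt + n)) for n in noise])
        # Simple evolution-strategies update toward higher-scoring prompts.
        prompt += 0.05 * (noise.T @ (scores - scores.mean())) / (pop * sigma)
    return prompt

prompt = optimise_prompt()
print("class score of optimised prompt:", round(classifier_score(generate(prompt)), 2))
```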
arXiv Detail & Related papers (2024-04-03T10:11:22Z)
- Explaining Explainability: Towards Deeper Actionable Insights into Deep Learning through Second-order Explainability [70.60433013657693]
Second-order explainable AI (SOXAI) was recently proposed to extend explainable AI (XAI) from the instance level to the dataset level.
We demonstrate for the first time, via example classification and segmentation cases, that eliminating irrelevant concepts from the training set based on actionable insights from SOXAI can enhance a model's performance.
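A toy version of that workflow, with raw features standing in for concepts: aggregate attributions over the whole training set, flag a spurious "watermark" feature, scrub it, and retrain. This only illustrates the dataset-level loop, not SOXAI itself.

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy dataset: feature 0 is the true signal, feature 3 is a spurious "watermark"
# correlated with the label in training but not at test time.
def make_data(n, spurious_corr):
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, 4))
    X[:, 0] += 2.0 * y                                   # genuine signal
    mask = rng.random(n) < spurious_corr
    X[mask, 3] = 3.0 * y[mask]                           # watermark, train-only
    return X, y

def train_logreg(X, y, lr=0.1, steps=800):
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(X @ w)))
        w -= lr * X.T @ (p - y) / len(y)
    return w

Xtr, ytr = make_data(400, spurious_corr=0.9)
Xte, yte = make_data(400, spurious_corr=0.0)
w = train_logreg(Xtr, ytr)

# Dataset-level insight: mean |input x weight| attribution per feature.
attr = np.abs(Xtr * w).mean(axis=0)
print("dataset-level attributions:", np.round(attr, 2))

# Suppose inspection reveals feature 3 is a watermark: remove it and retrain.
keep = [0, 1, 2]
w2 = train_logreg(Xtr[:, keep], ytr)
acc = lambda w, X, y: ((1 / (1 + np.exp(-(X @ w))) > 0.5) == y).mean()
print("test acc before/after curation:",
      round(acc(w, Xte, yte), 2), "/", round(acc(w2, Xte[:, keep], yte), 2))
```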
arXiv Detail & Related papers (2023-06-14T23:24:01Z)
- A simple probabilistic neural network for machine understanding [0.0]
We discuss probabilistic neural networks with a fixed internal representation as models for machine understanding.
We derive the internal representation by requiring that it satisfies the principles of maximal relevance and of maximal ignorance about how different features are combined.
We argue that learning machines with this architecture enjoy a number of interesting properties, like the continuity of the representation with respect to changes in parameters and data.
arXiv Detail & Related papers (2022-10-24T13:00:15Z)
- ELUDE: Generating interpretable explanations via a decomposition into labelled and unlabelled features [23.384134043048807]
We develop an explanation framework that decomposes a model's prediction into a part captured by labelled, semantically meaningful attributes and a remainder captured by unlabelled features.
By identifying the latter, we are able to analyze the "unexplained" portion of the model.
We show that the set of unlabelled features can generalize to multiple models trained with the same feature space.
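A minimal numerical sketch of the decomposition idea, on invented data: regress the model's logit on labelled attributes and treat the residual as the unexplained portion. ELUDE goes further by representing that residual with unlabelled latent features.

```python
import numpy as np

rng = np.random.default_rng(6)

# Toy stand-in for the decomposition: explain a model's logit as a linear
# function of human-labelled attributes plus a residual "unexplained" part.
n = 500
attributes = rng.normal(size=(n, 5))            # e.g. "has stripes", "is furry", ...
latent = rng.normal(size=n)                     # information no attribute captures
logits = attributes @ np.array([1.5, -0.8, 0.0, 0.3, 0.0]) + 1.2 * latent

coef, *_ = np.linalg.lstsq(attributes, logits, rcond=None)
explained = attributes @ coef
residual = logits - explained

var = lambda v: float(np.var(v))
print("fraction of logit variance explained by labelled attributes:",
      round(var(explained) / var(logits), 2))
print("unexplained (residual) fraction:", round(var(residual) / var(logits), 2))
```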
arXiv Detail & Related papers (2022-06-15T17:36:55Z)
- Model-agnostic multi-objective approach for the evolutionary discovery of mathematical models [55.41644538483948]
In modern data science, it is often more interesting to understand the properties of a model and which of its parts could be replaced to obtain better results.
We use multi-objective evolutionary optimization for composite data-driven model learning to obtain the algorithm's desired properties.
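The toy loop below conveys the general recipe rather than this paper's algorithm: evolve candidate models under two objectives, here fit error and number of terms, and keep the Pareto-nondominated set so the accuracy/complexity trade-off can be inspected. The target function and the monomial search space are invented.

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy target: y = 3x^2 - x with noise. Candidate models are subsets of the
# monomials {1, x, x^2, x^3, x^4}; objectives are (fit error, number of terms).
x = np.linspace(-2, 2, 60)
y = 3 * x**2 - x + rng.normal(scale=0.3, size=x.size)
basis = np.vstack([x**k for k in range(5)]).T        # columns: 1, x, ..., x^4

def evaluate(mask):
    cols = basis[:, mask]
    coef, *_ = np.linalg.lstsq(cols, y, rcond=None)
    err = float(np.mean((cols @ coef - y) ** 2))
    return err, int(mask.sum())

def dominated(a, b):                                  # does b dominate a?
    return all(bi <= ai for ai, bi in zip(a, b)) and any(bi < ai for ai, bi in zip(a, b))

population = [rng.random(5) < 0.5 for _ in range(12)]
population = [m for m in population if m.any()] or [np.array([True] + [False] * 4)]
for _ in range(40):
    parent = population[rng.integers(len(population))]
    child = parent.copy()
    child[rng.integers(5)] ^= True                    # flip one term in or out
    if child.any():
        population.append(child)
    scores = [evaluate(m) for m in population]
    population = [m for m, s in zip(population, scores)
                  if not any(dominated(s, t) for t in scores if t != s)]

for m in population:                                  # Pareto front of candidate models
    err, size = evaluate(m)
    print("terms:", [f"x^{k}" for k in np.where(m)[0]], "n_terms:", size, "mse:", round(err, 3))
```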
arXiv Detail & Related papers (2021-07-07T11:17:09Z)
- The Definitions of Interpretability and Learning of Interpretable Models [42.22982369082474]
We propose a mathematical definition for the human-interpretable model.
If a prediction model is interpretable by a human recognition system, it is defined as a completely human-interpretable model.
arXiv Detail & Related papers (2021-05-29T01:44:12Z)
- Model Learning with Personalized Interpretability Estimation (ML-PIE) [2.862606936691229]
High-stakes applications require AI-generated models to be interpretable.
Current algorithms for synthesizing potentially interpretable models rely on objectives or regularization terms that are fixed in advance rather than tailored to the individual user.
We propose an approach for the synthesis of models that are tailored to the user.
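One way such tailoring could work, sketched with invented features and a simulated user: learn a per-user interpretability score from pairwise "which is easier to read?" feedback and use it to rank candidate models. The feature set and update rule are illustrative assumptions, not ML-PIE's actual estimator.

```python
import numpy as np

rng = np.random.default_rng(8)

# Candidate models are described by simple features (#terms, depth, #nonlinear
# ops); a logistic model of pairwise user feedback becomes a personalised
# interpretability score that could sit alongside error during model search.
def features(model_desc):
    return np.array([model_desc["terms"], model_desc["depth"], model_desc["nonlin"]], float)

candidates = [{"terms": t, "depth": d, "nonlin": nl}
              for t in (2, 5, 9) for d in (1, 3) for nl in (0, 2)]

# Simulated user: dislikes depth most, then nonlinear ops, then length.
user_pref = lambda m: -(0.3 * m["terms"] + 2.0 * m["depth"] + 1.0 * m["nonlin"])

w = np.zeros(3)
for _ in range(300):                                  # pairwise feedback rounds
    a, b = rng.choice(len(candidates), size=2, replace=False)
    label = 1.0 if user_pref(candidates[a]) > user_pref(candidates[b]) else 0.0
    diff = features(candidates[a]) - features(candidates[b])
    p = 1 / (1 + np.exp(-(w @ diff)))                 # Bradley-Terry style update
    w += 0.05 * (label - p) * diff

scores = [float(w @ features(m)) for m in candidates]
print("learned weights:", np.round(w, 2))
print("candidate judged most interpretable for this user:", candidates[int(np.argmax(scores))])
```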
arXiv Detail & Related papers (2021-04-13T09:47:48Z)
- Generative Counterfactuals for Neural Networks via Attribute-Informed Perturbation [51.29486247405601]
We design a framework to generate counterfactuals for raw data instances with the proposed Attribute-Informed Perturbation (AIP).
By utilizing generative models conditioned with different attributes, counterfactuals with desired labels can be obtained effectively and efficiently.
Experimental results on real-world texts and images demonstrate the effectiveness, sample quality as well as efficiency of our designed framework.
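A toy rendering of that recipe with linear stand-ins for the generator and classifier (all names, shapes, and weights are invented): hold the instance's latent code fixed and nudge the attribute vector until the prediction flips.

```python
import numpy as np

rng = np.random.default_rng(9)

# Toy stand-ins for an attribute-conditioned generator and a classifier (both
# linear, so the gradient is analytic). The loop mirrors the high-level recipe:
# keep the latent code fixed and adjust the interpretable attributes until the
# classifier assigns the opposite label, then decode the counterfactual.
W_Z = rng.normal(size=(6, 10))      # generator weights: latent part
W_A = rng.normal(size=(3, 10))      # generator weights: attribute part
w_f = rng.normal(size=10)           # classifier weights

generator = lambda z, a: z @ W_Z + a @ W_A
classifier = lambda x: float(1 / (1 + np.exp(-(x @ w_f))))

z = rng.normal(size=6)              # latent code of the original instance
a = np.zeros(3)                     # original attribute values
target = 0.0 if classifier(generator(z, a)) > 0.5 else 1.0   # flip the label

for step in range(200):
    p = classifier(generator(z, a))
    if (p > 0.5) == bool(target):
        break
    # Gradient of the target-class log-likelihood w.r.t. the attributes.
    a += 0.1 * (target - p) * (W_A @ w_f)

print("steps:", step, "attribute shift:", np.round(a, 2),
      "counterfactual score:", round(classifier(generator(z, a)), 2))
```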
arXiv Detail & Related papers (2021-01-18T08:37:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.