DISSECT: Disentangled Simultaneous Explanations via Concept Traversals
- URL: http://arxiv.org/abs/2105.15164v1
- Date: Mon, 31 May 2021 17:11:56 GMT
- Title: DISSECT: Disentangled Simultaneous Explanations via Concept Traversals
- Authors: Asma Ghandeharioun, Been Kim, Chun-Liang Li, Brendan Jou, Brian Eoff,
Rosalind W. Picard
- Abstract summary: DISSECT is a novel approach to explaining deep learning model inferences.
By training a generative model from a classifier's signal, DISSECT offers a way to discover a classifier's inherent "notion" of distinct concepts.
We show that DISSECT produces CTs that disentangle several concepts and are coupled to the classifier's reasoning through joint training.
- Score: 33.65478845353047
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Explaining deep learning model inferences is a promising avenue for scientific
understanding, improving safety, uncovering hidden biases, evaluating fairness,
and beyond, as argued by many scholars. One of the principal benefits of
counterfactual explanations is allowing users to explore "what-if" scenarios
through what does not and cannot exist in the data, a capability that many other
forms of explanation, such as heatmaps and influence functions, inherently
lack. However, most previous work on generative explainability
cannot disentangle important concepts effectively, produces unrealistic
examples, or fails to retain relevant information. We propose a novel approach,
DISSECT, that jointly trains a generator, a discriminator, and a concept
disentangler to overcome such challenges using little supervision. DISSECT
generates Concept Traversals (CTs), defined as a sequence of generated examples
with increasing degrees of concepts that influence a classifier's decision. By
training a generative model from a classifier's signal, DISSECT offers a way to
discover a classifier's inherent "notion" of distinct concepts automatically
rather than rely on user-predefined concepts. We show that DISSECT produces CTs
that (1) disentangle several concepts, (2) are influential to a classifier's
decision and are coupled to its reasoning due to joint training, (3) are
realistic, (4) preserve relevant information, and (5) are stable across similar
inputs. We validate DISSECT on several challenging synthetic and realistic
datasets where previous methods fall short of satisfying desirable criteria for
interpretability and show that it performs consistently well and better than
existing methods. Finally, we present experiments showing applications of
DISSECT for detecting potential biases of a classifier and identifying spurious
artifacts that impact predictions.
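To make the jointly trained setup more concrete, below is a minimal, illustrative sketch of how a generator, discriminator, and concept disentangler could be trained against a classifier's signal to produce Concept Traversals. This is not the authors' implementation: the toy MLPs, random placeholder data, loss weights, and the fixed discriminator and classifier are simplifying assumptions.
```python
# Hedged sketch of a DISSECT-style joint-training setup (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

DIM, N_CONCEPTS, STEPS = 16, 3, 5        # toy input size, number of concepts, CT length

class Generator(nn.Module):
    """Perturbs an input x toward degree `alpha` of concept `k`."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(DIM + N_CONCEPTS + 1, 64), nn.ReLU(),
                                 nn.Linear(64, DIM))

    def forward(self, x, k, alpha):
        onehot = F.one_hot(k, N_CONCEPTS).float()
        return x + self.net(torch.cat([x, onehot, alpha[:, None]], dim=-1))

generator = Generator()
discriminator = nn.Sequential(nn.Linear(DIM, 64), nn.ReLU(), nn.Linear(64, 1))              # realism signal
disentangler = nn.Sequential(nn.Linear(2 * DIM, 64), nn.ReLU(), nn.Linear(64, N_CONCEPTS))  # which concept moved?
classifier = nn.Sequential(nn.Linear(DIM, 1))   # stand-in for the pretrained classifier being explained (logit output)

opt = torch.optim.Adam(list(generator.parameters()) + list(disentangler.parameters()), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(100):                          # toy loop; discriminator updates omitted for brevity
    x = torch.randn(32, DIM)                     # placeholder batch standing in for real inputs
    k = torch.randint(0, N_CONCEPTS, (32,))      # concept to traverse
    alpha = torch.rand(32)                       # target degree of that concept

    x_gen = generator(x, k, alpha)
    realism = bce(discriminator(x_gen).squeeze(-1), torch.ones(32))           # look real to the discriminator
    influence = bce(classifier(x_gen).squeeze(-1), alpha)                     # classifier output should track alpha
    disent = F.cross_entropy(disentangler(torch.cat([x, x_gen], dim=-1)), k)  # recover which concept was traversed
    proximity = (x_gen - x).abs().mean()                                      # stay close to the original input

    loss = realism + influence + disent + 0.1 * proximity
    opt.zero_grad(); loss.backward(); opt.step()

# A Concept Traversal (CT) for concept 0: the same input rendered at increasing concept degrees.
x0 = torch.randn(1, DIM)
ct = [generator(x0, torch.tensor([0]), torch.tensor([i / (STEPS - 1)])) for i in range(STEPS)]
```
The final list `ct` corresponds to one Concept Traversal: the same input regenerated at increasing degrees of a single concept, which is the object DISSECT inspects to explain the classifier.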
Related papers
- Explaining Explainability: Understanding Concept Activation Vectors [35.37586279472797]
Recent interpretability methods propose using concept-based explanations to translate internal representations of deep learning models into a language that humans are familiar with: concepts.
This requires understanding which concepts are present in the representation space of a neural network.
In this work, we investigate three properties of Concept Activation Vectors (CAVs), which are learnt using a probe dataset of concept exemplars.
We introduce tools designed to detect the presence of these properties, provide insight into how they affect the derived explanations, and provide recommendations to minimise their impact.
arXiv Detail & Related papers (2024-04-04T17:46:20Z)
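For context on how CAVs are typically learnt from a probe dataset of concept exemplars, here is a minimal sketch; the placeholder activations and the choice of a logistic-regression probe are assumptions, not this paper's implementation.
```python
# Hedged sketch: computing a Concept Activation Vector (CAV) from a probe dataset.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
concept_acts = rng.normal(loc=1.0, size=(100, 64))   # placeholder activations for concept exemplars
random_acts = rng.normal(loc=0.0, size=(100, 64))    # placeholder activations for random counterexamples

X = np.vstack([concept_acts, random_acts])
y = np.array([1] * len(concept_acts) + [0] * len(random_acts))

probe = LogisticRegression(max_iter=1000).fit(X, y)    # linear probe separating concept vs. random
cav = probe.coef_[0] / np.linalg.norm(probe.coef_[0])  # the CAV: unit normal of the probe's decision boundary

# Concept sensitivity of a prediction can then be probed via the directional
# derivative of the classifier's logit along `cav` (as in TCAV-style analyses).
```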
- Advancing Counterfactual Inference through Nonlinear Quantile Regression [77.28323341329461]
We propose a framework for efficient and effective counterfactual inference implemented with neural networks.
The proposed approach enhances the capacity to generalize estimated counterfactual outcomes to unseen data.
Empirical results conducted on multiple datasets offer compelling support for our theoretical assertions.
arXiv Detail & Related papers (2023-06-09T08:30:51Z)
- Statistically Significant Concept-based Explanation of Image Classifiers via Model Knockoffs [22.576922942465142]
Concept-based explanations can produce false positives, mistakenly regarding unrelated concepts as important for the prediction task.
We propose a method that uses a deep learning model to learn image concepts and then uses knockoff samples to select the concepts that are important for prediction.
arXiv Detail & Related papers (2023-05-27T05:40:05Z)
- Dual Box Embeddings for the Description Logic EL++ [16.70961576041243]
Similar to Knowledge Graphs (KGs), ontologies are often incomplete, and maintaining and constructing them has proved challenging.
As with KGs, a promising approach is to learn ontology embeddings in a latent vector space, while additionally ensuring they adhere to the semantics of the underlying DL.
We propose a novel ontology embedding method named Box$^2$EL for the DL EL++, which represents both concepts and roles as boxes.
arXiv Detail & Related papers (2023-01-26T14:13:37Z)
- Probing Classifiers are Unreliable for Concept Removal and Detection [18.25734277357466]
Neural network models trained on text data have been found to encode undesirable linguistic or sensitive concepts in their representation.
Recent work has proposed post-hoc and adversarial methods to remove such unwanted concepts from a model's representation.
We show that these methods can be counter-productive, and in the worst case may end up destroying all task-relevant features.
arXiv Detail & Related papers (2022-07-08T23:15:26Z)
- MACE: An Efficient Model-Agnostic Framework for Counterfactual Explanation [132.77005365032468]
We propose a novel framework for Model-Agnostic Counterfactual Explanation (MACE).
In our MACE approach, we propose a novel RL-based method for finding good counterfactual examples and a gradient-less descent method for improving proximity.
Experiments on public datasets validate its effectiveness, showing better validity, sparsity, and proximity.
arXiv Detail & Related papers (2022-05-31T04:57:06Z)
- Exploring the Trade-off between Plausibility, Change Intensity and Adversarial Power in Counterfactual Explanations using Multi-objective Optimization [73.89239820192894]
We argue that automated counterfactual generation should take several aspects of the produced adversarial instances into account.
We present a novel framework for the generation of counterfactual examples.
arXiv Detail & Related papers (2022-05-20T15:02:53Z)
- Provable concept learning for interpretable predictions using variational inference [7.0349768355860895]
In safety critical applications, practitioners are reluctant to trust neural networks when no interpretable explanations are available.
We propose a probabilistic modeling framework to derive (C)oncept (L)earning and (P)rediction (CLAP).
We prove that our method is able to identify the underlying concepts while attaining optimal classification accuracy.
arXiv Detail & Related papers (2022-04-01T14:51:38Z)
- Explain, Edit, and Understand: Rethinking User Study Design for Evaluating Model Explanations [97.91630330328815]
We conduct a crowdsourcing study, where participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews.
We observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase when compared to the no-explanation control.
arXiv Detail & Related papers (2021-12-17T18:29:56Z)
- Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z)
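As a rough illustration of the mechanism summarized in the entry above (not the authors' method), the sketch below optimizes several latent perturbations of a single input, trading off prediction change, proximity, and a diversity-enforcing term that pushes the perturbations apart; the toy encoder, decoder, classifier, and loss weights are placeholders.
```python
# Hedged sketch: diverse counterfactual perturbations in a latent space (illustrative only).
import torch
import torch.nn as nn

LATENT, DIM, K = 8, 16, 4                      # latent size, input size, number of explanations

encoder = nn.Linear(DIM, LATENT)               # stand-ins for a pretrained encoder/decoder
decoder = nn.Linear(LATENT, DIM)
classifier = nn.Linear(DIM, 1)                 # fixed classifier to be explained (logit output)

x = torch.randn(1, DIM)                        # input to explain
deltas = nn.Parameter(0.01 * torch.randn(K, LATENT))   # K latent perturbations, optimized directly
opt = torch.optim.Adam([deltas], lr=1e-2)
bce = nn.BCEWithLogitsLoss()

for step in range(200):
    z = encoder(x) + deltas                    # K perturbed latent codes
    x_cf = decoder(z)                          # K candidate counterfactuals
    flip = bce(classifier(x_cf).squeeze(-1), torch.zeros(K))   # push predictions toward the other class
    proximity = deltas.norm(dim=1).mean()                      # keep perturbations small
    # Diversity-enforcing term: reward large pairwise distances between the perturbations.
    diversity = (-torch.pdist(deltas)).mean()
    loss = flip + 0.1 * proximity + 0.1 * diversity
    opt.zero_grad(); loss.backward(); opt.step()
```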
- Explainable Recommender Systems via Resolving Learning Representations [57.24565012731325]
Explanations could help improve user experience and discover system defects.
We propose a novel explainable recommendation model through improving the transparency of the representation learning process.
arXiv Detail & Related papers (2020-08-21T05:30:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.