DISSECT: Disentangled Simultaneous Explanations via Concept Traversals
- URL: http://arxiv.org/abs/2105.15164v1
- Date: Mon, 31 May 2021 17:11:56 GMT
- Title: DISSECT: Disentangled Simultaneous Explanations via Concept Traversals
- Authors: Asma Ghandeharioun, Been Kim, Chun-Liang Li, Brendan Jou, Brian Eoff,
Rosalind W. Picard
- Abstract summary: DISSECT is a novel approach to explaining deep learning model inferences.
By training a generative model from a classifier's signal, DISSECT offers a way to discover a classifier's inherent "notion" of distinct concepts.
We show that DISSECT produces CTs that disentangle several concepts and are coupled to the classifier's reasoning through joint training.
- Score: 33.65478845353047
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Explaining deep learning model inferences is a promising avenue for scientific
understanding, improving safety, uncovering hidden biases, evaluating fairness,
and beyond, as argued by many scholars. One of the principal benefits of
counterfactual explanations is allowing users to explore "what-if" scenarios
through what does not and cannot exist in the data, a capability that many other
forms of explanation, such as heatmaps and influence functions, inherently
lack. However, most previous work on generative explainability
cannot disentangle important concepts effectively, produces unrealistic
examples, or fails to retain relevant information. We propose a novel approach,
DISSECT, that jointly trains a generator, a discriminator, and a concept
disentangler to overcome such challenges using little supervision. DISSECT
generates Concept Traversals (CTs), defined as a sequence of generated examples
with increasing degrees of concepts that influence a classifier's decision. By
training a generative model from a classifier's signal, DISSECT offers a way to
discover a classifier's inherent "notion" of distinct concepts automatically
rather than rely on user-predefined concepts. We show that DISSECT produces CTs
that (1) disentangle several concepts, (2) are influential to a classifier's
decision and are coupled to its reasoning due to joint training, (3) are
realistic, (4) preserve relevant information, and (5) are stable across similar
inputs. We validate DISSECT on several challenging synthetic and realistic
datasets where previous methods fall short of satisfying desirable criteria for
interpretability and show that it performs consistently well and better than
existing methods. Finally, we present experiments showing applications of
DISSECT for detecting potential biases of a classifier and identifying spurious
artifacts that impact predictions.
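To make the jointly trained setup more concrete, below is a minimal, illustrative sketch of how a generator, discriminator, and concept disentangler could be trained against a classifier's signal to produce Concept Traversals. This is not the authors' implementation: the toy MLPs, random placeholder data, loss weights, and the fixed discriminator and classifier are simplifying assumptions.
```python
# Hedged sketch of a DISSECT-style joint-training setup (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

DIM, N_CONCEPTS, STEPS = 16, 3, 5        # toy input size, number of concepts, CT length

class Generator(nn.Module):
    """Perturbs an input x toward degree `alpha` of concept `k`."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(DIM + N_CONCEPTS + 1, 64), nn.ReLU(),
                                 nn.Linear(64, DIM))

    def forward(self, x, k, alpha):
        onehot = F.one_hot(k, N_CONCEPTS).float()
        return x + self.net(torch.cat([x, onehot, alpha[:, None]], dim=-1))

generator = Generator()
discriminator = nn.Sequential(nn.Linear(DIM, 64), nn.ReLU(), nn.Linear(64, 1))              # realism signal
disentangler = nn.Sequential(nn.Linear(2 * DIM, 64), nn.ReLU(), nn.Linear(64, N_CONCEPTS))  # which concept moved?
classifier = nn.Sequential(nn.Linear(DIM, 1))   # stand-in for the pretrained classifier being explained (logit output)

opt = torch.optim.Adam(list(generator.parameters()) + list(disentangler.parameters()), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(100):                          # toy loop; discriminator updates omitted for brevity
    x = torch.randn(32, DIM)                     # placeholder batch standing in for real inputs
    k = torch.randint(0, N_CONCEPTS, (32,))      # concept to traverse
    alpha = torch.rand(32)                       # target degree of that concept

    x_gen = generator(x, k, alpha)
    realism = bce(discriminator(x_gen).squeeze(-1), torch.ones(32))           # look real to the discriminator
    influence = bce(classifier(x_gen).squeeze(-1), alpha)                     # classifier output should track alpha
    disent = F.cross_entropy(disentangler(torch.cat([x, x_gen], dim=-1)), k)  # recover which concept was traversed
    proximity = (x_gen - x).abs().mean()                                      # stay close to the original input

    loss = realism + influence + disent + 0.1 * proximity
    opt.zero_grad(); loss.backward(); opt.step()

# A Concept Traversal (CT) for concept 0: the same input rendered at increasing concept degrees.
x0 = torch.randn(1, DIM)
ct = [generator(x0, torch.tensor([0]), torch.tensor([i / (STEPS - 1)])) for i in range(STEPS)]
```
The final list `ct` corresponds to one Concept Traversal: the same input regenerated at increasing degrees of a single concept, which is the object DISSECT inspects to explain the classifier.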
Related papers
- Explaining Explainability: Understanding Concept Activation Vectors [35.37586279472797]
Recent interpretability methods propose using concept-based explanations to translate internal representations of deep learning models into a language that humans are familiar with: concepts.
This requires understanding which concepts are present in the representation space of a neural network.
In this work, we investigate three properties of Concept Activation Vectors (CAVs), which are learnt using a probe dataset of concept exemplars.
We introduce tools designed to detect the presence of these properties, provide insight into how they affect the derived explanations, and provide recommendations to minimise their impact.
arXiv Detail & Related papers (2024-04-04T17:46:20Z)
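For context on how CAVs are typically learnt from a probe dataset of concept exemplars, here is a minimal sketch; the placeholder activations and the choice of a logistic-regression probe are assumptions, not this paper's implementation.
```python
# Hedged sketch: computing a Concept Activation Vector (CAV) from a probe dataset.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
concept_acts = rng.normal(loc=1.0, size=(100, 64))   # placeholder activations for concept exemplars
random_acts = rng.normal(loc=0.0, size=(100, 64))    # placeholder activations for random counterexamples

X = np.vstack([concept_acts, random_acts])
y = np.array([1] * len(concept_acts) + [0] * len(random_acts))

probe = LogisticRegression(max_iter=1000).fit(X, y)    # linear probe separating concept vs. random
cav = probe.coef_[0] / np.linalg.norm(probe.coef_[0])  # the CAV: unit normal of the probe's decision boundary

# Concept sensitivity of a prediction can then be probed via the directional
# derivative of the classifier's logit along `cav` (as in TCAV-style analyses).
```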
- Advancing Counterfactual Inference through Nonlinear Quantile Regression [77.28323341329461]
We propose a framework for efficient and effective counterfactual inference implemented with neural networks.
The proposed approach enhances the capacity to generalize estimated counterfactual outcomes to unseen data.
Empirical results conducted on multiple datasets offer compelling support for our theoretical assertions.
arXiv Detail & Related papers (2023-06-09T08:30:51Z)
- Statistically Significant Concept-based Explanation of Image Classifiers via Model Knockoffs [22.576922942465142]
Concept-based explanations can produce false positives, mistakenly regarding unrelated concepts as important for the prediction task.
We propose a method that uses a deep learning model to learn image concepts and then uses knockoff samples to select the concepts that are important for prediction.
arXiv Detail & Related papers (2023-05-27T05:40:05Z)
- Dual Box Embeddings for the Description Logic EL++ [16.70961576041243]
Similar to Knowledge Graphs (KGs), ontologies are often incomplete, and maintaining and constructing them has proved challenging.
As with KGs, a promising approach is to learn ontology embeddings in a latent vector space, while additionally ensuring they adhere to the semantics of the underlying DL.
We propose a novel ontology embedding method named Box$^2$EL for the DL EL++, which represents both concepts and roles as boxes.
arXiv Detail & Related papers (2023-01-26T14:13:37Z)
- Probing Classifiers are Unreliable for Concept Removal and Detection [18.25734277357466]
Neural network models trained on text data have been found to encode undesirable linguistic or sensitive concepts in their representation.
Recent work has proposed post-hoc and adversarial methods to remove such unwanted concepts from a model's representation.
We show that these methods can be counter-productive, and in the worst case may end up destroying all task-relevant features.
arXiv Detail & Related papers (2022-07-08T23:15:26Z)
- MACE: An Efficient Model-Agnostic Framework for Counterfactual Explanation [132.77005365032468]
We propose a novel framework for Model-Agnostic Counterfactual Explanation (MACE).
In our MACE approach, we propose a novel RL-based method for finding good counterfactual examples and a gradient-less descent method for improving proximity.
Experiments on public datasets validate its effectiveness, showing better validity, sparsity, and proximity.
arXiv Detail & Related papers (2022-05-31T04:57:06Z)
- Exploring the Trade-off between Plausibility, Change Intensity and Adversarial Power in Counterfactual Explanations using Multi-objective Optimization [73.89239820192894]
We argue that automated counterfactual generation should take several aspects of the produced adversarial instances into account.
We present a novel framework for the generation of counterfactual examples.
arXiv Detail & Related papers (2022-05-20T15:02:53Z)
- Provable concept learning for interpretable predictions using variational inference [7.0349768355860895]
In safety critical applications, practitioners are reluctant to trust neural networks when no interpretable explanations are available.
We propose a probabilistic modeling framework to derive (C)oncept (L)earning and (P)rediction (CLAP).
We prove that our method is able to identify the underlying concepts while attaining optimal classification accuracy.
arXiv Detail & Related papers (2022-04-01T14:51:38Z)
- Explain, Edit, and Understand: Rethinking User Study Design for Evaluating Model Explanations [97.91630330328815]
We conduct a crowdsourcing study, where participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews.
We observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase when compared to the no-explanation control.
arXiv Detail & Related papers (2021-12-17T18:29:56Z)
- Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z)
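As a rough illustration of the mechanism summarized in the entry above (not the authors' method), the sketch below optimizes several latent perturbations of a single input, trading off prediction change, proximity, and a diversity-enforcing term that pushes the perturbations apart; the toy encoder, decoder, classifier, and loss weights are placeholders.
```python
# Hedged sketch: diverse counterfactual perturbations in a latent space (illustrative only).
import torch
import torch.nn as nn

LATENT, DIM, K = 8, 16, 4                      # latent size, input size, number of explanations

encoder = nn.Linear(DIM, LATENT)               # stand-ins for a pretrained encoder/decoder
decoder = nn.Linear(LATENT, DIM)
classifier = nn.Linear(DIM, 1)                 # fixed classifier to be explained (logit output)

x = torch.randn(1, DIM)                        # input to explain
deltas = nn.Parameter(0.01 * torch.randn(K, LATENT))   # K latent perturbations, optimized directly
opt = torch.optim.Adam([deltas], lr=1e-2)
bce = nn.BCEWithLogitsLoss()

for step in range(200):
    z = encoder(x) + deltas                    # K perturbed latent codes
    x_cf = decoder(z)                          # K candidate counterfactuals
    flip = bce(classifier(x_cf).squeeze(-1), torch.zeros(K))   # push predictions toward the other class
    proximity = deltas.norm(dim=1).mean()                      # keep perturbations small
    # Diversity-enforcing term: reward large pairwise distances between the perturbations.
    diversity = (-torch.pdist(deltas)).mean()
    loss = flip + 0.1 * proximity + 0.1 * diversity
    opt.zero_grad(); loss.backward(); opt.step()
```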
- Explainable Recommender Systems via Resolving Learning Representations [57.24565012731325]
Explanations could help improve user experience and discover system defects.
We propose a novel explainable recommendation model through improving the transparency of the representation learning process.
arXiv Detail & Related papers (2020-08-21T05:30:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.