Uncovering Unique Concept Vectors through Latent Space Decomposition
- URL: http://arxiv.org/abs/2307.06913v2
- Date: Fri, 14 Jul 2023 09:17:19 GMT
- Title: Uncovering Unique Concept Vectors through Latent Space Decomposition
- Authors: Mara Graziani, Laura O'Mahony, An-Phi Nguyen, Henning Müller, Vincent Andrearczyk
- Abstract summary: Concept-based explanations have emerged as a superior approach that is more interpretable than feature attribution estimates.
We propose a novel post-hoc unsupervised method that automatically uncovers the concepts learned by deep models during training.
Our experiments reveal that the majority of our concepts are readily understandable to humans, exhibit coherency, and bear relevance to the task at hand.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Interpreting the inner workings of deep learning models is crucial for
establishing trust and ensuring model safety. Concept-based explanations have
emerged as a superior approach that is more interpretable than feature
attribution estimates such as pixel saliency. However, defining the concepts
for the interpretability analysis biases the explanations by the user's
expectations on the concepts. To address this, we propose a novel post-hoc
unsupervised method that automatically uncovers the concepts learned by deep
models during training. By decomposing the latent space of a layer into singular
vectors and refining them by unsupervised clustering, we uncover concept
vectors aligned with directions of high variance that are relevant to the model
prediction, and that point to semantically distinct concepts. Our extensive
experiments reveal that the majority of our concepts are readily understandable
to humans, exhibit coherency, and bear relevance to the task at hand. Moreover,
we showcase the practical utility of our method in dataset exploration, where
our concept vectors successfully identify outlier training samples affected by
various confounding factors. This novel exploration technique is highly versatile
across data types and model architectures, and it will facilitate the
identification of biases and the discovery of sources of error within training
data.
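The abstract describes the pipeline only at a high level: decompose a layer's latent space into singular vectors, then refine them with unsupervised clustering. The sketch below is one plausible reading under stated assumptions, not the authors' implementation. It assumes activations are flattened into an `(n_samples, n_features)` matrix, that the top `n_components` right singular vectors span the high-variance subspace, that "refinement" means k-means on the projected coordinates, and that each concept vector is a cluster centroid mapped back to activation space. The helpers `kmeans`, `concept_vectors` and `outlier_scores` are hypothetical names. The max-|z| outlier score is our own illustrative criterion for the dataset-exploration use case, not the paper's.

```python
import numpy as np


def kmeans(x, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means in NumPy; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid.
        d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d.argmin(axis=1)
        # Keep the old centroid if a cluster ends up empty.
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


def concept_vectors(acts, n_components=3, n_clusters=2, seed=0):
    """acts: (n_samples, n_features) activations of one layer (assumed flattened)."""
    centered = acts - acts.mean(axis=0)
    # Rows of vt are right singular vectors ordered by decreasing singular
    # value, i.e. the directions of highest variance in the latent space.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]              # (n_components, n_features)
    coords = centered @ basis.T            # (n_samples, n_components)
    # Assumed refinement step: cluster the samples in the reduced space.
    labels, centers = kmeans(coords, n_clusters, seed=seed)
    # Map centroids back to activation space as unit-norm concept vectors.
    vecs = centers @ basis
    vecs /= np.maximum(np.linalg.norm(vecs, axis=1, keepdims=True), 1e-12)
    return vecs, labels


def outlier_scores(acts, vecs):
    """Hypothetical score: largest |z-score| of a sample's projection on any concept."""
    proj = (acts - acts.mean(axis=0)) @ vecs.T
    z = (proj - proj.mean(axis=0)) / (proj.std(axis=0) + 1e-12)
    return np.abs(z).max(axis=1)


# Toy data: two groups split along feature 0, plus one extreme sample.
rng = np.random.default_rng(0)
a = rng.normal(size=(50, 10))
a[:, 0] += 5.0
b = rng.normal(size=(50, 10))
b[:, 0] -= 5.0
extreme = np.zeros((1, 10))
extreme[0, 0] = 30.0
acts = np.vstack([a, b, extreme])

vecs, labels = concept_vectors(acts, n_components=3, n_clusters=2)
scores = outlier_scores(acts, vecs)
print(vecs.shape, int(scores.argmax()))
```

On this toy data, the extreme sample (the last row, index 100) should receive the highest score. A small NumPy k-means is used so that the example does not depend on any particular clustering library. The number of components and clusters are free parameters here because the abstract does not specify them.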
Related papers
- Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond
In pursuit of a deeper understanding of its surprising behaviors, we investigate the utility of a simple yet accurate model of a trained neural network.
Across three case studies, we illustrate how it can be applied to derive new empirical insights on a diverse range of prominent phenomena.
Date: 2024-10-31T22:54:34Z
- CoLiDR: Concept Learning using Aggregated Disentangled Representations
Interpretability of Deep Neural Networks using concept-based models offers a promising way to explain model behavior through human-understandable concepts.
A parallel line of research focuses on disentangling the data distribution into its underlying generative factors, in turn explaining the data generation process.
While both directions have received extensive attention, little work has been done on explaining concepts in terms of generative factors to unify mathematically disentangled representations and human-understandable concepts.
Date: 2024-07-27T16:55:14Z
- Advancing Ante-Hoc Explainable Models through Generative Adversarial Networks
This paper presents a novel concept learning framework for enhancing model interpretability and performance in visual classification tasks.
Our approach appends an unsupervised explanation generator to the primary classifier network and makes use of adversarial training.
This work presents a significant step towards building inherently interpretable deep vision models with task-aligned concept representations.
Date: 2024-01-09T16:16:16Z
- Understanding Distributed Representations of Concepts in Deep Neural Networks without Supervision
We propose an unsupervised method for discovering distributed representations of concepts by selecting a principal subset of neurons.
Our empirical findings demonstrate that instances with similar neuron activation states tend to share coherent concepts.
It can be utilized to identify unlabeled subclasses within data and to detect the causes of misclassifications.
Date: 2023-12-28T07:33:51Z
- Interpreting Pretrained Language Models via Concept Bottlenecks
Pretrained language models (PLMs) have made significant strides in various natural language processing tasks.
The lack of interpretability due to their "black-box" nature poses challenges for responsible implementation.
We propose a novel approach to interpreting PLMs by employing high-level, meaningful concepts that are easily understandable for humans.
Date: 2023-11-08T20:41:18Z
- Attributing Learned Concepts in Neural Networks to Training Data
We find evidence for convergence, where removing the 10,000 top attributing images for a concept and retraining the model does not change the location of the concept in the network.
This suggests that the features that inform the development of a concept are spread in a more diffuse manner across its exemplars, implying robustness in concept formation.
Date: 2023-10-04T20:26:59Z
- Unsupervised Interpretable Basis Extraction for Concept-Based Visual Explanations
We show that intermediate layer representations become more interpretable when transformed to the bases extracted with our method.
We compare the bases extracted with our method with the bases derived with a supervised approach and find that, in one aspect, the proposed unsupervised approach has a strength that constitutes a limitation of the supervised one; we also give potential directions for future research.
Date: 2023-03-19T00:37:19Z
- Discovering Concepts in Learned Representations using Statistical Inference and Interactive Visualization
Concept discovery is important for bridging the gap between non-deep learning experts and model end-users.
Current approaches include hand-crafting concept datasets and then converting them to latent space directions.
In this study, we offer two further approaches to guide user discovery of meaningful concepts, one based on multiple hypothesis testing and another on interactive visualization.
Date: 2022-02-09T22:29:48Z
- Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
Date: 2021-03-18T12:57:34Z
- Concept Learners for Few-Shot Learning
We propose COMET, a meta-learning method that improves generalization ability by learning to learn along human-interpretable concept dimensions.
We evaluate our model on few-shot tasks from diverse domains, including fine-grained image classification, document categorization and cell type annotation.
Date: 2020-07-14T22:04:17Z
- Plausible Counterfactuals: Auditing Deep Learning Classifiers with Realistic Adversarial Examples
The black-box nature of deep learning models has raised unanswered questions about what they learn from data.
A Generative Adversarial Network (GAN) and multiple objectives are used to furnish a plausible attack on the audited model.
Its utility is showcased within a human face classification task, unveiling the enormous potential of the proposed framework.
Date: 2020-03-25T11:08:56Z
This list is automatically generated from the titles and abstracts of the papers on this site.