When are Post-hoc Conceptual Explanations Identifiable?
- URL: http://arxiv.org/abs/2206.13872v5
- Date: Tue, 6 Jun 2023 07:01:53 GMT
- Title: When are Post-hoc Conceptual Explanations Identifiable?
- Authors: Tobias Leemann, Michael Kirchhof, Yao Rong, Enkelejda Kasneci, Gjergji
Kasneci
- Abstract summary: When no human concept labels are available, concept discovery methods search trained embedding spaces for interpretable concepts.
We argue that concept discovery should be identifiable, meaning that a number of known concepts can be provably recovered to guarantee reliability of the explanations.
Our results highlight the strict conditions under which reliable concept discovery without human labels can be guaranteed.
- Score: 18.85180188353977
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Interest in understanding and factorizing learned embedding spaces through
conceptual explanations is steadily growing. When no human concept labels are
available, concept discovery methods search trained embedding spaces for
interpretable concepts like object shape or color that can provide post-hoc
explanations for decisions. Unlike previous work, we argue that concept
discovery should be identifiable, meaning that a number of known concepts can
be provably recovered to guarantee reliability of the explanations. As a
starting point, we explicitly make the connection between concept discovery and
classical methods like Principal Component Analysis and Independent Component
Analysis by showing that they can recover independent concepts under
non-Gaussian distributions. For dependent concepts, we propose two novel
approaches that exploit functional compositionality properties of
image-generating processes. Our provably identifiable concept discovery methods
substantially outperform competitors on a battery of experiments including
hundreds of trained models and dependent concepts, where they exhibit up to 29
% better alignment with the ground truth. Our results highlight the strict
conditions under which reliable concept discovery without human labels can be
guaranteed and provide a formal foundation for the domain. Our code is
available online.
Related papers
- Discover-then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery [52.498055901649025]
Concept Bottleneck Models (CBMs) have been proposed to address the 'black-box' problem of deep neural networks.
We propose a novel CBM approach -- called Discover-then-Name-CBM (DN-CBM) -- that inverts the typical paradigm.
Our concept extraction strategy is efficient, since it is agnostic to the downstream task, and uses concepts already known to the model.
arXiv Detail & Related papers (2024-07-19T17:50:11Z) - How to Blend Concepts in Diffusion Models [48.68800153838679]
Recent methods exploit multiple latent representations and their connection, making this research question even more entangled.
Our goal is to understand how operations in the latent space affect the underlying concepts.
Our conclusion is that concept blending through space manipulation is possible, although the best strategy depends on the context of the blend.
arXiv Detail & Related papers (2024-07-19T13:05:57Z) - Explaining Explainability: Understanding Concept Activation Vectors [35.37586279472797]
Recent interpretability methods propose using concept-based explanations to translate internal representations of deep learning models into a language that humans are familiar with: concepts.
This requires understanding which concepts are present in the representation space of a neural network.
In this work, we investigate three properties of Concept Activation Vectors (CAVs), which are learnt using a probe dataset of concept exemplars.
We introduce tools designed to detect the presence of these properties, provide insight into how they affect the derived explanations, and provide recommendations to minimise their impact.
arXiv Detail & Related papers (2024-04-04T17:46:20Z) - Estimation of Concept Explanations Should be Uncertainty Aware [39.598213804572396]
We study a specific kind called Concept Explanations, where the goal is to interpret a model using human-understandable concepts.
Although popular for their easy interpretation, concept explanations are known to be noisy.
We propose an uncertainty-aware Bayesian estimation method to address these issues, which readily improved the quality of explanations.
arXiv Detail & Related papers (2023-12-13T11:17:27Z) - Implicit Concept Removal of Diffusion Models [92.55152501707995]
Text-to-image (T2I) diffusion models often inadvertently generate unwanted concepts such as watermarks and unsafe images.
We present the Geom-Erasing, a novel concept removal method based on the geometric-driven control.
arXiv Detail & Related papers (2023-10-09T17:13:10Z) - The Hidden Language of Diffusion Models [70.03691458189604]
We present Conceptor, a novel method to interpret the internal representation of a textual concept by a diffusion model.
We find surprising visual connections between concepts, that transcend their textual semantics.
We additionally discover concepts that rely on mixtures of exemplars, biases, renowned artistic styles, or a simultaneous fusion of multiple meanings.
arXiv Detail & Related papers (2023-06-01T17:57:08Z) - COPEN: Probing Conceptual Knowledge in Pre-trained Language Models [60.10147136876669]
Conceptual knowledge is fundamental to human cognition and knowledge bases.
Existing knowledge probing works only focus on factual knowledge of pre-trained language models (PLMs) and ignore conceptual knowledge.
We design three tasks to probe whether PLMs organize entities by conceptual similarities, learn conceptual properties, and conceptualize entities in contexts.
For the tasks, we collect and annotate 24k data instances covering 393 concepts, which is COPEN, a COnceptual knowledge Probing bENchmark.
arXiv Detail & Related papers (2022-11-08T08:18:06Z) - Concept Activation Regions: A Generalized Framework For Concept-Based
Explanations [95.94432031144716]
Existing methods assume that the examples illustrating a concept are mapped in a fixed direction of the deep neural network's latent space.
In this work, we propose allowing concept examples to be scattered across different clusters in the DNN's latent space.
This concept activation region (CAR) formalism yields global concept-based explanations and local concept-based feature importance.
arXiv Detail & Related papers (2022-09-22T17:59:03Z) - Overlooked factors in concept-based explanations: Dataset choice,
concept learnability, and human capability [25.545486537295144]
Concept-based interpretability methods aim to explain deep neural network model predictions using a predefined set of semantic concepts.
Despite their popularity, they suffer from limitations that are not well-understood and articulated by the literature.
We analyze three commonly overlooked factors in concept-based explanations.
arXiv Detail & Related papers (2022-07-20T01:59:39Z) - SegDiscover: Visual Concept Discovery via Unsupervised Semantic
Segmentation [29.809900593362844]
SegDiscover is a novel framework that discovers semantically meaningful visual concepts from imagery datasets with complex scenes without supervision.
Our method generates concept primitives from raw images, discovering concepts by clustering in the latent space of a self-supervised pretrained encoder, and concept refinement via neural network smoothing.
arXiv Detail & Related papers (2022-04-22T20:44:42Z) - Sparse Subspace Clustering for Concept Discovery (SSCCD) [1.7319807100654885]
Concepts are key building blocks of higher level human understanding.
Local attribution methods do not allow to identify coherent model behavior across samples.
We put forward a new definition of concepts as low-dimensional subspaces of hidden feature layers.
arXiv Detail & Related papers (2022-03-11T16:15:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.