Related papers: Unsupervised Interpretable Basis Extraction for Concept-Based Visual Explanations

URL: http://arxiv.org/abs/2303.10523v3
Date: Mon, 22 Sep 2025 11:09:26 GMT
Title: Unsupervised Interpretable Basis Extraction for Concept-Based Visual Explanations
Authors: Alexandros Doumanoglou, Stylianos Asteriadis, Dimitrios Zarpalas
Abstract summary: This work attempts to explain CNN image classifier predictions and intermediate layer representations in terms of human-understandable concepts. We take a bottom-up approach, identifying the directions from the structure of the feature space, collectively, without relying on supervision from concept labels. We make extensions to existing basis interpretability metrics and show that intermediate layer representations become more interpretable when transformed with the extracted bases.
Score: 44.033369364364084
License: http://creativecommons.org/licenses/by/4.0/
Abstract: An important line of research attempts to explain CNN image classifier predictions and intermediate layer representations in terms of human-understandable concepts. Previous work supports that deep representations are linearly separable with respect to their concept label, implying that the feature space has directions where intermediate representations may be projected onto, to become more understandable. These directions are called interpretable, and when considered as a set, they may form an interpretable feature space basis. Compared to previous top-down probing approaches which use concept annotations to identify the interpretable directions one at a time, in this work, we take a bottom-up approach, identifying the directions from the structure of the feature space, collectively, without relying on supervision from concept labels. Instead, we learn the directions by optimizing for a sparsity property that holds for any interpretable basis. We experiment with existing popular CNNs and demonstrate the effectiveness of our method in extracting an interpretable basis across network architectures and training datasets. We make extensions to existing basis interpretability metrics and show that intermediate layer representations become more interpretable when transformed with the extracted bases. Finally, we compare the bases extracted with our method with the bases derived with supervision and find that, in one aspect, unsupervised basis extraction has a strength that constitutes a limitation of learning the basis with supervision, and we provide potential directions for future research.
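As a toy illustration of the idea in the abstract (choosing a basis so that representations become sparse when projected onto it), the sketch below searches over 2-D rotations for the orthonormal basis that minimizes the mean L1 norm of the projected coefficients. This is not the authors' algorithm or objective: the L1 cost, the grid search, the synthetic features, and every function name here are assumptions made purely for illustration.

```python
import math

def project(features, basis):
    """Coefficients of each feature vector along each basis direction (dot products)."""
    return [[sum(f_i * b_i for f_i, b_i in zip(f, b)) for b in basis] for f in features]

def l1_sparsity_cost(coeffs):
    """Mean L1 norm of the coefficient vectors (assumed stand-in for a sparsity objective)."""
    return sum(sum(abs(c) for c in row) for row in coeffs) / len(coeffs)

def rotation_basis(theta):
    """Orthonormal 2-D basis obtained by rotating the standard axes by theta radians."""
    return [[math.cos(theta), math.sin(theta)],
            [-math.sin(theta), math.cos(theta)]]

# Synthetic features: each vector lies along one of two hidden orthogonal
# directions (the standard axes rotated by 30 degrees). Hypothetical data.
hidden = rotation_basis(math.radians(30))
features = [
    [2.0 * h for h in hidden[0]],
    [3.0 * h for h in hidden[1]],
    [1.5 * h for h in hidden[0]],
]

# Grid search over rotation angles (0..89 degrees) for the sparsest projection.
angles = [math.radians(a) for a in range(0, 90)]
best = min(angles, key=lambda t: l1_sparsity_cost(project(features, rotation_basis(t))))
best_deg = round(math.degrees(best))
```

In this toy setting the search recovers the hidden 30-degree rotation, because a rotation preserves each vector's L2 norm and the L1 norm is smallest when each vector aligns with a single basis direction. A real feature space is high-dimensional and would need a proper optimizer rather than a grid search.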

Related papers

Emergent Structured Representations Support Flexible In-Context Inference in Large Language Models [77.98801218316505]
Large language models (LLMs) exhibit emergent behaviors suggestive of human-like reasoning. We investigate the internal processing of LLMs during in-context concept inference.
arXiv Detail & Related papers (2026-02-08T03:14:39Z)
Insight: Interpretable Semantic Hierarchies in Vision-Language Encoders [52.94006363830628]
Language-aligned vision foundation models perform strongly across diverse downstream tasks. Recent works decompose these representations into human-interpretable concepts, but provide poor spatial grounding and are limited to image classification tasks. We propose Insight, a language-aligned concept foundation model that provides fine-grained concepts, which are human-interpretable and spatially grounded in the input image.
arXiv Detail & Related papers (2026-01-20T09:57:26Z)
Preserving Clusters in Prompt Learning for Unsupervised Domain Adaptation [29.809079908218607]
This work introduces a fresh solution to reinforce base pseudo-labels and facilitate target-prompt learning. We first propose to leverage the reference predictions based on the relationship between source and target visual embeddings. We later show that there is a strong clustering behavior observed between visual and text embeddings in pre-trained multi-modal models.
arXiv Detail & Related papers (2025-06-13T06:33:27Z)
Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing [62.447497430479174]
Drawing to reason in space is a novel paradigm that enables LVLMs to reason through elementary drawing operations in the visual space. Our model, named VILASR, consistently outperforms existing methods across diverse spatial reasoning benchmarks.
arXiv Detail & Related papers (2025-06-11T17:41:50Z)
Decoding Diffusion: A Scalable Framework for Unsupervised Analysis of Latent Space Biases and Representations Using Natural Language Prompts [68.48103545146127]
This paper proposes a novel framework for unsupervised exploration of diffusion latent spaces. We directly leverage natural language prompts and image captions to map latent directions. Our method provides a more scalable and interpretable understanding of the semantic knowledge encoded within diffusion models.
arXiv Detail & Related papers (2024-10-25T21:44:51Z)
Advancing Ante-Hoc Explainable Models through Generative Adversarial Networks [24.45212348373868]
This paper presents a novel concept learning framework for enhancing model interpretability and performance in visual classification tasks. Our approach appends an unsupervised explanation generator to the primary classifier network and makes use of adversarial training. This work presents a significant step towards building inherently interpretable deep vision models with task-aligned concept representations.
arXiv Detail & Related papers (2024-01-09T16:16:16Z)
Causal Unsupervised Semantic Segmentation [60.178274138753174]
Unsupervised semantic segmentation aims to achieve high-quality semantic grouping without human-labeled annotations. We propose a novel framework, CAusal Unsupervised Semantic sEgmentation (CAUSE), which leverages insights from causal inference.
arXiv Detail & Related papers (2023-10-11T10:54:44Z)
Rewrite Caption Semantics: Bridging Semantic Gaps for Language-Supervised Semantic Segmentation [100.81837601210597]
We propose Concept Curation (CoCu) to bridge the gap between visual and textual semantics in pre-training data. CoCu achieves superb zero-shot transfer performance and boosts the language-supervised segmentation baseline by a large margin.
arXiv Detail & Related papers (2023-09-24T00:05:39Z)
Uncovering Unique Concept Vectors through Latent Space Decomposition [0.0]
Concept-based explanations have emerged as a superior approach that is more interpretable than feature attribution estimates. We propose a novel post-hoc unsupervised method that automatically uncovers the concepts learned by deep models during training. Our experiments reveal that the majority of our concepts are readily understandable to humans, exhibit coherency, and bear relevance to the task at hand.
arXiv Detail & Related papers (2023-07-13T17:21:54Z)
Explaining Explainability: Towards Deeper Actionable Insights into Deep Learning through Second-order Explainability [70.60433013657693]
Second-order explainable AI (SOXAI) was recently proposed to extend explainable AI (XAI) from the instance level to the dataset level. We demonstrate for the first time, via example classification and segmentation cases, that eliminating irrelevant concepts from the training set based on actionable insights from SOXAI can enhance a model's performance.
arXiv Detail & Related papers (2023-06-14T23:24:01Z)
Adversarial Attacks on the Interpretation of Neuron Activation Maximization [70.5472799454224]
Activation-maximization approaches are used to interpret and analyze trained deep-learning models. In this work, we consider the concept of an adversary manipulating a model for the purpose of deceiving the interpretation.
arXiv Detail & Related papers (2023-06-12T19:54:33Z)
From Patches to Objects: Exploiting Spatial Reasoning for Better Visual Representations [2.363388546004777]
We propose a novel auxiliary pretraining method that is based on spatial reasoning. Our proposed method takes advantage of a more flexible formulation of contrastive learning by introducing spatial reasoning as an auxiliary task for discriminative self-supervised methods.
arXiv Detail & Related papers (2023-05-21T07:46:46Z)
Revealing Hidden Context Bias in Segmentation and Object Detection through Concept-specific Explanations [14.77637281844823]
We propose the post-hoc eXplainable Artificial Intelligence method L-CRP to generate explanations that automatically identify and visualize relevant concepts learned, recognized and used by the model during inference as well as precisely locate them in input space. We verify the faithfulness of our proposed technique by quantitatively comparing different concept attribution methods, and discuss the effect on explanation complexity on popular datasets such as CityScapes, Pascal VOC and MS COCO 2017.
arXiv Detail & Related papers (2022-11-21T13:12:23Z)
Discriminative Attribution from Counterfactuals [64.94009515033984]
We present a method for neural network interpretability by combining feature attribution with counterfactual explanations. We show that this method can be used to quantitatively evaluate the performance of feature attribution methods in an objective manner.
arXiv Detail & Related papers (2021-09-28T00:53:34Z)
Self-supervised Segmentation via Background Inpainting [96.10971980098196]
We introduce a self-supervised detection and segmentation approach that can work with single images captured by a potentially moving camera. We exploit a self-supervised loss function to train a proposal-based segmentation network. We apply our method to human detection and segmentation in images that visually depart from those of standard benchmarks and outperform existing self-supervised methods.
arXiv Detail & Related papers (2020-11-11T08:34:40Z)
Explaining Convolutional Neural Networks through Attribution-Based Input Sampling and Block-Wise Feature Aggregation [22.688772441351308]
Methods based on class activation mapping and randomized input sampling have gained great popularity. However, the attribution methods provide lower resolution and blurry explanation maps that limit their explanation power. In this work, we collect visualization maps from multiple layers of the model based on an attribution-based input sampling technique. We also propose a layer selection strategy that applies to the whole family of CNN-based models.
arXiv Detail & Related papers (2020-10-01T20:27:30Z)
Interpretable Representations in Explainable AI: From Theory to Practice [7.031336702345381]
Interpretable representations are the backbone of many explainers that target black-box predictive systems. We study properties of interpretable representations that encode presence and absence of human-comprehensible concepts.
arXiv Detail & Related papers (2020-08-16T21:44:03Z)
Closed-Form Factorization of Latent Semantics in GANs [65.42778970898534]
A rich set of interpretable dimensions has been shown to emerge in the latent space of the Generative Adversarial Networks (GANs) trained for synthesizing images. In this work, we examine the internal representation learned by GANs to reveal the underlying variation factors in an unsupervised manner. We propose a closed-form factorization algorithm for latent semantic discovery by directly decomposing the pre-trained weights.
arXiv Detail & Related papers (2020-07-13T18:05:36Z)
Ontology-based Interpretable Machine Learning for Textual Data [35.01650633374998]
We introduce a novel interpreting framework that learns an interpretable model based on a sampling technique to explain prediction models. To narrow down the search space for explanations, we design a learnable anchor algorithm. A set of regulations is further introduced, regarding combining learned interpretable representations with anchors to generate comprehensible explanations.
arXiv Detail & Related papers (2020-04-01T02:51:57Z)

This list is automatically generated from the titles and abstracts of the papers on this site.