Related papers: Hierarchical Semantic Tree Concept Whitening for Interpretable Image Classification

URL: http://arxiv.org/abs/2307.04343v1
Date: Mon, 10 Jul 2023 04:54:05 GMT
Title: Hierarchical Semantic Tree Concept Whitening for Interpretable Image Classification
Authors: Haixing Dai, Lu Zhang, Lin Zhao, Zihao Wu, Zhengliang Liu, David Liu, Xiaowei Yu, Yanjun Lyu, Changying Li, Ninghao Liu, Tianming Liu, Dajiang Zhu
Abstract summary: Post-hoc analysis can only discover the patterns or rules that naturally exist in models. We proactively instill knowledge to alter the representation of human-understandable concepts in hidden layers. Our method improves model interpretability, showing better disentanglement of semantic concepts, without negatively affecting model classification performance.
Score: 19.306487616731765
License: http://creativecommons.org/licenses/by/4.0/
Abstract: With the popularity of deep neural networks (DNNs), model interpretability is becoming a critical concern. Many approaches have been developed to tackle the problem through post-hoc analysis, such as explaining how predictions are made or understanding the meaning of neurons in middle layers. Nevertheless, these methods can only discover the patterns or rules that naturally exist in models. In this work, rather than relying on post-hoc schemes, we proactively instill knowledge to alter the representation of human-understandable concepts in hidden layers. Specifically, we use a hierarchical tree of semantic concepts to store the knowledge, which is leveraged to regularize the representations of image data instances while training deep models. The axes of the latent space are aligned with the semantic concepts, where the hierarchical relations between concepts are also preserved. Experiments on real-world image datasets show that our method improves model interpretability, showing better disentanglement of semantic concepts, without negatively affecting model classification performance.
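The abstract's idea of aligning latent-space axes with concepts can be illustrated with a minimal sketch. This is not the paper's training procedure: it assumes a plain ZCA whitening step followed by a single orthogonal rotation, and it ignores the hierarchical tree entirely. The function names `whiten` and `rotation_to_first_axis`, the toy data, and the choice of a concept's mean activation as its direction are all hypothetical choices made for illustration.

```python
import numpy as np


def whiten(features, eps=1e-5):
    """ZCA-whiten features (rows are samples) so their covariance is ~identity."""
    centered = features - features.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / features.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)  # cov is symmetric
    # eps guards against division by a near-zero eigenvalue (assumed value).
    w = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return centered @ w


def rotation_to_first_axis(direction):
    """Return an orthogonal matrix whose first column is the unit `direction`."""
    d = direction / np.linalg.norm(direction)
    # Fill the remaining columns with arbitrary vectors; QR orthonormalizes them.
    basis = np.random.default_rng(0).normal(size=(d.shape[0], d.shape[0]))
    basis[:, 0] = d
    q, _ = np.linalg.qr(basis)
    if q[:, 0] @ d < 0:  # QR may flip the sign of the first column
        q[:, 0] = -q[:, 0]
    return q


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Toy latent features: 500 samples, 4 correlated dimensions (made-up data).
    mixing = np.eye(4) + 0.5 * np.triu(np.ones((4, 4)), 1)
    latent = rng.normal(size=(500, 4)) @ mixing
    z = whiten(latent)
    # Pretend the first 50 samples are examples of one concept.
    concept_mean = z[:50].mean(axis=0)
    q = rotation_to_first_axis(concept_mean)
    rotated = z @ q  # axis 0 now points along the concept's mean direction
    print(rotated.shape)
```

Because the rotation is orthogonal, the rotated features stay whitened, while the chosen concept's mean activation falls entirely on axis 0. Preserving parent-child relations between concepts, as the abstract describes, would require additional constraints that this sketch does not attempt.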

Related papers

On the Performance of Concept Probing: The Influence of the Data (Extended Version) [3.2443914909457594]
Concept probing works by training additional classifiers to map the internal representations of a model into human-defined concepts of interest. Research on concept probing has mainly focused on the model being probed or the probing model itself. In this paper, we investigate the effect of the data used to train probing models on their performance.
arXiv Detail & Related papers (2025-07-24T16:18:46Z)
Concept-Based Mechanistic Interpretability Using Structured Knowledge Graphs [3.429783703166407]
Our framework enables a global dissection of model behavior by analyzing how high-level semantic attributes emerge, interact, and propagate through internal model components. A key innovation is our visualization platform that we named BAGEL, which presents these insights in a structured knowledge graph. Our framework is model-agnostic, scalable, and contributes to a deeper understanding of how deep learning models generalize (or fail to) in the presence of dataset biases.
arXiv Detail & Related papers (2025-07-08T09:30:20Z)
Concept-Guided Interpretability via Neural Chunking [54.73787666584143]
We show that neural networks exhibit patterns in their raw population activity that mirror regularities in the training data. We propose three methods to extract these emerging entities, complementing each other based on label availability and dimensionality. Our work points to a new direction for interpretability, one that harnesses both cognitive principles and the structure of naturalistic data.
arXiv Detail & Related papers (2025-05-16T13:49:43Z)
Show and Tell: Visually Explainable Deep Neural Nets via Spatially-Aware Concept Bottleneck Models [5.985204759362746]
We present a unified framework for transforming any vision neural network into a spatially and conceptually interpretable model. We name this method "Spatially-Aware and Label-Free Concept Bottleneck Model" (SALF-CBM).
arXiv Detail & Related papers (2025-02-27T14:27:55Z)
Language Model as Visual Explainer [72.88137795439407]
We present a systematic approach for interpreting vision models using a tree-structured linguistic explanation. Our method provides human-understandable explanations in the form of attribute-laden trees. To assess the effectiveness of our approach, we introduce new benchmarks and conduct rigorous evaluations.
arXiv Detail & Related papers (2024-12-08T20:46:23Z)
Explainable Concept Generation through Vision-Language Preference Learning for Understanding Neural Networks' Internal Representations [7.736445799116692]
Concept-based methods have become a popular choice for explaining deep neural networks post-hoc. We devise a reinforcement learning-based preference optimization algorithm that fine-tunes a vision-language generative model. We demonstrate our method's ability to efficiently and reliably articulate diverse concepts.
arXiv Detail & Related papers (2024-08-24T02:26:42Z)
Restyling Unsupervised Concept Based Interpretable Networks with Generative Models [14.604305230535026]
We propose a novel method that relies on mapping the concept features to the latent space of a pretrained generative model. We quantitatively ascertain the efficacy of our method in terms of accuracy of the interpretable prediction network, fidelity of reconstruction, as well as faithfulness and consistency of learnt concepts.
arXiv Detail & Related papers (2024-07-01T14:39:41Z)
Learning Discrete Concepts in Latent Hierarchical Models [73.01229236386148]
Learning concepts from natural high-dimensional data holds potential in building human-aligned and interpretable machine learning models. We formalize concepts as discrete latent causal variables that are related via a hierarchical causal model. We substantiate our theoretical claims with synthetic data experiments.
arXiv Detail & Related papers (2024-06-01T18:01:03Z)
ClassDiffusion: More Aligned Personalization Tuning with Explicit Class Guidance [78.44823280247438]
We present ClassDiffusion, a technique that leverages a semantic preservation loss to explicitly regulate the concept space when learning the new concept. Despite its simplicity, this helps avoid semantic drift when fine-tuning on the target concepts. In response to the ineffective evaluation of CLIP-T metrics, we introduce BLIP2-T metric.
arXiv Detail & Related papers (2024-05-27T17:50:10Z)
Understanding Distributed Representations of Concepts in Deep Neural Networks without Supervision [25.449397570387802]
We propose an unsupervised method for discovering distributed representations of concepts by selecting a principal subset of neurons. Our empirical findings demonstrate that instances with similar neuron activation states tend to share coherent concepts. It can be utilized to identify unlabeled subclasses within data and to detect the causes of misclassifications.
arXiv Detail & Related papers (2023-12-28T07:33:51Z)
Interpreting Pretrained Language Models via Concept Bottlenecks [55.47515772358389]
Pretrained language models (PLMs) have made significant strides in various natural language processing tasks. The lack of interpretability due to their "black-box" nature poses challenges for responsible implementation. We propose a novel approach to interpreting PLMs by employing high-level, meaningful concepts that are easily understandable for humans.
arXiv Detail & Related papers (2023-11-08T20:41:18Z)
Implicit Concept Removal of Diffusion Models [92.55152501707995]
Text-to-image (T2I) diffusion models often inadvertently generate unwanted concepts such as watermarks and unsafe images. We present Geom-Erasing, a novel concept removal method based on geometric-driven control.
arXiv Detail & Related papers (2023-10-09T17:13:10Z)
Attributing Learned Concepts in Neural Networks to Training Data [5.930268338525991]
We find evidence for convergence, where removing the 10,000 top attributing images for a concept and retraining the model does not change the location of the concept in the network. This suggests that the features that inform the development of a concept are spread in a more diffuse manner across its exemplars, implying robustness in concept formation.
arXiv Detail & Related papers (2023-10-04T20:26:59Z)
A Recursive Bateson-Inspired Model for the Generation of Semantic Formal Concepts from Spatial Sensory Data [77.34726150561087]
This paper presents a new symbolic-only method for the generation of hierarchical concept structures from complex sensory data. The approach is based on Bateson's notion of difference as the key to the genesis of an idea or a concept. The model is able to produce fairly rich yet human-readable conceptual representations without training.
arXiv Detail & Related papers (2023-07-16T15:59:13Z)
Counterfactual Generative Networks [59.080843365828756]
We propose to decompose the image generation process into independent causal mechanisms that we train without direct supervision. By exploiting appropriate inductive biases, these mechanisms disentangle object shape, object texture, and background. We show that the counterfactual images can improve out-of-distribution robustness with a marginal drop in performance on the original classification task.
arXiv Detail & Related papers (2021-01-15T10:23:12Z)
MACE: Model Agnostic Concept Extractor for Explaining Image Classification Networks [10.06397994266945]
We propose MACE: a Model Agnostic Concept Extractor, which can explain the working of a convolutional network through smaller concepts. We validate our framework using VGG16 and ResNet50 CNN architectures, and on datasets like Animals With Attributes 2 (AWA2) and Places365.
arXiv Detail & Related papers (2020-11-03T04:40:49Z)
Abstracting Deep Neural Networks into Concept Graphs for Concept Level Interpretability [0.39635467316436124]
We attempt to understand the behavior of trained models that perform image processing tasks in the medical domain by building a graphical representation of the concepts they learn. We show the application of our proposed implementation on two biomedical problems - brain tumor segmentation and fundus image classification.
arXiv Detail & Related papers (2020-08-14T16:34:32Z)

This list is automatically generated from the titles and abstracts of the papers on this site.