Hierarchical Semantic Tree Concept Whitening for Interpretable Image
Classification
- URL: http://arxiv.org/abs/2307.04343v1
- Date: Mon, 10 Jul 2023 04:54:05 GMT
- Title: Hierarchical Semantic Tree Concept Whitening for Interpretable Image
Classification
- Authors: Haixing Dai, Lu Zhang, Lin Zhao, Zihao Wu, Zhengliang Liu, David Liu,
Xiaowei Yu, Yanjun Lyu, Changying Li, Ninghao Liu, Tianming Liu, Dajiang Zhu
- Abstract summary: Post-hoc analysis can only discover the patterns or rules that naturally exist in models.
We proactively instill knowledge to alter the representation of human-understandable concepts in hidden layers.
Our method improves model interpretability, showing better disentanglement of semantic concepts, without negatively affecting model classification performance.
- Score: 19.306487616731765
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the popularity of deep neural networks (DNNs), model interpretability is
becoming a critical concern. Many approaches have been developed to tackle the
problem through post-hoc analysis, such as explaining how predictions are made
or understanding the meaning of neurons in middle layers. Nevertheless, these
methods can only discover the patterns or rules that naturally exist in models.
In this work, rather than relying on post-hoc schemes, we proactively instill
knowledge to alter the representation of human-understandable concepts in
hidden layers. Specifically, we use a hierarchical tree of semantic concepts to
store the knowledge, which is leveraged to regularize the representations of
image data instances while training deep models. The axes of the latent space
are aligned with the semantic concepts, where the hierarchical relations
between concepts are also preserved. Experiments on real-world image datasets
show that our method improves model interpretability, showing better
disentanglement of semantic concepts, without negatively affecting model
classification performance.
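  The abstract's core idea, whitening latent features and rotating them so individual axes line up with nodes of a concept tree, can be illustrated with a short sketch. The following is a minimal, hedged illustration and not the paper's implementation: the class name HierarchicalConceptWhitening, the toy animal/dog/cat tree, the 0.5 parent weight, and the batch-statistics whitening are assumptions made for this example.

```python
# Minimal sketch of a concept-whitening-style layer with a hierarchy-aware
# alignment term. Illustrative only; names, tree, and weights are assumed.
import torch
import torch.nn as nn
from torch.nn.utils import parametrizations


class HierarchicalConceptWhitening(nn.Module):
    """Whiten a feature batch, then apply a learned orthogonal rotation so
    that individual latent axes can be aligned with semantic concepts."""

    def __init__(self, dim: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        self.rotation = nn.Linear(dim, dim, bias=False)
        # Constrain the rotation to stay orthogonal, preserving whitened geometry.
        parametrizations.orthogonal(self.rotation)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # ZCA-style whitening from batch statistics; the statistics are treated
        # as constants here for simplicity (a simplification of this sketch).
        with torch.no_grad():
            mu = x.mean(dim=0, keepdim=True)
            xc = x - mu
            cov = xc.T @ xc / max(x.shape[0] - 1, 1)
            eye = torch.eye(x.shape[1], device=x.device, dtype=x.dtype)
            eigvals, eigvecs = torch.linalg.eigh(cov + self.eps * eye)
            whitener = (eigvecs
                        @ torch.diag(eigvals.clamp_min(self.eps).rsqrt())
                        @ eigvecs.T)
        return self.rotation((x - mu) @ whitener)


def hierarchy_alignment_loss(z, concept_axis, parent_of, concept_labels):
    """Reward each sample's activation on its own concept axis and, with a
    smaller weight, on the axis of that concept's parent in the tree."""
    loss = z.new_zeros(())
    for i, concept in enumerate(concept_labels):
        loss = loss - z[i, concept_axis[concept]]
        parent = parent_of.get(concept)
        if parent is not None:
            loss = loss - 0.5 * z[i, concept_axis[parent]]  # 0.5 is an assumed weight
    return loss / len(concept_labels)


if __name__ == "__main__":
    # Toy concept tree (assumed example): "animal" is the parent of "dog" and "cat".
    concept_axis = {"animal": 0, "dog": 1, "cat": 2}
    parent_of = {"dog": "animal", "cat": "animal"}

    layer = HierarchicalConceptWhitening(dim=8)
    feats = torch.randn(4, 8)              # stand-in for pooled CNN features
    labels = ["dog", "cat", "dog", "cat"]  # concept annotations for the batch
    z = layer(feats)
    loss = hierarchy_alignment_loss(z, concept_axis, parent_of, labels)
    loss.backward()                        # gradients flow into the rotation
    print(float(loss))
```

  In the paper this kind of alignment is optimized jointly with the classification objective during training; the toy example above only shows the shape of the computation, not the authors' training procedure.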
Related papers
- Restyling Unsupervised Concept Based Interpretable Networks with Generative Models [14.604305230535026]
We propose a novel method that relies on mapping the concept features to the latent space of a pretrained generative model.
We quantitatively ascertain the efficacy of our method in terms of accuracy of the interpretable prediction network, fidelity of reconstruction, as well as faithfulness and consistency of learnt concepts.
arXiv Detail & Related papers (2024-07-01T14:39:41Z)
- Learning Discrete Concepts in Latent Hierarchical Models [73.01229236386148]
Learning concepts from natural high-dimensional data holds potential in building human-aligned and interpretable machine learning models.
We formalize concepts as discrete latent causal variables that are related via a hierarchical causal model.
We substantiate our theoretical claims with synthetic data experiments.
arXiv Detail & Related papers (2024-06-01T18:01:03Z)
- ClassDiffusion: More Aligned Personalization Tuning with Explicit Class Guidance [78.44823280247438]
We present ClassDiffusion, a technique that leverages a semantic preservation loss to explicitly regulate the concept space when learning the new concept.
Despite its simplicity, this helps avoid semantic drift when fine-tuning on the target concepts.
In response to the ineffective evaluation of the CLIP-T metric, we introduce the BLIP2-T metric.
arXiv Detail & Related papers (2024-05-27T17:50:10Z)
- Understanding Distributed Representations of Concepts in Deep Neural Networks without Supervision [25.449397570387802]
We propose an unsupervised method for discovering distributed representations of concepts by selecting a principal subset of neurons.
Our empirical findings demonstrate that instances with similar neuron activation states tend to share coherent concepts.
It can be utilized to identify unlabeled subclasses within data and to detect the causes of misclassifications.
arXiv Detail & Related papers (2023-12-28T07:33:51Z)
- Interpreting Pretrained Language Models via Concept Bottlenecks [55.47515772358389]
Pretrained language models (PLMs) have made significant strides in various natural language processing tasks.
The lack of interpretability due to their "black-box" nature poses challenges for responsible implementation.
We propose a novel approach to interpreting PLMs by employing high-level, meaningful concepts that are easily understandable for humans.
arXiv Detail & Related papers (2023-11-08T20:41:18Z)
- Implicit Concept Removal of Diffusion Models [92.55152501707995]
Text-to-image (T2I) diffusion models often inadvertently generate unwanted concepts such as watermarks and unsafe images.
We present Geom-Erasing, a novel concept removal method based on geometric-driven control.
arXiv Detail & Related papers (2023-10-09T17:13:10Z)
- Attributing Learned Concepts in Neural Networks to Training Data [5.930268338525991]
We find evidence for convergence, where removing the 10,000 top attributing images for a concept and retraining the model does not change the location of the concept in the network.
This suggests that the features that inform the development of a concept are spread in a more diffuse manner across its exemplars, implying robustness in concept formation.
arXiv Detail & Related papers (2023-10-04T20:26:59Z)
- A Recursive Bateson-Inspired Model for the Generation of Semantic Formal Concepts from Spatial Sensory Data [77.34726150561087]
This paper presents a new symbolic-only method for the generation of hierarchical concept structures from complex sensory data.
The approach is based on Bateson's notion of difference as the key to the genesis of an idea or a concept.
The model is able to produce fairly rich yet human-readable conceptual representations without training.
arXiv Detail & Related papers (2023-07-16T15:59:13Z)
- Counterfactual Generative Networks [59.080843365828756]
We propose to decompose the image generation process into independent causal mechanisms that we train without direct supervision.
By exploiting appropriate inductive biases, these mechanisms disentangle object shape, object texture, and background.
We show that the counterfactual images can improve out-of-distribution robustness with a marginal drop in performance on the original classification task.
arXiv Detail & Related papers (2021-01-15T10:23:12Z)
- MACE: Model Agnostic Concept Extractor for Explaining Image Classification Networks [10.06397994266945]
We propose MACE: a Model Agnostic Concept Extractor, which can explain the working of a convolutional network through smaller concepts.
We validate our framework using VGG16 and ResNet50 CNN architectures, and on datasets like Animals With Attributes 2 (AWA2) and Places365.
arXiv Detail & Related papers (2020-11-03T04:40:49Z)
- Abstracting Deep Neural Networks into Concept Graphs for Concept Level Interpretability [0.39635467316436124]
We attempt to understand the behavior of trained models that perform image processing tasks in the medical domain by building a graphical representation of the concepts they learn.
We show the application of our proposed implementation on two biomedical problems - brain tumor segmentation and fundus image classification.
arXiv Detail & Related papers (2020-08-14T16:34:32Z)