Semantic interpretation for convolutional neural networks: What makes a
cat a cat?
- URL: http://arxiv.org/abs/2204.07724v1
- Date: Sat, 16 Apr 2022 05:25:17 GMT
- Title: Semantic interpretation for convolutional neural networks: What makes a
cat a cat?
- Authors: Hao Xu, Yuntian Chen, Dongxiao Zhang
- Abstract summary: We introduce the framework of semantic explainable AI (S-XAI).
S-XAI uses row-centered principal component analysis to obtain the common traits from the best combination of superpixels discovered by a genetic algorithm.
It extracts understandable semantic spaces on the basis of discovered semantically sensitive neurons and visualization techniques.
- Score: 3.132595571344153
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The interpretability of deep neural networks has attracted increasing
attention in recent years, and several methods have been created to interpret
the "black box" model. Fundamental limitations remain, however, that impede the
pace of understanding the networks, especially the extraction of an understandable
semantic space. In this work, we introduce the framework of semantic
explainable AI (S-XAI), which utilizes row-centered principal component
analysis to obtain the common traits from the best combination of superpixels
discovered by a genetic algorithm, and extracts understandable semantic spaces
on the basis of discovered semantically sensitive neurons and visualization
techniques. Statistical interpretation of the semantic space is also provided,
and the concept of semantic probability is proposed for the first time. Our
experimental results demonstrate that S-XAI is effective in providing a
semantic interpretation for the CNN, and offers broad usage, including
trustworthiness assessment and semantic sample searching.
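
The pipeline described in the abstract can be illustrated with a short, self-contained sketch. The Python code below is not the authors' implementation; it only mirrors the stated idea under several assumptions: a coarse grid stands in for a real superpixel segmentation, the genetic algorithm's fitness is taken to be the class confidence of the masked image, "row-centered" PCA is read as centering each sample (row) before extracting principal directions, and cnn_features is a hypothetical stand-in for a trained CNN.

```python
# Minimal sketch of the S-XAI pipeline from the abstract (all details below
# that are not in the abstract are assumptions for illustration only).
import numpy as np

rng = np.random.default_rng(0)

def cnn_features(images):
    """Stand-in for a trained CNN's flattened features (assumption).

    A fixed random projection keeps the sketch runnable; replace with a
    real forward pass in practice.
    """
    W = np.random.default_rng(42).normal(size=(images[0].size, 64))
    return np.stack([img.reshape(-1) @ W for img in images])

def mask_image(img, keep, grid=4):
    """Keep only the grid cells ('superpixels') whose flag in `keep` is 1."""
    h, w = img.shape[:2]
    out = np.zeros_like(img)
    for i in range(grid):
        for j in range(grid):
            if keep[i * grid + j]:
                out[i*h//grid:(i+1)*h//grid, j*w//grid:(j+1)*w//grid] = \
                    img[i*h//grid:(i+1)*h//grid, j*w//grid:(j+1)*w//grid]
    return out

def ga_best_superpixels(img, fitness, grid=4, pop=20, gens=15, p_mut=0.1):
    """Toy genetic algorithm over binary superpixel-selection masks."""
    n = grid * grid
    population = rng.integers(0, 2, size=(pop, n))
    for _ in range(gens):
        scores = np.array([fitness(mask_image(img, c, grid)) for c in population])
        parents = population[np.argsort(scores)[::-1][: pop // 2]]
        children = []
        for _ in range(pop - len(parents)):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n)
            child = np.concatenate([a[:cut], b[cut:]])   # crossover
            child ^= (rng.random(n) < p_mut)             # mutation
            children.append(child)
        population = np.vstack([parents, children])
    scores = np.array([fitness(mask_image(img, c, grid)) for c in population])
    return population[scores.argmax()]

def common_traits(feature_matrix, n_components=1):
    """Row-centered PCA: center each sample (row), then take top directions."""
    X = feature_matrix - feature_matrix.mean(axis=1, keepdims=True)
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:n_components]

# Usage on dummy "cat" images: pick superpixels per image, then extract the
# common-trait direction shared across the masked samples.
cats = [rng.random((32, 32)) for _ in range(8)]
fitness = lambda img: cnn_features([img])[0, 0]   # proxy for class confidence
masked = [mask_image(c, ga_best_superpixels(c, fitness)) for c in cats]
traits = common_traits(cnn_features(masked))
print(traits.shape)   # (1, 64): one common-trait direction in feature space
```

Under these assumptions, the neurons with the largest weights along the recovered trait direction would be natural candidates for the "semantically sensitive neurons" the abstract refers to; the paper's own selection criterion and its definition of semantic probability are not reproduced here.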
Related papers
- Closed-Form Interpretation of Neural Network Latent Spaces with Symbolic Gradients [0.0]
We introduce a framework for finding closed-form interpretations of neurons in latent spaces of artificial neural networks.
The interpretation framework is based on embedding trained neural networks into an equivalence class of functions that encode the same concept.
arXiv Detail & Related papers (2024-09-09T03:26:07Z) - Understanding polysemanticity in neural networks through coding theory [0.8702432681310401]
We propose a novel practical approach to network interpretability and theoretical insights into polysemanticity and the density of codes.
We show how random projections can reveal whether a network exhibits a smooth or non-differentiable code and hence how interpretable the code is.
Our approach advances the pursuit of interpretability in neural networks, providing insights into their underlying structure and suggesting new avenues for circuit-level interpretability.
arXiv Detail & Related papers (2024-01-31T16:31:54Z) - Manipulating Feature Visualizations with Gradient Slingshots [54.31109240020007]
We introduce a novel method for manipulating Feature Visualization (FV) without significantly impacting the model's decision-making process.
We evaluate the effectiveness of our method on several neural network models and demonstrate its capabilities to hide the functionality of arbitrarily chosen neurons.
arXiv Detail & Related papers (2024-01-11T18:57:17Z) - AS-XAI: Self-supervised Automatic Semantic Interpretation for CNN [5.42467030980398]
We propose a self-supervised automatic semantic interpretable artificial intelligence (AS-XAI) framework.
It utilizes transparent embedding semantic extraction spaces and row-centered principal component analysis (PCA) for global semantic interpretation of model decisions.
The proposed approach offers broad fine-grained practical applications, including shared semantic interpretation under out-of-distribution categories.
arXiv Detail & Related papers (2023-12-02T10:06:54Z) - Understanding CNN Hidden Neuron Activations Using Structured Background
Knowledge and Deductive Reasoning [3.6223658572137825]
The state of the art indicates that hidden node activations can, in some cases, be interpreted in a way that makes sense to humans.
We show that we can automatically attach meaningful labels from the background knowledge to individual neurons in the dense layer of a Convolutional Neural Network.
arXiv Detail & Related papers (2023-08-08T02:28:50Z) - Adversarial Attacks on the Interpretation of Neuron Activation
Maximization [70.5472799454224]
Activation-maximization approaches are used to interpret and analyze trained deep-learning models.
In this work, we consider the concept of an adversary manipulating a model for the purpose of deceiving the interpretation.
arXiv Detail & Related papers (2023-06-12T19:54:33Z) - Imitation Learning-based Implicit Semantic-aware Communication Networks:
Multi-layer Representation and Collaborative Reasoning [68.63380306259742]
Despite their promising potential, semantic communications and semantic-aware networking are still in their infancy.
We propose a novel reasoning-based implicit semantic-aware communication network architecture that allows multiple tiers of CDC and edge servers to collaborate.
We introduce a new multi-layer representation of semantic information taking into consideration both the hierarchical structure of implicit semantics as well as the personalized inference preference of individual users.
arXiv Detail & Related papers (2022-10-28T13:26:08Z) - Neuro-Symbolic Artificial Intelligence (AI) for Intent based Semantic
Communication [85.06664206117088]
6G networks must consider the semantics and effectiveness (at the end user) of data transmission.
Neuro-symbolic (NeSy) AI is proposed as a pillar for learning the causal structure behind the observed data.
GFlowNet is leveraged for the first time in a wireless system to learn the probabilistic structure which generates the data.
arXiv Detail & Related papers (2022-05-22T07:11:57Z) - Interpretable part-whole hierarchies and conceptual-semantic
relationships in neural networks [4.153804257347222]
We present Agglomerator, a framework capable of providing a representation of part-whole hierarchies from visual cues.
We evaluate our method on common datasets, such as SmallNORB, MNIST, FashionMNIST, CIFAR-10, and CIFAR-100.
arXiv Detail & Related papers (2022-03-07T10:56:13Z) - An Interpretability Illusion for BERT [61.2687465308121]
We describe an "interpretability illusion" that arises when analyzing the BERT model.
We trace the source of this illusion to geometric properties of BERT's embedding space.
We provide a taxonomy of model-learned concepts and discuss methodological implications for interpretability research.
arXiv Detail & Related papers (2021-04-14T22:04:48Z) - A Chain Graph Interpretation of Real-World Neural Networks [58.78692706974121]
We propose an alternative interpretation that identifies NNs as chain graphs (CGs) and feed-forward as an approximate inference procedure.
The CG interpretation specifies the nature of each NN component within the rich theoretical framework of probabilistic graphical models.
We demonstrate with concrete examples that the CG interpretation can provide novel theoretical support and insights for various NN techniques.
arXiv Detail & Related papers (2020-06-30T14:46:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.