Understanding CNN Hidden Neuron Activations Using Structured Background
Knowledge and Deductive Reasoning
- URL: http://arxiv.org/abs/2308.03999v2
- Date: Wed, 9 Aug 2023 15:59:50 GMT
- Title: Understanding CNN Hidden Neuron Activations Using Structured Background
Knowledge and Deductive Reasoning
- Authors: Abhilekha Dalal, Md Kamruzzaman Sarker, Adrita Barua, Eugene
Vasserman, Pascal Hitzler
- Abstract summary: State of the art indicates that hidden node activations can, in some cases, be interpretable in a way that makes sense to humans.
We show that we can automatically attach meaningful labels from the background knowledge to individual neurons in the dense layer of a Convolutional Neural Network.
- Score: 3.6223658572137825
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A major challenge in Explainable AI is in correctly interpreting activations
of hidden neurons: accurate interpretations would provide insights into the
question of what a deep learning system has internally detected as relevant on
the input, demystifying the otherwise black-box character of deep learning
systems. The state of the art indicates that hidden node activations can, in
some cases, be interpretable in a way that makes sense to humans, but
systematic automated methods that would be able to hypothesize and verify
interpretations of hidden neuron activations are underexplored. In this paper,
we provide such a method and demonstrate that it provides meaningful
interpretations. Our approach is based on using large-scale background
knowledge (approximately 2 million classes curated from the Wikipedia concept
hierarchy) together with a symbolic reasoning approach called Concept
Induction, which is based on description logics and was originally developed
for applications in the Semantic Web field. Our results show that we can automatically attach
meaningful labels from the background knowledge to individual neurons in the
dense layer of a Convolutional Neural Network through a hypothesis and
verification process.
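The abstract describes a hypothesis-and-verification loop: images that strongly activate a neuron are used to hypothesize a class label from the background knowledge via Concept Induction, and the label is then checked against further activations. The sketch below illustrates one way such a loop could be organized; it is a minimal illustration, not the paper's implementation. It assumes dense-layer activations are available as an (images x neurons) matrix and that each image carries class annotations from the background knowledge. The names `induce_concept_label` and `hypothesize_and_verify`, the frequency heuristic standing in for the description-logic reasoner, and all thresholds are hypothetical placeholders.

```python
"""Sketch of a hypothesis-and-verification loop for labeling dense-layer
neurons, under the assumptions stated above. Not the paper's actual code."""
from collections import Counter
import numpy as np

def induce_concept_label(positive_annotations, negative_annotations):
    """Placeholder for Concept Induction over the background knowledge.

    A real implementation would query a description-logic tool; here we
    simply hypothesize the most frequent class among the strongly
    activating (positive) images that is absent from the negative set.
    """
    neg = set().union(*negative_annotations) if negative_annotations else set()
    counts = Counter(c for ann in positive_annotations for c in ann if c not in neg)
    return counts.most_common(1)[0][0] if counts else None

def hypothesize_and_verify(activations, image_annotations, neuron, k=10,
                           verification_threshold=0.8):
    """Attach a label to one dense-layer neuron, then verify it."""
    # Hypothesis step: images that activate the neuron most / least strongly.
    order = np.argsort(activations[:, neuron])
    top, bottom = order[-k:], order[:k]
    label = induce_concept_label(
        [image_annotations[i] for i in top],
        [image_annotations[i] for i in bottom],
    )
    if label is None:
        return None
    # Verification step: does the neuron fire above its median activation
    # on *other* images annotated with the hypothesized class?
    median = np.median(activations[:, neuron])
    holdout = [i for i in range(len(image_annotations))
               if label in image_annotations[i] and i not in set(top)]
    if not holdout:
        return None
    hit_rate = np.mean(activations[holdout, neuron] > median)
    return label if hit_rate >= verification_threshold else None

# Minimal usage with synthetic data standing in for a trained CNN.
rng = np.random.default_rng(0)
acts = rng.random((100, 8))                  # 100 images, 8 dense-layer neurons
vocab = ["building", "tree", "road", "sky"]  # toy background-knowledge classes
annots = [set(rng.choice(vocab, size=2, replace=False)) for _ in range(100)]
print(hypothesize_and_verify(acts, annots, neuron=3))
```

In the paper itself, the hypothesis step runs Concept Induction over the Wikipedia-derived class hierarchy rather than the simple frequency heuristic used as a stand-in here.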
Related papers
- On the Value of Labeled Data and Symbolic Methods for Hidden Neuron Activation Analysis [1.55858752644861]
State of the art indicates that hidden node activations can, in some cases, be interpretable in a way that makes sense to humans.
We introduce a novel model-agnostic post-hoc Explainable AI method demonstrating that it provides meaningful interpretations.
arXiv Detail & Related papers (2024-04-21T07:57:45Z)
- Manipulating Feature Visualizations with Gradient Slingshots [54.31109240020007]
We introduce a novel method for manipulating Feature Visualization (FV) without significantly impacting the model's decision-making process.
We evaluate the effectiveness of our method on several neural network models and demonstrate its capabilities to hide the functionality of arbitrarily chosen neurons.
arXiv Detail & Related papers (2024-01-11T18:57:17Z)
- Brain-Inspired Machine Intelligence: A Survey of Neurobiologically-Plausible Credit Assignment [65.268245109828]
We examine algorithms for conducting credit assignment in artificial neural networks that are inspired or motivated by neurobiology.
We organize the ever-growing set of brain-inspired learning schemes into six general families and consider these in the context of backpropagation of errors.
The results of this review are meant to encourage future developments in neuro-mimetic systems and their constituent learning processes.
arXiv Detail & Related papers (2023-12-01T05:20:57Z)
- From Neural Activations to Concepts: A Survey on Explaining Concepts in Neural Networks [15.837316393474403]
Concepts can act as a natural link between learning and reasoning.
Not only can knowledge be extracted from neural networks, but concept knowledge can also be inserted into neural network architectures.
arXiv Detail & Related papers (2023-10-18T11:08:02Z)
- Adversarial Attacks on the Interpretation of Neuron Activation Maximization [70.5472799454224]
Activation-maximization approaches are used to interpret and analyze trained deep-learning models.
In this work, we consider the concept of an adversary manipulating a model for the purpose of deceiving the interpretation.
arXiv Detail & Related papers (2023-06-12T19:54:33Z)
- Explaining Deep Learning Hidden Neuron Activations using Concept Induction [3.6223658572137825]
State of the art indicates that hidden node activations appear to be interpretable in a way that makes sense to humans.
We show that we can automatically attach meaningful labels from the background knowledge to individual neurons in the dense layer of a Convolutional Neural Network.
arXiv Detail & Related papers (2023-01-23T18:14:32Z)
- NeuroExplainer: Fine-Grained Attention Decoding to Uncover Cortical Development Patterns of Preterm Infants [73.85768093666582]
We propose an explainable geometric deep network dubbed NeuroExplainer.
NeuroExplainer is used to uncover altered infant cortical development patterns associated with preterm birth.
arXiv Detail & Related papers (2023-01-01T12:48:12Z)
- Mapping Knowledge Representations to Concepts: A Review and New Perspectives [0.6875312133832078]
This review focuses on research that aims to associate internal representations with human understandable concepts.
We find this taxonomy, together with theories of causality, useful for understanding what can and cannot be expected from neural network explanations.
The analysis additionally uncovers an ambiguity in the reviewed literature related to the goal of model explainability.
arXiv Detail & Related papers (2022-12-31T12:56:12Z)
- An Interpretable Neuron Embedding for Static Knowledge Distillation [7.644253344815002]
We propose a new interpretable neural network method, by embedding neurons into the semantic space.
The proposed semantic vector externalizes the latent knowledge to static knowledge, which is easy to exploit.
Empirical experiments of visualization show that semantic vectors describe neuron activation semantics well.
arXiv Detail & Related papers (2022-11-14T03:26:10Z)
- Searching for the Essence of Adversarial Perturbations [73.96215665913797]
We show that adversarial perturbations contain human-recognizable information, which is the key conspirator responsible for a neural network's erroneous prediction.
This concept of human-recognizable information allows us to explain key features related to adversarial perturbations.
arXiv Detail & Related papers (2022-05-30T18:04:57Z)
- Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
Inputs to the model are routed through a sequence of functions in a way that is learned end-to-end.
We show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferable to a new task in a sample-efficient manner.
arXiv Detail & Related papers (2021-10-12T23:22:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.