NeuroInspect: Interpretable Neuron-based Debugging Framework through
Class-conditional Visualizations
- URL: http://arxiv.org/abs/2310.07184v2
- Date: Tue, 17 Oct 2023 09:00:22 GMT
- Title: NeuroInspect: Interpretable Neuron-based Debugging Framework through
Class-conditional Visualizations
- Authors: Yeong-Joon Ju, Ji-Hoon Park, and Seong-Whan Lee
- Abstract summary: We present NeuroInspect, an interpretable neuron-based debugging framework for deep learning (DL) models.
Our framework first pinpoints the neurons responsible for a network's mistakes and then visualizes the features embedded in those neurons in a human-interpretable form.
We validate our framework by addressing false correlations and improving inferences for classes with the worst performance in real-world settings.
- Score: 28.552283701883766
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Although deep learning (DL) has achieved remarkable progress in various
domains, DL models are still prone to making mistakes. This issue
necessitates effective debugging tools for DL practitioners to interpret the
decision-making process within the networks. However, existing debugging
methods often demand extra data or adjustments to the decision process,
limiting their applicability. To tackle this problem, we present NeuroInspect,
an interpretable neuron-based debugging framework with three key stages:
counterfactual explanations, feature visualizations, and false correlation
mitigation. Our debugging framework first pinpoints the neurons responsible for
the network's mistakes and then visualizes the features embedded in those neurons
in a human-interpretable form. To provide these explanations, we introduce
CLIP-Illusion, a novel feature visualization method that generates images
representing features conditioned on classes to examine the connection between
neurons and the decision layer. By employing class information, we alleviate the
convoluted explanations produced by conventional visualization approaches,
isolating properties that would otherwise be mixed. This process offers more human-interpretable
explanations for model errors without altering the trained network or requiring
additional data. Furthermore, our framework mitigates false correlations learned
from a dataset under a stochastic perspective, modifying the decisions associated
with the neurons identified as the main causes. We validate the effectiveness of our
framework by addressing false correlations and improving inferences for classes
with the worst performance in real-world settings. Moreover, we demonstrate
that NeuroInspect helps debug the mistakes of DL models through an evaluation of
human understanding. The code is openly available at
https://github.com/yeongjoonJu/NeuroInspect.
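The pipeline described above, pinpointing culprit neurons and then visualizing them conditioned on a class, can be illustrated with a short sketch. The following is a minimal PyTorch sketch under stated assumptions: the classifier is assumed to expose pooled penultimate features via a forward_features method and its final linear decision layer via classifier, and the function names (pinpoint_neurons, visualize_neuron) and the simple contribution score are hypothetical illustrations, not the actual NeuroInspect or CLIP-Illusion implementation (see the repository above for that).

```python
# Minimal, illustrative sketch of the two debugging steps described in the
# abstract. Assumptions (NOT the actual NeuroInspect/CLIP-Illusion API):
# the model exposes pooled penultimate features via `forward_features` and
# a final linear decision layer via `classifier`.
import torch


@torch.no_grad()
def pinpoint_neurons(features, fc_weight, true_class, pred_class, top_k=5):
    """Rank penultimate-layer neurons by how much they pushed one
    misclassified sample toward the wrong class and away from the true one."""
    # features: (D,) pooled penultimate activations of the sample
    # fc_weight: (num_classes, D) weight of the final linear layer
    contribution = features * (fc_weight[pred_class] - fc_weight[true_class])
    return torch.topk(contribution, k=top_k).indices  # candidate culprit neurons


def visualize_neuron(model, neuron_idx, class_idx, steps=256, lr=0.05,
                     alpha=1.0, image_size=224, device="cpu"):
    """Class-conditional activation maximization: optimize an input so that it
    drives one penultimate neuron while also raising the target class logit,
    tying the visualization to the decision layer instead of showing mixed,
    class-agnostic properties."""
    model.eval()  # only the input image is optimized, not the model weights
    x = torch.randn(1, 3, image_size, image_size, device=device, requires_grad=True)
    optimizer = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        feats = model.forward_features(x)    # assumed: (1, D) pooled features
        logits = model.classifier(feats)     # assumed: final linear layer
        loss = -(feats[0, neuron_idx] + alpha * logits[0, class_idx])
        loss.backward()
        optimizer.step()
    return x.detach()
```

The alpha-weighted class-logit term is what makes the visualization class-conditional and ties it to the decision layer; how CLIP-Illusion itself realizes this conditioning differs from this raw pixel-space optimization and is described in the paper.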
Related papers
- Manipulating Feature Visualizations with Gradient Slingshots [54.31109240020007]
We introduce a novel method for manipulating Feature Visualization (FV) without significantly impacting the model's decision-making process.
We evaluate the effectiveness of our method on several neural network models and demonstrate its capabilities to hide the functionality of arbitrarily chosen neurons.
arXiv Detail & Related papers (2024-01-11T18:57:17Z)
- Causal Analysis for Robust Interpretability of Neural Networks [0.2519906683279152]
We develop a robust interventional-based method to capture cause-effect mechanisms in pre-trained neural networks.
We apply our method to vision models trained on classification tasks.
arXiv Detail & Related papers (2023-05-15T18:37:24Z)
- Feature visualization for convolutional neural network models trained on neuroimaging data [0.0]
We show, for the first time, results obtained using feature visualization of convolutional neural networks (CNNs).
We have trained CNNs for different tasks including sex classification and artificial lesion classification based on structural magnetic resonance imaging (MRI) data.
The resulting images reveal the learned concepts of the artificial lesions, including their shapes, but remain hard to interpret for abstract features in the sex classification task.
arXiv Detail & Related papers (2022-03-24T15:24:38Z)
- LAP: An Attention-Based Module for Concept Based Self-Interpretation and Knowledge Injection in Convolutional Neural Networks [2.8948274245812327]
We propose a new attention-based pooling layer, called Local Attention Pooling (LAP), that accomplishes self-interpretability.
LAP is easily pluggable into any convolutional neural network, even already trained ones.
LAP offers more valid human-understandable and faithful-to-the-model interpretations than the commonly used white-box explainer methods.
arXiv Detail & Related papers (2022-01-27T21:10:20Z)
- Training Feedback Spiking Neural Networks by Implicit Differentiation on the Equilibrium State [66.2457134675891]
Spiking neural networks (SNNs) are brain-inspired models that enable energy-efficient implementation on neuromorphic hardware.
Most existing methods imitate the backpropagation framework and feedforward architectures for artificial neural networks.
We propose a novel training method that does not rely on the exact reverse of the forward computation.
arXiv Detail & Related papers (2021-09-29T07:46:54Z)
- Explainable Adversarial Attacks in Deep Neural Networks Using Activation Profiles [69.9674326582747]
This paper presents a visual framework to investigate neural network models subjected to adversarial examples.
We show how observing these elements can quickly pinpoint exploited areas in a model.
arXiv Detail & Related papers (2021-03-18T13:04:21Z)
- Proactive Pseudo-Intervention: Causally Informed Contrastive Learning For Interpretable Vision Models [103.64435911083432]
We present a novel contrastive learning strategy called Proactive Pseudo-Intervention (PPI).
PPI leverages proactive interventions to guard against image features with no causal relevance.
We also devise a novel causally informed salience mapping module to identify key image pixels to intervene, and show it greatly facilitates model interpretability.
arXiv Detail & Related papers (2020-12-06T20:30:26Z)
- Explain by Evidence: An Explainable Memory-based Neural Network for Question Answering [41.73026155036886]
This paper proposes an explainable, evidence-based memory network architecture.
It learns to summarize the dataset and extract supporting evidences to make its decision.
Our model achieves state-of-the-art performance on two popular question answering datasets.
arXiv Detail & Related papers (2020-11-05T21:18:21Z)
- Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
- Hold me tight! Influence of discriminative features on deep network boundaries [63.627760598441796]
We propose a new perspective that relates dataset features to the distance of samples to the decision boundary.
This enables us to carefully tweak the position of the training samples and measure the induced changes on the boundaries of CNNs trained on large-scale vision datasets.
arXiv Detail & Related papers (2020-02-15T09:29:36Z)