i-Algebra: Towards Interactive Interpretability of Deep Neural Networks
- URL: http://arxiv.org/abs/2101.09301v1
- Date: Fri, 22 Jan 2021 19:22:57 GMT
- Title: i-Algebra: Towards Interactive Interpretability of Deep Neural Networks
- Authors: Xinyang Zhang, Ren Pang, Shouling Ji, Fenglong Ma, Ting Wang
- Abstract summary: We present i-Algebra, a first-of-its-kind interactive framework for interpreting deep neural networks (DNNs).
At its core is a library of atomic, composable operators, which explain model behaviors at varying input granularity, during different inference stages, and from distinct interpretation perspectives.
We conduct user studies in a set of representative analysis tasks, including inspecting adversarial inputs, resolving model inconsistency, and cleansing contaminated data, all demonstrating its promising usability.
- Score: 41.13047686374529
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Providing explanations for deep neural networks (DNNs) is essential for their
use in domains wherein the interpretability of decisions is a critical
prerequisite. Despite the plethora of work on interpreting DNNs, most existing
solutions offer interpretability in an ad hoc, one-shot, and static manner,
without accounting for the perception, understanding, or response of end-users,
resulting in their poor usability in practice. In this paper, we argue that DNN
interpretability should be implemented as the interactions between users and
models. We present i-Algebra, a first-of-its-kind interactive framework for
interpreting DNNs. At its core is a library of atomic, composable operators,
which explain model behaviors at varying input granularity, during different
inference stages, and from distinct interpretation perspectives. Leveraging a
declarative query language, users are enabled to build various analysis tools
(e.g., "drill-down", "comparative", "what-if" analysis) via flexibly composing
such operators. We prototype i-Algebra and conduct user studies in a set of
representative analysis tasks, including inspecting adversarial inputs,
resolving model inconsistency, and cleansing contaminated data, all
demonstrating its promising usability.
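The paper's operator library and query language are not reproduced in this abstract, so the following is only a rough Python sketch of the general idea of composable interpretation operators (attribution restricted to a user-selected region, comparison across models); every name here (`saliency_op`, `window_op`, `compare_op`) is hypothetical rather than i-Algebra's actual API.

```python
# Hypothetical sketch of composable interpretation operators in the spirit of
# i-Algebra; names and API are illustrative, not the paper's actual ones.
import torch
import torch.nn as nn

def saliency_op(model: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Atomic operator: gradient-based attribution over the full input."""
    x = x.clone().requires_grad_(True)
    model(x).max(dim=1).values.sum().backward()
    return x.grad.abs()

def window_op(mask: torch.Tensor):
    """Operator factory: restrict attribution to a user-selected input region
    (e.g., an image patch), supporting drill-down analysis."""
    def op(model: nn.Module, x: torch.Tensor) -> torch.Tensor:
        return saliency_op(model, x) * mask
    return op

def compare_op(model_a: nn.Module, model_b: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Comparative analysis: difference between two models' attributions."""
    return saliency_op(model_a, x) - saliency_op(model_b, x)

# Usage: drill down into the top-left quadrant of a toy 3x32x32 input.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x = torch.randn(1, 3, 32, 32)
mask = torch.zeros_like(x)
mask[..., :16, :16] = 1.0
region_attr = window_op(mask)(model, x)   # attribution restricted to the patch
```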
Related papers
- Neural Networks Decoded: Targeted and Robust Analysis of Neural Network Decisions via Causal Explanations and Reasoning [9.947555560412397]
We introduce TRACER, a novel method grounded in causal inference theory to estimate the causal dynamics underpinning DNN decisions.
Our approach systematically intervenes on input features to observe how specific changes propagate through the network, affecting internal activations and final outputs.
TRACER further enhances explainability by generating counterfactuals that reveal possible model biases and offer contrastive explanations for misclassifications.
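The summary above does not detail TRACER's causal estimator; the sketch below only illustrates the underlying notion of intervening on individual input features and measuring how the change propagates to the output (the names and the baseline-substitution scheme are assumptions).

```python
# Rough sketch of input-feature intervention (not TRACER's actual estimator):
# replace one feature with a baseline value and measure the output change.
import numpy as np

def intervention_effects(predict, x, baseline):
    """predict: callable mapping a feature vector to a class-probability vector."""
    base_probs = predict(x)
    effects = np.zeros(len(x))
    for i in range(len(x)):
        x_do = x.copy()
        x_do[i] = baseline[i]                       # do(x_i := baseline_i)
        effects[i] = np.abs(predict(x_do) - base_probs).sum()
    return effects  # larger value -> the feature's change propagates more strongly
```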
arXiv Detail & Related papers (2024-10-07T20:44:53Z)
- InterpretCC: Intrinsic User-Centric Interpretability through Global Mixture of Experts [31.738009841932374]
Interpretability for neural networks involves a trade-off between three key requirements.
We present InterpretCC, a family of interpretable-by-design neural networks that guarantee human-centric interpretability.
arXiv Detail & Related papers (2024-02-05T11:55:50Z)
- Adversarial Attacks on the Interpretation of Neuron Activation Maximization [70.5472799454224]
Activation-maximization approaches are used to interpret and analyze trained deep-learning models.
In this work, we consider the concept of an adversary manipulating a model for the purpose of deceiving the interpretation.
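For context on the technique being attacked, a bare-bones activation-maximization loop performs gradient ascent on the input to maximize one unit's activation; the sketch below shows that standard procedure only, not the paper's adversarial manipulation of it.

```python
# Plain activation maximization: gradient ascent on the input to maximize one
# output unit's activation (the interpretation technique, not the attack on it).
import torch
import torch.nn as nn

def activation_maximization(model: nn.Module, unit: int, shape=(1, 3, 32, 32),
                            steps: int = 200, lr: float = 0.1) -> torch.Tensor:
    x = torch.zeros(shape, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = -model(x)[0, unit]   # negate so that minimizing maximizes the unit
        loss.backward()
        opt.step()
    return x.detach()
```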
arXiv Detail & Related papers (2023-06-12T19:54:33Z)
- Hybrid CNN-Interpreter: Interpret local and global contexts for CNN-based Models [9.148791330175191]
Convolutional neural network (CNN) models have achieved substantial performance improvements across various domains.
However, their lack of interpretability remains a major barrier to assurance and regulation, and hence to the acceptance and deployment of AI-assisted applications.
We propose a novel hybrid CNN-interpreter with two components: an original forward propagation mechanism that examines layer-specific prediction results for local interpretability, and a new global interpretability measure that captures feature correlation and filter importance effects.
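The abstract does not spell out the forward propagation mechanism; one common way to examine layer-specific behavior is to register forward hooks on intermediate layers and summarize per-filter activations, which is the assumption behind this illustrative sketch.

```python
# Hypothetical layer-specific inspection via forward hooks; the actual hybrid
# CNN-interpreter mechanism is not detailed in the abstract above.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10),
)

activations = {}
def save(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

for idx, layer in enumerate(model):
    if isinstance(layer, nn.Conv2d):
        layer.register_forward_hook(save(f"conv{idx}"))

_ = model(torch.randn(1, 3, 32, 32))
for name, act in activations.items():
    # Crude per-filter importance proxy: mean absolute activation of each filter map.
    filter_importance = act.abs().mean(dim=(0, 2, 3))
    print(name, filter_importance)
```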
arXiv Detail & Related papers (2022-10-31T22:59:33Z)
- Self-Ensembling GAN for Cross-Domain Semantic Segmentation [107.27377745720243]
This paper proposes a self-ensembling generative adversarial network (SE-GAN) exploiting cross-domain data for semantic segmentation.
In SE-GAN, a teacher network and a student network constitute a self-ensembling model for generating semantic segmentation maps, which, together with a discriminator, forms a GAN.
Despite its simplicity, we find SE-GAN can significantly boost the performance of adversarial training and enhance the stability of the model.
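The abstract does not state how the teacher network is maintained; self-ensembling is commonly implemented as an exponential-moving-average (EMA) teacher, and the snippet below sketches only that assumed update rule.

```python
# Assumed EMA-style self-ensembling update (the exact scheme is not stated in
# the abstract): teacher weights track a moving average of the student's.
import torch
import torch.nn as nn

@torch.no_grad()
def ema_update(teacher: nn.Module, student: nn.Module, alpha: float = 0.999):
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(alpha).add_(s_param, alpha=1.0 - alpha)
```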
arXiv Detail & Related papers (2021-12-15T09:50:25Z)
- Combining Discrete Choice Models and Neural Networks through Embeddings: Formulation, Interpretability and Performance [10.57079240576682]
This study proposes a novel approach that combines theory-driven and data-driven choice models using Artificial Neural Networks (ANNs).
In particular, we use continuous vector representations, called embeddings, for encoding categorical or discrete explanatory variables.
Our models deliver state-of-the-art predictive performance, outperforming existing ANN-based models while drastically reducing the number of required network parameters.
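As a minimal illustration of the embedding idea (continuous vectors standing in for categorical explanatory variables, feeding a small choice network), the architecture below is an assumption for exposition, not the paper's specification.

```python
# Minimal sketch: embed a categorical explanatory variable and combine it with
# continuous features in a small choice model (illustrative architecture only).
import torch
import torch.nn as nn

class EmbeddingChoiceModel(nn.Module):
    def __init__(self, n_categories, emb_dim, n_continuous, n_alternatives):
        super().__init__()
        self.embed = nn.Embedding(n_categories, emb_dim)
        self.utility = nn.Linear(emb_dim + n_continuous, n_alternatives)

    def forward(self, cat_ids, cont):
        z = torch.cat([self.embed(cat_ids), cont], dim=-1)
        return torch.softmax(self.utility(z), dim=-1)   # choice probabilities

model = EmbeddingChoiceModel(n_categories=5, emb_dim=3, n_continuous=2, n_alternatives=4)
probs = model(torch.tensor([1, 4]), torch.randn(2, 2))  # batch of two observations
```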
arXiv Detail & Related papers (2021-09-24T15:55:31Z)
- Interpretable Deep Learning: Interpretations, Interpretability, Trustworthiness, and Beyond [49.93153180169685]
We introduce and clarify two basic concepts, interpretations and interpretability, that are often confused.
We elaborate on the design of several recent interpretation algorithms from different perspectives by proposing a new taxonomy.
We summarize existing work on evaluating models' interpretability using "trustworthy" interpretation algorithms.
arXiv Detail & Related papers (2021-03-19T08:40:30Z)
- Explaining and Improving Model Behavior with k Nearest Neighbor Representations [107.24850861390196]
We propose using k nearest neighbor representations to identify training examples responsible for a model's predictions.
We show that kNN representations are effective at uncovering learned spurious associations.
Our results indicate that the kNN approach makes the finetuned model more robust to adversarial inputs.
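A compact sketch of the general recipe: index the training set's hidden representations and retrieve the nearest training neighbors of a test representation as candidate "responsible" examples; how representations are extracted from the model is left abstract here.

```python
# Sketch: retrieve the training examples nearest to a test example in
# representation space (e.g., a model's final hidden layer) as the candidates
# most responsible for its prediction.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def nearest_training_examples(train_reps, test_rep, k=5):
    index = NearestNeighbors(n_neighbors=k, metric="cosine").fit(train_reps)
    distances, indices = index.kneighbors(test_rep.reshape(1, -1))
    return indices[0], distances[0]

train_reps = np.random.randn(1000, 128)          # stand-in hidden representations
idx, dist = nearest_training_examples(train_reps, np.random.randn(128))
```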
arXiv Detail & Related papers (2020-10-18T16:55:25Z)
- Interpreting Graph Neural Networks for NLP With Differentiable Edge Masking [63.49779304362376]
Graph neural networks (GNNs) have become a popular approach to integrating structural inductive biases into NLP models.
We introduce a post-hoc method for interpreting the predictions of GNNs which identifies unnecessary edges.
We show that we can drop a large proportion of edges without deteriorating the performance of the model.
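A toy sketch of differentiable edge masking: learn a sigmoid gate per edge so that the masked graph still reproduces the original output while a sparsity penalty pushes gates toward zero, and treat edges whose gates collapse as unnecessary. The one-line message-passing layer below is a stand-in, not the paper's GNN or objective.

```python
# Toy differentiable edge masking: learn per-edge gates that keep the model's
# output close to the original while an L1-style penalty prunes edges.
import torch

def gnn_layer(adj, feats, weight):
    # One round of message passing over a weighted adjacency matrix (toy stand-in).
    return torch.relu(adj @ feats @ weight)

n, d = 6, 4
adj = (torch.rand(n, n) > 0.5).float()
feats, weight = torch.randn(n, d), torch.randn(d, d)
original = gnn_layer(adj, feats, weight).detach()

edge_logits = torch.zeros(n, n, requires_grad=True)
opt = torch.optim.Adam([edge_logits], lr=0.05)
for _ in range(300):
    opt.zero_grad()
    mask = torch.sigmoid(edge_logits)
    loss = ((gnn_layer(adj * mask, feats, weight) - original) ** 2).mean() + 0.01 * mask.mean()
    loss.backward()
    opt.step()

kept = (torch.sigmoid(edge_logits) > 0.5) & adj.bool()   # edges judged necessary
```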
arXiv Detail & Related papers (2020-10-01T17:51:19Z)
- An Adversarial Approach for Explaining the Predictions of Deep Neural Networks [9.645196221785694]
We present a novel algorithm for explaining the predictions of a deep neural network (DNN) using adversarial machine learning.
Our approach identifies the relative importance of input features in relation to the predictions based on the behavior of an adversarial attack on the DNN.
Our analysis enables us to produce consistent and efficient explanations.
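As a rough approximation of the idea (not the paper's algorithm), one can read feature importance off the input gradient of the loss, i.e., the direction an FGSM-style adversarial attack would perturb most strongly.

```python
# Rough sketch: use the input gradient of the loss (the direction an FGSM-style
# attack would perturb) as a per-feature importance signal. Illustrative only.
import torch
import torch.nn as nn

def adversarial_importance(model, x, label):
    x = x.clone().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x), label)
    loss.backward()
    return x.grad.abs()   # features the attack would perturb most

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
scores = adversarial_importance(model, torch.randn(1, 1, 28, 28), torch.tensor([3]))
```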
arXiv Detail & Related papers (2020-05-20T18:06:53Z)