A Survey on Neural Network Interpretability
- URL: http://arxiv.org/abs/2012.14261v2
- Date: Wed, 3 Mar 2021 09:00:09 GMT
- Title: A Survey on Neural Network Interpretability
- Authors: Yu Zhang, Peter Tiňo, Aleš Leonardis, Ke Tang
- Abstract summary: Interpretability is a desired property for deep networks to become powerful tools in other research fields.
We propose a novel taxonomy organized along three dimensions: type of engagement (passive vs. active interpretation approaches), the type of explanation, and the focus (from local to global interpretability).
- Score: 25.27545364222555
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Along with the great success of deep neural networks, there is also growing
concern about their black-box nature. The interpretability issue affects
people's trust in deep learning systems. It is also related to many ethical
problems, e.g., algorithmic discrimination. Moreover, interpretability is a
desired property for deep networks to become powerful tools in other research
fields, e.g., drug discovery and genomics. In this survey, we conduct a
comprehensive review of the neural network interpretability research. We first
clarify the definition of interpretability as it has been used in many
different contexts. Then we elaborate on the importance of interpretability and
propose a novel taxonomy organized along three dimensions: type of engagement
(passive vs. active interpretation approaches), the type of explanation, and
the focus (from local to global interpretability). This taxonomy provides a
meaningful 3D view of the distribution of papers from the relevant literature, as
two of the dimensions are not simply categorical but allow ordinal
subcategories. Finally, we summarize the existing interpretability evaluation
methods and suggest possible research directions inspired by our new taxonomy.
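
The taxonomy's three dimensions lend themselves to a compact encoding. The sketch below is a rough illustration in Python, assuming plausible subcategory names (the survey's exact labels may differ); it shows how a paper could be placed in the 3D space and how the ordinal dimensions support ordered comparisons.

```python
# Rough sketch of the survey's three-dimensional taxonomy as a data structure.
# Subcategory names are illustrative assumptions, not the survey's exact labels.
from dataclasses import dataclass
from enum import Enum, IntEnum


class Engagement(Enum):
    """Dimension 1: how the method engages the network (categorical)."""
    PASSIVE = "post-hoc analysis of a trained network"
    ACTIVE = "interpretability encouraged during training"


class ExplanationType(IntEnum):
    """Dimension 2: format of the explanation (ordinal; assumed ordering)."""
    EXAMPLES = 1          # explanation by (counter)examples
    ATTRIBUTION = 2       # input feature importance / saliency
    HIDDEN_SEMANTICS = 3  # meaning of hidden units or representations
    LOGIC_RULES = 4       # extracted symbolic rules


class Focus(IntEnum):
    """Dimension 3: scope of the explanation, from local to global (ordinal)."""
    LOCAL = 1       # a single prediction
    SEMI_LOCAL = 2  # a group of inputs or a region of input space
    GLOBAL = 3      # the whole model


@dataclass
class SurveyedPaper:
    title: str
    engagement: Engagement
    explanation: ExplanationType
    focus: Focus


# Placing a hypothetical saliency-map paper in the 3D space.
paper = SurveyedPaper(
    title="Gradient-based saliency maps",
    engagement=Engagement.PASSIVE,
    explanation=ExplanationType.ATTRIBUTION,
    focus=Focus.LOCAL,
)
# The ordinal dimensions allow ordered comparisons between subcategories.
assert paper.focus < Focus.GLOBAL
```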
Related papers
- Explaining Deep Neural Networks by Leveraging Intrinsic Methods [0.9790236766474201]
This thesis contributes to the field of eXplainable AI, focusing on enhancing the interpretability of deep neural networks.
The core contributions lie in introducing novel techniques aimed at making these networks more interpretable by leveraging an analysis of their inner workings.
The research also investigates neurons within trained deep neural networks, shedding light on overlooked phenomena related to their activation values.
arXiv Detail & Related papers (2024-07-17T01:20:17Z) - Mapping Knowledge Representations to Concepts: A Review and New Perspectives [0.6875312133832078]
This review focuses on research that aims to associate internal representations with human understandable concepts.
We find this taxonomy, together with theories of causality, useful for understanding what can and cannot be expected from neural network explanations.
The analysis additionally uncovers an ambiguity in the reviewed literature related to the goal of model explainability.
arXiv Detail & Related papers (2022-12-31T12:56:12Z) - TeKo: Text-Rich Graph Neural Networks with External Knowledge [75.91477450060808]
We propose a novel text-rich graph neural network with external knowledge (TeKo).
We first present a flexible heterogeneous semantic network that incorporates high-quality entities.
We then introduce two types of external knowledge, that is, structured triplets and unstructured entity descriptions.
arXiv Detail & Related papers (2022-06-15T02:33:10Z) - Discriminative Attribution from Counterfactuals [64.94009515033984]
We present a method for neural network interpretability by combining feature attribution with counterfactual explanations.
We show that this method can be used to evaluate the performance of feature attribution methods quantitatively and objectively (a simplified sketch of this idea appears after this list).
arXiv Detail & Related papers (2021-09-28T00:53:34Z) - A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation.
Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to address the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z) - Local Interpretations for Explainable Natural Language Processing: A Survey [5.717407321642629]
This work investigates various methods to improve the interpretability of deep neural networks for Natural Language Processing (NLP) tasks.
We provide a comprehensive discussion on the definition of the term interpretability and its various aspects at the beginning of this work.
arXiv Detail & Related papers (2021-03-20T02:28:33Z) - Interpretable Deep Learning: Interpretations, Interpretability, Trustworthiness, and Beyond [49.93153180169685]
We introduce and clarify two basic concepts, interpretations and interpretability, that are often confused.
We elaborate on the design of several recent interpretation algorithms from different perspectives by proposing a new taxonomy.
We summarize the existing work on evaluating models' interpretability using "trustworthy" interpretation algorithms.
arXiv Detail & Related papers (2021-03-19T08:40:30Z) - Interpretability and Explainability: A Machine Learning Zoo Mini-tour [4.56877715768796]
Interpretability and explainability lie at the core of many machine learning and statistical applications in medicine, economics, law, and natural sciences.
We emphasise the divide between interpretability and explainability and illustrate these two different research directions with concrete examples of the state-of-the-art.
arXiv Detail & Related papers (2020-12-03T10:11:52Z) - Understanding the wiring evolution in differentiable neural architecture search [114.31723873105082]
Controversy exists on whether differentiable neural architecture search methods discover wiring topology effectively.
We study the underlying mechanism of several existing differentiable NAS frameworks.
arXiv Detail & Related papers (2020-09-02T18:08:34Z) - A Chain Graph Interpretation of Real-World Neural Networks [58.78692706974121]
We propose an alternative interpretation that identifies NNs as chain graphs (CGs) and feed-forward as an approximate inference procedure.
The CG interpretation specifies the nature of each NN component within the rich theoretical framework of probabilistic graphical models.
We demonstrate with concrete examples that the CG interpretation can provide novel theoretical support and insights for various NN techniques.
arXiv Detail & Related papers (2020-06-30T14:46:08Z)
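
For the counterfactual-based attribution entry above, a rough sketch may make the idea concrete: restrict attribution to what the counterfactual changes, then score an attribution map by how much copying its top-attributed region from the counterfactual shifts the classifier's output. The attribution rule and the evaluation metric below are simplified stand-ins, not the paper's actual formulation; all names are hypothetical.

```python
# Simplified sketch (not the paper's implementation) of discriminative
# attribution from counterfactuals and a masking-based evaluation of it.
import torch


def discriminative_attribution(model, x, x_cf, target_class):
    """Gradient-times-difference attribution: only features that the
    counterfactual x_cf actually changes can receive credit."""
    x = x.clone().requires_grad_(True)
    model(x)[0, target_class].backward()
    attribution = (x.grad * (x_cf - x)).abs().sum(dim=1, keepdim=True)
    return attribution.detach()  # (1, 1, H, W) per-pixel map


def attribution_change_score(model, x, x_cf, attribution, target_class,
                             keep_fraction=0.1):
    """Copy only the top-attributed pixels from the counterfactual into x and
    measure the change in target-class probability (larger = better map)."""
    flat = attribution.flatten()
    k = max(1, int(keep_fraction * flat.numel()))
    threshold = flat.topk(k).values.min()
    mask = (attribution >= threshold).float()    # 1 where attribution is high
    x_hybrid = x * (1 - mask) + x_cf * mask      # hybrid of input and counterfactual
    with torch.no_grad():
        p_before = torch.softmax(model(x), dim=1)[0, target_class]
        p_after = torch.softmax(model(x_hybrid), dim=1)[0, target_class]
    return (p_after - p_before).item()
```

In this sketch, an attribution map is judged by the probability shift it can account for, which matches the entry's claim that counterfactuals enable a quantitative, objective evaluation of attribution methods.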
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.