Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks
- URL: http://arxiv.org/abs/2207.13243v6
- Date: Fri, 18 Aug 2023 21:14:43 GMT
- Title: Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks
- Authors: Tilman Räuker, Anson Ho, Stephen Casper, Dylan Hadfield-Menell
- Abstract summary: We review over 300 works with a focus on inner interpretability tools.
We introduce a taxonomy that classifies methods by what part of the network they help to explain.
We argue that the status quo in interpretability research is largely unproductive.
- Score: 8.445831718854153
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The last decade of machine learning has seen drastic increases in scale and
capabilities. Deep neural networks (DNNs) are increasingly being deployed in
the real world. However, they are difficult to analyze, raising concerns about
using them without a rigorous understanding of how they function. Effective
tools for interpreting them will be important for building more trustworthy AI
by helping to identify problems, fix bugs, and improve basic understanding. In
particular, "inner" interpretability techniques, which focus on explaining the
internal components of DNNs, are well-suited for developing a mechanistic
understanding, guiding manual modifications, and reverse engineering solutions.
Much recent work has focused on DNN interpretability, and rapid progress has
thus far made a thorough systematization of methods difficult. In this survey,
we review over 300 works with a focus on inner interpretability tools. We
introduce a taxonomy that classifies methods by what part of the network they
help to explain (weights, neurons, subnetworks, or latent representations) and
whether they are implemented during (intrinsic) or after (post hoc) training.
To our knowledge, we are also the first to survey a number of connections
between interpretability research and work in adversarial robustness, continual
learning, modularity, network compression, and studying the human visual
system. We discuss key challenges and argue that the status quo in
interpretability research is largely unproductive. Finally, we highlight the
importance of future work that emphasizes diagnostics, debugging, adversaries,
and benchmarking in order to make interpretability tools more useful to
engineers in practical applications.
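As a purely illustrative sketch (not code from the survey itself), the taxonomy above can be read as two orthogonal labels attached to each interpretability method: the part of the network the method helps to explain, and whether it is implemented during training (intrinsic) or applied afterwards (post hoc). The example method tags below are assumptions for exposition, not the survey's catalog.

```python
from dataclasses import dataclass
from enum import Enum

class NetworkPart(Enum):
    """Which part of the network a method helps to explain."""
    WEIGHTS = "weights"
    NEURONS = "neurons"
    SUBNETWORKS = "subnetworks"
    LATENTS = "latent representations"

class Timing(Enum):
    """Whether the method is implemented during or after training."""
    INTRINSIC = "intrinsic"   # built in during training
    POST_HOC = "post hoc"     # applied to an already-trained model

@dataclass
class Method:
    name: str
    part: NetworkPart
    timing: Timing

# Hypothetical examples of how two familiar method families might be tagged.
examples = [
    Method("feature visualization", NetworkPart.NEURONS, Timing.POST_HOC),
    Method("concept bottleneck model", NetworkPart.LATENTS, Timing.INTRINSIC),
]
```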
Related papers
- Data Science Principles for Interpretable and Explainable AI [0.7581664835990121]
Interpretable and interactive machine learning aims to make complex models more transparent and controllable.
This review synthesizes key principles from the growing literature in this field.
arXiv Detail & Related papers (2024-05-17T05:32:27Z)
- How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding [56.222097640468306]
We provide a mechanistic understanding of how transformers learn "semantic structure".
We show, through a combination of mathematical analysis and experiments on Wikipedia data, that the embedding layer and the self-attention layer encode the topical structure.
arXiv Detail & Related papers (2023-03-07T21:42:17Z)
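The transformer entry above reports that the embedding layer encodes topical structure. The snippet below is a minimal, hedged sketch of one way such a claim could be probed, not the paper's actual analysis: compare intra-topic and inter-topic cosine similarities of token embeddings. The commented usage, including the topic word lists, is hypothetical.

```python
import numpy as np

def topic_similarity_gap(embeddings: dict, topics: dict) -> float:
    """Return mean intra-topic minus mean inter-topic cosine similarity.

    A positive gap is weak evidence that the embedding layer separates topics.
    """
    def cos(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

    intra, inter = [], []
    names = list(topics)
    for i, ti in enumerate(names):
        for j, tj in enumerate(names):
            for wi in topics[ti]:
                for wj in topics[tj]:
                    if wi == wj:
                        continue
                    (intra if i == j else inter).append(cos(embeddings[wi], embeddings[wj]))
    return float(np.mean(intra) - np.mean(inter))

# Hypothetical usage with vectors pulled from a trained model's embedding layer:
# emb = {w: embedding_matrix[vocab[w]] for w in vocab}
# gap = topic_similarity_gap(emb, {"sports": ["goal", "team"], "finance": ["stock", "bond"]})
```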
- Interpreting Neural Policies with Disentangled Tree Representations [58.769048492254555]
We study the interpretability of compact neural policies through the lens of disentangled representations.
We leverage decision trees to obtain factors of variation for disentanglement in robot learning.
We introduce interpretability metrics that measure disentanglement of learned neural dynamics.
arXiv Detail & Related papers (2022-10-13T01:10:41Z)
- Towards Benchmarking Explainable Artificial Intelligence Methods [0.0]
We use philosophy of science theories as an analytical lens with the goal of revealing what can be expected, and more importantly what cannot be expected, from methods that aim to explain the decisions made by a neural network.
By conducting a case study, we investigate the performance of a selection of explainability methods over two mundane domains: animals and headgear.
We lay bare that the usefulness of these methods relies on human domain knowledge and our ability to understand, generalise and reason.
arXiv Detail & Related papers (2022-08-25T14:28:30Z)
- Explainability via Interactivity? Supporting Nonexperts' Sensemaking of Pretrained CNN by Interacting with Their Daily Surroundings [7.455054065013047]
We present a mobile application that supports nonexperts in interactively making sense of convolutional neural networks (CNNs).
It allows users to play with a pretrained CNN by taking pictures of their surrounding objects.
We use an up-to-date XAI technique, Class Activation Mapping (CAM), to intuitively visualize the model's decisions.
arXiv Detail & Related papers (2021-05-31T19:22:53Z)
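The mobile-app entry above relies on Class Activation Mapping (CAM). Below is a minimal sketch of the general CAM recipe rather than that paper's implementation, assuming a torchvision ResNet-18 (global average pooling followed by a linear classifier); the random tensor stands in for a real preprocessed photo.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Minimal CAM sketch for a ResNet-18-style network (GAP + linear head).
model = models.resnet18(weights="IMAGENET1K_V1").eval()

features = {}
def hook(_module, _inputs, output):
    features["maps"] = output  # (1, C, H, W) activations of the last conv block

model.layer4.register_forward_hook(hook)

image = torch.randn(1, 3, 224, 224)  # placeholder; use a real preprocessed image in practice
with torch.no_grad():
    logits = model(image)
class_idx = logits.argmax(dim=1).item()

# CAM = class-specific weighted sum of the feature maps, upsampled to the input size.
weights = model.fc.weight[class_idx]                      # (C,)
cam = (weights[:, None, None] * features["maps"][0]).sum(dim=0)
cam = F.relu(cam)
cam = F.interpolate(cam[None, None], size=image.shape[-2:],
                    mode="bilinear", align_corners=False)[0, 0]
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize to [0, 1] for display
```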
- Interpretable Deep Learning: Interpretations, Interpretability, Trustworthiness, and Beyond [49.93153180169685]
We introduce and clarify two basic concepts, interpretations and interpretability, that are often confused.
We elaborate on the design of several recent interpretation algorithms from different perspectives by proposing a new taxonomy.
We summarize the existing work on evaluating models' interpretability using "trustworthy" interpretation algorithms.
arXiv Detail & Related papers (2021-03-19T08:40:30Z)
- A Survey on Understanding, Visualizations, and Explanation of Deep Neural Networks [0.0]
It is of paramount importance to understand, trust, and, in one word, "explain" the reasoning behind deep models' decisions.
In many applications, artificial neural networks (including DNNs) are treated as black-box systems that do not provide sufficient insight into their internal processing.
arXiv Detail & Related papers (2021-02-02T22:57:22Z)
- i-Algebra: Towards Interactive Interpretability of Deep Neural Networks [41.13047686374529]
We present i-Algebra, a first-of-its-kind interactive framework for interpreting deep neural networks (DNNs).
At its core is a library of atomic, composable operators, which explain model behaviors at varying input granularity, during different inference stages, and from distinct interpretation perspectives.
We conduct user studies in a set of representative analysis tasks, including inspecting adversarial inputs, resolving model inconsistency, and cleansing contaminated data, all demonstrating its promising usability.
arXiv Detail & Related papers (2021-01-22T19:22:57Z)
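The i-Algebra entry above centers on atomic, composable interpretation operators. The snippet below is a purely hypothetical illustration of that general idea, not i-Algebra's actual API; all names and signatures are invented, and the commented usage assumes a user-supplied gradient function and inputs.

```python
from typing import Callable
import numpy as np

# Hypothetical illustration of "atomic, composable operators" for interpretation.
# NOT i-Algebra's real API; names and signatures are invented for exposition.
Saliency = Callable[[np.ndarray], np.ndarray]  # input -> attribution map of the same shape

def gradient_saliency(model_grad: Callable[[np.ndarray], np.ndarray]) -> Saliency:
    """Atomic operator: attribute via (user-supplied) input gradients."""
    return lambda x: np.abs(model_grad(x))

def window(op: Saliency, mask: np.ndarray) -> Saliency:
    """Composable operator: restrict an attribution to a selected input region."""
    return lambda x: op(x) * mask

def contrast(op: Saliency, baseline: np.ndarray) -> Saliency:
    """Composable operator: compare attributions against a reference input."""
    return lambda x: op(x) - op(baseline)

# Usage sketch (hypothetical objects): compose operators, then apply to an input.
# explain = contrast(window(gradient_saliency(my_model_grad), region_mask), reference_input)
# attribution = explain(suspicious_input)
```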
- Explainability in Deep Reinforcement Learning [68.8204255655161]
We review recent works aimed at attaining explainable reinforcement learning (XRL).
In critical situations where it is essential to justify and explain the agent's behaviour, better explainability and interpretability of RL models could help gain scientific insight into the inner workings of what is still considered a black box.
arXiv Detail & Related papers (2020-08-15T10:11:42Z)
- Boosting Deep Neural Networks with Geometrical Prior Knowledge: A Survey [77.99182201815763]
Deep Neural Networks (DNNs) achieve state-of-the-art results in many different problem settings.
DNNs are often treated as black box systems, which complicates their evaluation and validation.
One promising direction, inspired by the success of convolutional neural networks (CNNs) in computer vision tasks, is to incorporate knowledge about symmetric geometrical transformations.
arXiv Detail & Related papers (2020-06-30T14:56:05Z)
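The geometrical-prior entry above asks how knowledge of symmetric transformations can be built into a DNN. As one simple, hedged illustration that is not taken from that survey, an existing classifier can be symmetrized over the C4 rotation group at inference time by averaging its predictions over 90-degree rotations; the tiny base network below is a placeholder.

```python
import torch
import torch.nn as nn

class C4Symmetrized(nn.Module):
    """Wraps a classifier so its output is averaged over the C4 rotation group.

    This is one crude way to impose approximate rotation invariance as a prior;
    equivariant architectures instead build the symmetry into the weights.
    """
    def __init__(self, base: nn.Module):
        super().__init__()
        self.base = base

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Average logits over 0, 90, 180, and 270 degree rotations of the input.
        logits = [self.base(torch.rot90(x, k, dims=(2, 3))) for k in range(4)]
        return torch.stack(logits, dim=0).mean(dim=0)

# Placeholder base network; any (N, C, H, W) -> (N, num_classes) classifier works.
base = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                     nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10))
model = C4Symmetrized(base)
out = model(torch.randn(2, 3, 32, 32))  # (2, 10), invariant to 90-degree input rotations
```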
- Dark, Beyond Deep: A Paradigm Shift to Cognitive AI with Humanlike Common Sense [142.53911271465344]
We argue that the next generation of AI must embrace "dark" humanlike common sense for solving novel tasks.
We identify functionality, physics, intent, causality, and utility (FPICU) as the five core domains of cognitive AI with humanlike common sense.
arXiv Detail & Related papers (2020-04-20T04:07:28Z)