Interpretability and Explainability: A Machine Learning Zoo Mini-tour
- URL: http://arxiv.org/abs/2012.01805v1
- Date: Thu, 3 Dec 2020 10:11:52 GMT
- Title: Interpretability and Explainability: A Machine Learning Zoo Mini-tour
- Authors: Ričards Marcinkevičs and Julia E. Vogt
- Abstract summary: Interpretability and explainability lie at the core of many machine learning and statistical applications in medicine, economics, law, and natural sciences.
We emphasise the divide between interpretability and explainability and illustrate these two different research directions with concrete examples of the state-of-the-art.
- Score: 4.56877715768796
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this review, we examine the problem of designing interpretable and
explainable machine learning models. Interpretability and explainability lie at
the core of many machine learning and statistical applications in medicine,
economics, law, and natural sciences. Although interpretability and
explainability have escaped a clear universal definition, many techniques
motivated by these properties have been developed over the past 30 years, with
the focus currently shifting towards deep learning methods. In this review, we
emphasise the divide between interpretability and explainability and illustrate
these two different research directions with concrete examples of the
state-of-the-art. The review is intended for a general machine learning
audience with interest in exploring the problems of interpretation and
explanation beyond logistic regression or random forest variable importance.
This work is not an exhaustive literature survey, but rather a primer focusing
selectively on certain lines of research which the authors found interesting or
informative.
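The abstract frames the review as going beyond logistic regression coefficients and random forest variable importance. For readers who want that baseline in front of them, here is a minimal sketch of both techniques with scikit-learn (the library choice is ours; the review itself is tool-agnostic):

```python
# Baseline interpretability: logistic regression coefficients and
# random forest variable importance, sketched with scikit-learn.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Interpretable by design: on standardised inputs, the coefficient
# magnitudes of a logistic regression act as rough effect sizes.
scaler = StandardScaler().fit(X_tr)
logreg = LogisticRegression(max_iter=5000).fit(scaler.transform(X_tr), y_tr)
top_coef = sorted(zip(X.columns, logreg.coef_[0]), key=lambda t: -abs(t[1]))[:5]

# Random forest: impurity-based importances are built in but biased toward
# high-cardinality features; permutation importance on held-out data is the
# more trustworthy variant.
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
perm = permutation_importance(forest, X_te, y_te, n_repeats=10, random_state=0)
top_perm = sorted(zip(X.columns, perm.importances_mean), key=lambda t: -t[1])[:5]

print("logistic regression coefficients:", top_coef)
print("permutation importances:", top_perm)
```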
Related papers
- Ontology Embedding: A Survey of Methods, Applications and Resources [54.3453925775069]
Ontologies are widely used for representing domain knowledge and metadata.
One straightforward way to exploit them in learning systems is to integrate statistical analysis and machine learning.
Numerous papers have been published on ontology embedding, but the lack of a systematic review hinders researchers from gaining a comprehensive understanding of the field.
arXiv Detail & Related papers (2024-06-16T14:49:19Z)
- On the Relationship Between Interpretability and Explainability in Machine Learning [2.828173677501078]
Interpretability and explainability have gained more and more attention in the field of machine learning.
Since both provide information about predictors and their decision process, they are often seen as two independent means to a single end.
This view has led to a dichotomous literature: explainability techniques designed for complex black-box models, and interpretable approaches that ignore the many available explainability tools.
arXiv Detail & Related papers (2023-11-20T02:31:08Z)
- How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding [56.222097640468306]
We provide a mechanistic understanding of how transformers learn "semantic structure".
We show, through a combination of mathematical analysis and experiments on Wikipedia data, that the embedding layer and the self-attention layer encode the topical structure.
arXiv Detail & Related papers (2023-03-07T21:42:17Z)
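The claim that the embedding layer encodes topical structure can be probed informally. Below is a toy check of our own construction, not the paper's experiment (which fits topic models on Wikipedia): hand-picked same-topic words should sit closer in GPT-2's embedding space than cross-topic words.

```python
# Toy probe: do same-topic words sit closer in GPT-2's embedding layer
# than cross-topic words? Word lists are hand-picked for illustration.
import torch
from transformers import GPT2Model, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
emb = GPT2Model.from_pretrained("gpt2").get_input_embeddings().weight.detach()

def vec(word):
    # Use the first sub-token's embedding as a crude word vector.
    ids = tok.encode(" " + word)
    return emb[ids[0]]

sports = ["football", "goal", "coach", "tournament"]
finance = ["stock", "dividend", "investor", "portfolio"]

def mean_sim(ws_a, ws_b):
    sims = [torch.cosine_similarity(vec(a), vec(b), dim=0)
            for a in ws_a for b in ws_b if a != b]
    return torch.stack(sims).mean().item()

print("within sports :", mean_sim(sports, sports))
print("within finance:", mean_sim(finance, finance))
print("across topics :", mean_sim(sports, finance))
```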
- Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small [68.879023473838]
We present an explanation for how GPT-2 small performs a natural language task called indirect object identification (IOI).
To our knowledge, this investigation is the largest end-to-end attempt at reverse-engineering a natural behavior "in the wild" in a language model.
arXiv Detail & Related papers (2022-11-01T17:08:44Z)
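The IOI task itself is easy to reproduce behaviourally: the model should prefer the indirect object over the repeated subject as the next token. A minimal check with HuggingFace transformers (a behavioural probe only; the paper's contribution is the circuit-level reverse-engineering, which this does not reproduce):

```python
# Behavioural check of indirect object identification (IOI) in GPT-2 small:
# the logit for the indirect object (" Mary") should beat the logit for
# the repeated subject (" John").
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

prompt = "When John and Mary went to the store, John gave a drink to"
ids = tok(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**ids).logits[0, -1]  # next-token logits

mary = tok.encode(" Mary")[0]
john = tok.encode(" John")[0]
print("logit difference (Mary - John):", (logits[mary] - logits[john]).item())
```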
- Human Interpretation of Saliency-based Explanation Over Text [65.29015910991261]
We study saliency-based explanations over textual data.
We find that people often misinterpret the explanations.
We propose a method to adjust saliencies based on model estimates of over- and under-perception.
arXiv Detail & Related papers (2022-01-27T15:20:32Z)
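For context, this is roughly the kind of saliency map the study shows to people: the gradient of the predicted class score with respect to token embeddings, reduced to one score per token. This is our sketch of a generic gradient saliency; the paper's perception-adjustment method is not reproduced here.

```python
# Gradient-based token saliency for a text classifier: the L2 norm of
# d(score)/d(embedding) for each token.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "distilbert-base-uncased-finetuned-sst-2-english"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name).eval()

text = "The plot was dull but the acting was superb."
ids = tok(text, return_tensors="pt")

# Run the forward pass on embeddings directly so we can take gradients.
embeds = model.get_input_embeddings()(ids["input_ids"]).detach().requires_grad_(True)
logits = model(inputs_embeds=embeds, attention_mask=ids["attention_mask"]).logits
logits[0, logits[0].argmax()].backward()

saliency = embeds.grad[0].norm(dim=-1)  # one score per token
for token, score in zip(tok.convert_ids_to_tokens(ids["input_ids"][0]), saliency):
    print(f"{token:>10s}  {score.item():.3f}")
```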
- Local Interpretations for Explainable Natural Language Processing: A Survey [5.717407321642629]
This work investigates various methods for improving the interpretability of deep neural networks on Natural Language Processing (NLP) tasks.
We begin with a comprehensive discussion of the definition of the term interpretability and its various aspects.
arXiv Detail & Related papers (2021-03-20T02:28:33Z)
- Interpretable Deep Learning: Interpretations, Interpretability, Trustworthiness, and Beyond [49.93153180169685]
We introduce and clarify two basic concepts, interpretations and interpretability, that are often confused.
We elaborate on the design of several recent interpretation algorithms from different perspectives by proposing a new taxonomy.
We summarize the existing work on evaluating models' interpretability using "trustworthy" interpretation algorithms.
arXiv Detail & Related papers (2021-03-19T08:40:30Z)
- A Survey on Neural Network Interpretability [25.27545364222555]
Interpretability is a desired property for deep networks to become powerful tools in other research fields.
We propose a novel taxonomy organized along three dimensions: the type of engagement (passive vs. active interpretation approaches), the type of explanation, and the focus.
arXiv Detail & Related papers (2020-12-28T15:09:50Z)
- This is not the Texture you are looking for! Introducing Novel Counterfactual Explanations for Non-Experts using Generative Adversarial Learning [59.17685450892182]
Counterfactual explanation systems try to enable counterfactual reasoning by modifying the input image.
We present a novel approach to generating such counterfactual image explanations based on adversarial image-to-image translation techniques.
Our results show that our approach leads to significantly better results than two state-of-the-art systems in terms of mental models, explanation satisfaction, trust, emotions, and self-efficacy.
A bare-bones gradient-based variant of the counterfactual recipe is sketched after the next entry.
arXiv Detail & Related papers (2020-12-22T10:08:05Z)
- Counterfactual Explanations for Machine Learning: A Review [5.908471365011942]
We review and categorize research on counterfactual explanations in machine learning.
Modern approaches to counterfactual explainability in machine learning draw connections to established legal doctrine in many countries.
arXiv Detail & Related papers (2020-10-20T20:08:42Z)
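Both counterfactual entries above share one core recipe: perturb the input until the prediction flips while keeping the perturbation small. Here is a toy gradient-based variant in the spirit of Wachter et al.'s widely cited formulation, which the review covers (synthetic data, our construction; not the GAN-based method above):

```python
# Toy counterfactual search: find x' close to x whose prediction flips,
# by minimising (prediction - target)^2 + lam * ||x' - x||^2.
import torch

torch.manual_seed(0)

# A small logistic "loan approval" model on two features (income, debt):
# the ground truth approves whenever income exceeds debt.
X = torch.randn(500, 2)
y = (X[:, 0] > X[:, 1]).float()
w = torch.zeros(2, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
opt = torch.optim.Adam([w, b], lr=0.1)
for _ in range(200):
    opt.zero_grad()
    torch.nn.functional.binary_cross_entropy_with_logits(X @ w + b, y).backward()
    opt.step()
w, b = w.detach(), b.detach()  # freeze the trained model

x = torch.tensor([-1.0, 1.0])          # a rejected applicant
x_cf = x.clone().requires_grad_(True)
target, lam = 1.0, 0.1
opt_cf = torch.optim.Adam([x_cf], lr=0.05)
for _ in range(300):
    opt_cf.zero_grad()
    pred = torch.sigmoid(x_cf @ w + b)
    loss = (pred - target) ** 2 + lam * ((x_cf - x) ** 2).sum()
    loss.sum().backward()
    opt_cf.step()

print("original      :", x.tolist(),
      "p(approve) =", torch.sigmoid(x @ w + b).item())
print("counterfactual:", x_cf.detach().tolist(),
      "p(approve) =", torch.sigmoid(x_cf @ w + b).item())
```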