Interpretability and Explainability: A Machine Learning Zoo Mini-tour
- URL: http://arxiv.org/abs/2012.01805v1
- Date: Thu, 3 Dec 2020 10:11:52 GMT
- Title: Interpretability and Explainability: A Machine Learning Zoo Mini-tour
- Authors: Ričards Marcinkevičs and Julia E. Vogt
- Abstract summary: Interpretability and explainability lie at the core of many machine learning and statistical applications in medicine, economics, law, and natural sciences.
We emphasise the divide between interpretability and explainability and illustrate these two different research directions with concrete examples of the state-of-the-art.
- Score: 4.56877715768796
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this review, we examine the problem of designing interpretable and
explainable machine learning models. Interpretability and explainability lie at
the core of many machine learning and statistical applications in medicine,
economics, law, and natural sciences. Although interpretability and
explainability have escaped a clear universal definition, many techniques
motivated by these properties have been developed over the past 30 years, with
the focus currently shifting towards deep learning methods. In this review, we
emphasise the divide between interpretability and explainability and illustrate
these two different research directions with concrete examples of the
state-of-the-art. The review is intended for a general machine learning
audience with interest in exploring the problems of interpretation and
explanation beyond logistic regression or random forest variable importance.
This work is not an exhaustive literature survey, but rather a primer focusing
selectively on certain lines of research which the authors found interesting or
informative.
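The abstract frames the review as going beyond logistic regression coefficients and random forest variable importance. For readers who want that baseline in front of them, here is a minimal sketch of both techniques with scikit-learn (the library choice is ours; the review itself is tool-agnostic):

```python
# Baseline interpretability: logistic regression coefficients and
# random forest variable importance, sketched with scikit-learn.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Interpretable by design: on standardised inputs, the coefficient
# magnitudes of a logistic regression act as rough effect sizes.
scaler = StandardScaler().fit(X_tr)
logreg = LogisticRegression(max_iter=5000).fit(scaler.transform(X_tr), y_tr)
top_coef = sorted(zip(X.columns, logreg.coef_[0]), key=lambda t: -abs(t[1]))[:5]

# Random forest: impurity-based importances are built in but biased toward
# high-cardinality features; permutation importance on held-out data is the
# more trustworthy variant.
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
perm = permutation_importance(forest, X_te, y_te, n_repeats=10, random_state=0)
top_perm = sorted(zip(X.columns, perm.importances_mean), key=lambda t: -t[1])[:5]

print("logistic regression coefficients:", top_coef)
print("permutation importances:", top_perm)
```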
Related papers
- Ontology Embedding: A Survey of Methods, Applications and Resources [54.3453925775069]
Ontologies are widely used for representing domain knowledge and metadata.
One straightforward way to exploit them in learning systems is to integrate statistical analysis and machine learning.
Numerous papers have been published on ontology embedding, but the lack of a systematic review hinders researchers from gaining a comprehensive understanding of the field.
arXiv Detail & Related papers (2024-06-16T14:49:19Z)
- On the Relationship Between Interpretability and Explainability in Machine Learning [2.828173677501078]
Interpretability and explainability have gained more and more attention in the field of machine learning.
Since both provide information about predictors and their decision process, they are often seen as two independent means to a single end.
This view has led to a dichotomous literature: explainability techniques designed for complex black-box models, and interpretable approaches that ignore the many available explainability tools.
arXiv Detail & Related papers (2023-11-20T02:31:08Z)
- How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding [56.222097640468306]
We provide a mechanistic understanding of how transformers learn "semantic structure".
We show, through a combination of mathematical analysis and experiments on Wikipedia data, that the embedding layer and the self-attention layer encode the topical structure.
arXiv Detail & Related papers (2023-03-07T21:42:17Z)
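The claim that the embedding layer encodes topical structure can be probed informally. Below is a toy check of our own construction, not the paper's experiment (which fits topic models on Wikipedia): hand-picked same-topic words should sit closer in GPT-2's embedding space than cross-topic words.

```python
# Toy probe: do same-topic words sit closer in GPT-2's embedding layer
# than cross-topic words? Word lists are hand-picked for illustration.
import torch
from transformers import GPT2Model, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
emb = GPT2Model.from_pretrained("gpt2").get_input_embeddings().weight.detach()

def vec(word):
    # Use the first sub-token's embedding as a crude word vector.
    ids = tok.encode(" " + word)
    return emb[ids[0]]

sports = ["football", "goal", "coach", "tournament"]
finance = ["stock", "dividend", "investor", "portfolio"]

def mean_sim(ws_a, ws_b):
    sims = [torch.cosine_similarity(vec(a), vec(b), dim=0)
            for a in ws_a for b in ws_b if a != b]
    return torch.stack(sims).mean().item()

print("within sports :", mean_sim(sports, sports))
print("within finance:", mean_sim(finance, finance))
print("across topics :", mean_sim(sports, finance))
```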
- Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small [68.879023473838]
We present an explanation for how GPT-2 small performs a natural language task called indirect object identification (IOI).
To our knowledge, this investigation is the largest end-to-end attempt at reverse-engineering a natural behavior "in the wild" in a language model.
arXiv Detail & Related papers (2022-11-01T17:08:44Z)
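The IOI task itself is easy to reproduce behaviourally: the model should prefer the indirect object over the repeated subject as the next token. A minimal check with HuggingFace transformers (a behavioural probe only; the paper's contribution is the circuit-level reverse-engineering, which this does not reproduce):

```python
# Behavioural check of indirect object identification (IOI) in GPT-2 small:
# the logit for the indirect object (" Mary") should beat the logit for
# the repeated subject (" John").
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

prompt = "When John and Mary went to the store, John gave a drink to"
ids = tok(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**ids).logits[0, -1]  # next-token logits

mary = tok.encode(" Mary")[0]
john = tok.encode(" John")[0]
print("logit difference (Mary - John):", (logits[mary] - logits[john]).item())
```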
- Human Interpretation of Saliency-based Explanation Over Text [65.29015910991261]
We study saliency-based explanations over textual data.
We find that people often misinterpret the explanations.
We propose a method to adjust saliencies based on model estimates of over- and under-perception.
arXiv Detail & Related papers (2022-01-27T15:20:32Z)
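For context, this is roughly the kind of saliency map the study shows to people: the gradient of the predicted class score with respect to token embeddings, reduced to one score per token. This is our sketch of a generic gradient saliency; the paper's perception-adjustment method is not reproduced here.

```python
# Gradient-based token saliency for a text classifier: the L2 norm of
# d(score)/d(embedding) for each token.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "distilbert-base-uncased-finetuned-sst-2-english"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name).eval()

text = "The plot was dull but the acting was superb."
ids = tok(text, return_tensors="pt")

# Run the forward pass on embeddings directly so we can take gradients.
embeds = model.get_input_embeddings()(ids["input_ids"]).detach().requires_grad_(True)
logits = model(inputs_embeds=embeds, attention_mask=ids["attention_mask"]).logits
logits[0, logits[0].argmax()].backward()

saliency = embeds.grad[0].norm(dim=-1)  # one score per token
for token, score in zip(tok.convert_ids_to_tokens(ids["input_ids"][0]), saliency):
    print(f"{token:>10s}  {score.item():.3f}")
```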
- Local Interpretations for Explainable Natural Language Processing: A Survey [5.717407321642629]
This work investigates various methods for improving the interpretability of deep neural networks on Natural Language Processing (NLP) tasks.
We begin with a comprehensive discussion of the definition of the term interpretability and its various aspects.
arXiv Detail & Related papers (2021-03-20T02:28:33Z)
- Interpretable Deep Learning: Interpretations, Interpretability, Trustworthiness, and Beyond [49.93153180169685]
We introduce and clarify two basic concepts, interpretations and interpretability, that are often confused.
We elaborate on the design of several recent interpretation algorithms from different perspectives by proposing a new taxonomy.
We summarize the existing work on evaluating models' interpretability using "trustworthy" interpretation algorithms.
arXiv Detail & Related papers (2021-03-19T08:40:30Z)
- A Survey on Neural Network Interpretability [25.27545364222555]
Interpretability is a desired property for deep networks to become powerful tools in other research fields.
We propose a novel taxonomy organized along three dimensions: the type of engagement (passive vs. active interpretation approaches), the type of explanation, and the focus.
arXiv Detail & Related papers (2020-12-28T15:09:50Z)
- This is not the Texture you are looking for! Introducing Novel Counterfactual Explanations for Non-Experts using Generative Adversarial Learning [59.17685450892182]
Counterfactual explanation systems try to enable counterfactual reasoning by modifying the input image.
We present a novel approach to generating such counterfactual image explanations based on adversarial image-to-image translation techniques.
Our results show that our approach leads to significantly better results than two state-of-the-art systems in terms of mental models, explanation satisfaction, trust, emotions, and self-efficacy.
A bare-bones gradient-based variant of the counterfactual recipe is sketched after the next entry.
arXiv Detail & Related papers (2020-12-22T10:08:05Z)
- Counterfactual Explanations for Machine Learning: A Review [5.908471365011942]
We review and categorize research on counterfactual explanations in machine learning.
Modern approaches to counterfactual explainability in machine learning draw connections to established legal doctrine in many countries.
arXiv Detail & Related papers (2020-10-20T20:08:42Z)
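Both counterfactual entries above share one core recipe: perturb the input until the prediction flips while keeping the perturbation small. Here is a toy gradient-based variant in the spirit of Wachter et al.'s widely cited formulation, which the review covers (synthetic data, our construction; not the GAN-based method above):

```python
# Toy counterfactual search: find x' close to x whose prediction flips,
# by minimising (prediction - target)^2 + lam * ||x' - x||^2.
import torch

torch.manual_seed(0)

# A small logistic "loan approval" model on two features (income, debt):
# the ground truth approves whenever income exceeds debt.
X = torch.randn(500, 2)
y = (X[:, 0] > X[:, 1]).float()
w = torch.zeros(2, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
opt = torch.optim.Adam([w, b], lr=0.1)
for _ in range(200):
    opt.zero_grad()
    torch.nn.functional.binary_cross_entropy_with_logits(X @ w + b, y).backward()
    opt.step()
w, b = w.detach(), b.detach()  # freeze the trained model

x = torch.tensor([-1.0, 1.0])          # a rejected applicant
x_cf = x.clone().requires_grad_(True)
target, lam = 1.0, 0.1
opt_cf = torch.optim.Adam([x_cf], lr=0.05)
for _ in range(300):
    opt_cf.zero_grad()
    pred = torch.sigmoid(x_cf @ w + b)
    loss = (pred - target) ** 2 + lam * ((x_cf - x) ** 2).sum()
    loss.sum().backward()
    opt_cf.step()

print("original      :", x.tolist(),
      "p(approve) =", torch.sigmoid(x @ w + b).item())
print("counterfactual:", x_cf.detach().tolist(),
      "p(approve) =", torch.sigmoid(x_cf @ w + b).item())
```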