Making Neural Networks Interpretable with Attribution: Application to
Implicit Signals Prediction
- URL: http://arxiv.org/abs/2008.11406v1
- Date: Wed, 26 Aug 2020 06:46:49 GMT
- Title: Making Neural Networks Interpretable with Attribution: Application to
Implicit Signals Prediction
- Authors: Darius Afchar and Romain Hennequin
- Abstract summary: We propose a novel formulation of interpretable deep neural networks for the attribution task.
Using masked weights, hidden features can be deeply attributed, split into several input-restricted sub-networks and trained as a boosted mixture of experts.
- Score: 11.427019313283997
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Explaining recommendations enables users to understand whether recommended
items are relevant to their needs and has been shown to increase their trust in
the system. More generally, if designing explainable machine learning models is
key to check the sanity and robustness of a decision process and improve their
efficiency, it however remains a challenge for complex architectures,
especially deep neural networks that are often deemed "black-box". In this
paper, we propose a novel formulation of interpretable deep neural networks for
the attribution task. Differently to popular post-hoc methods, our approach is
interpretable by design. Using masked weights, hidden features can be deeply
attributed, split into several input-restricted sub-networks and trained as a
boosted mixture of experts. Experimental results on synthetic data and
real-world recommendation tasks demonstrate that our method enables to build
models achieving close predictive performances to their non-interpretable
counterparts, while providing informative attribution interpretations.
Related papers
- Adversarial Attacks on the Interpretation of Neuron Activation
Maximization [70.5472799454224]
Activation-maximization approaches are used to interpret and analyze trained deep-learning models.
In this work, we consider the concept of an adversary manipulating a model for the purpose of deceiving the interpretation.
arXiv Detail & Related papers (2023-06-12T19:54:33Z) - Breaking the Paradox of Explainable Deep Learning [13.320917259299652]
We propose a novel method that trains deep hypernetworks to generate explainable linear models.
Our models retain the accuracy of black-box deep networks while offering free lunch explainability by design.
arXiv Detail & Related papers (2023-05-22T14:41:17Z) - Deep networks for system identification: a Survey [56.34005280792013]
System identification learns mathematical descriptions of dynamic systems from input-output data.
Main aim of the identified model is to predict new data from previous observations.
We discuss architectures commonly adopted in the literature, like feedforward, convolutional, and recurrent networks.
arXiv Detail & Related papers (2023-01-30T12:38:31Z) - LAP: An Attention-Based Module for Concept Based Self-Interpretation and
Knowledge Injection in Convolutional Neural Networks [2.8948274245812327]
We propose a new attention-based pooling layer, called Local Attention Pooling (LAP), that accomplishes self-interpretability.
LAP is easily pluggable into any convolutional neural network, even the already trained ones.
LAP offers more valid human-understandable and faithful-to-the-model interpretations than the commonly used white-box explainer methods.
arXiv Detail & Related papers (2022-01-27T21:10:20Z) - Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
inputs to the model are routed through a sequence of functions in a way that is end-to-end learned.
We show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferrable to a new task in a sample efficient manner.
arXiv Detail & Related papers (2021-10-12T23:22:45Z) - Towards Interpretable Deep Networks for Monocular Depth Estimation [78.84690613778739]
We quantify the interpretability of a deep MDE network by the depth selectivity of its hidden units.
We propose a method to train interpretable MDE deep networks without changing their original architectures.
Experimental results demonstrate that our method is able to enhance the interpretability of deep MDE networks.
arXiv Detail & Related papers (2021-08-11T16:43:45Z) - Explainability-aided Domain Generalization for Image Classification [0.0]
We show that applying methods and architectures from the explainability literature can achieve state-of-the-art performance for the challenging task of domain generalization.
We develop a set of novel algorithms including DivCAM, an approach where the network receives guidance during training via gradient based class activation maps to focus on a diverse set of discriminative features.
Since these methods offer competitive performance on top of explainability, we argue that the proposed methods can be used as a tool to improve the robustness of deep neural network architectures.
arXiv Detail & Related papers (2021-04-05T02:27:01Z) - Interpretable Deep Learning: Interpretations, Interpretability,
Trustworthiness, and Beyond [49.93153180169685]
We introduce and clarify two basic concepts-interpretations and interpretability-that people usually get confused.
We elaborate the design of several recent interpretation algorithms, from different perspectives, through proposing a new taxonomy.
We summarize the existing work in evaluating models' interpretability using "trustworthy" interpretation algorithms.
arXiv Detail & Related papers (2021-03-19T08:40:30Z) - How Much Can I Trust You? -- Quantifying Uncertainties in Explaining
Neural Networks [19.648814035399013]
Explainable AI (XAI) aims to provide interpretations for predictions made by learning machines, such as deep neural networks.
We propose a new framework that allows to convert any arbitrary explanation method for neural networks into an explanation method for Bayesian neural networks.
We demonstrate the effectiveness and usefulness of our approach extensively in various experiments.
arXiv Detail & Related papers (2020-06-16T08:54:42Z) - Plausible Counterfactuals: Auditing Deep Learning Classifiers with
Realistic Adversarial Examples [84.8370546614042]
Black-box nature of Deep Learning models has posed unanswered questions about what they learn from data.
Generative Adversarial Network (GAN) and multi-objectives are used to furnish a plausible attack to the audited model.
Its utility is showcased within a human face classification task, unveiling the enormous potential of the proposed framework.
arXiv Detail & Related papers (2020-03-25T11:08:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.