Related papers: Making Neural Networks Interpretable with Attribution: Application to Implicit Signals Prediction

URL: http://arxiv.org/abs/2008.11406v1
Date: Wed, 26 Aug 2020 06:46:49 GMT
Title: Making Neural Networks Interpretable with Attribution: Application to Implicit Signals Prediction
Authors: Darius Afchar and Romain Hennequin
Abstract summary: We propose a novel formulation of interpretable deep neural networks for the attribution task. Using masked weights, hidden features can be deeply attributed, split into several input-restricted sub-networks and trained as a boosted mixture of experts.
Score: 11.427019313283997
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Explaining recommendations enables users to understand whether recommended items are relevant to their needs and has been shown to increase their trust in the system. More generally, while designing explainable machine learning models is key to checking the sanity and robustness of a decision process and improving their efficiency, it remains a challenge for complex architectures, especially deep neural networks, which are often deemed "black-box". In this paper, we propose a novel formulation of interpretable deep neural networks for the attribution task. Unlike popular post-hoc methods, our approach is interpretable by design. Using masked weights, hidden features can be deeply attributed, split into several input-restricted sub-networks and trained as a boosted mixture of experts. Experimental results on synthetic data and real-world recommendation tasks demonstrate that our method makes it possible to build models whose predictive performance is close to that of their non-interpretable counterparts, while providing informative attribution interpretations.
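
Illustration: the masked-weights idea in the abstract can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the grouping of inputs, the helper names (group_mask, masked_linear, forward) and the random weights are all hypothetical, and the boosted mixture-of-experts training is omitted. It only shows that masking weights so each hidden unit sees a single input group splits the network into input-restricted sub-networks whose output contributions can be read as attributions.

```python
import random

def group_mask(input_groups, hidden_groups):
    """mask[i][j] is 1.0 only when input i and hidden unit j share a group."""
    return [[1.0 if gi == gj else 0.0 for gj in hidden_groups]
            for gi in input_groups]

def masked_linear(x, weights, mask):
    """Linear layer whose weights are multiplied elementwise by a fixed mask."""
    n_out = len(weights[0])
    return [sum(x[i] * weights[i][j] * mask[i][j] for i in range(len(x)))
            for j in range(n_out)]

# Hypothetical setup: inputs 0 and 1 form group 0, input 2 forms group 1.
random.seed(0)
input_groups = [0, 0, 1]
hidden_groups = [0, 0, 1, 1]   # two hidden units per input group
W1 = [[random.uniform(-1, 1) for _ in hidden_groups] for _ in input_groups]
M1 = group_mask(input_groups, hidden_groups)
w_out = [random.uniform(-1, 1) for _ in hidden_groups]

def forward(x):
    """Return the prediction and the contribution of each input group."""
    hidden = [max(0.0, h) for h in masked_linear(x, W1, M1)]  # ReLU
    contributions = {}
    for j, g in enumerate(hidden_groups):
        contributions[g] = contributions.get(g, 0.0) + hidden[j] * w_out[j]
    return sum(contributions.values()), contributions

x = [0.5, -1.0, 2.0]
pred, attributions = forward(x)

# Changing only the input in group 1 leaves group 0's contribution untouched.
x_changed = [0.5, -1.0, -3.0]
_, attributions_changed = forward(x_changed)
```

Because the prediction is the sum of the per-group contributions, each contribution can be reported directly as the attribution of its input group, with no post-hoc explanation step.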

Related papers

Concept-Guided Interpretability via Neural Chunking [54.73787666584143]
We show that neural networks exhibit patterns in their raw population activity that mirror regularities in the training data. We propose three methods to extract these emerging entities, complementing each other based on label availability and dimensionality. Our work points to a new direction for interpretability, one that harnesses both cognitive principles and the structure of naturalistic data.
arXiv Detail & Related papers (2025-05-16T13:49:43Z)
Investigating the Duality of Interpretability and Explainability in Machine Learning [2.8311451575532156]
Complex "black box" models exhibit exceptional predictive performance. Their inherently opaque nature raises concerns about transparency and interpretability. Efforts are focused on explaining these models instead of developing ones that are inherently interpretable.
arXiv Detail & Related papers (2025-03-27T10:48:40Z)
Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond [61.18736646013446]
In pursuit of a deeper understanding of its surprising behaviors, we investigate the utility of a simple yet accurate model of a trained neural network. Across three case studies, we illustrate how it can be applied to derive new empirical insights on a diverse range of prominent phenomena.
arXiv Detail & Related papers (2024-10-31T22:54:34Z)
Perturbation on Feature Coalition: Towards Interpretable Deep Neural Networks [0.1398098625978622]
The "black box" nature of deep neural networks (DNNs) compromises their transparency and reliability. We introduce a perturbation-based interpretation guided by feature coalitions, which leverages deep information from the network to extract correlated features.
arXiv Detail & Related papers (2024-08-23T22:44:21Z)
Improving Network Interpretability via Explanation Consistency Evaluation [56.14036428778861]
We propose a framework that acquires more explainable activation heatmaps and simultaneously increases the model performance. Specifically, our framework introduces a new metric, i.e., explanation consistency, to reweight the training samples adaptively in model learning. Our framework then promotes the model learning by paying closer attention to those training samples with a high difference in explanations.
arXiv Detail & Related papers (2024-08-08T17:20:08Z)
Adversarial Attacks on the Interpretation of Neuron Activation Maximization [70.5472799454224]
Activation-maximization approaches are used to interpret and analyze trained deep-learning models. In this work, we consider the concept of an adversary manipulating a model for the purpose of deceiving the interpretation.
arXiv Detail & Related papers (2023-06-12T19:54:33Z)
Deep networks for system identification: a Survey [56.34005280792013]
System identification learns mathematical descriptions of dynamic systems from input-output data. The main aim of the identified model is to predict new data from previous observations. We discuss architectures commonly adopted in the literature, like feedforward, convolutional, and recurrent networks.
arXiv Detail & Related papers (2023-01-30T12:38:31Z)
LAP: An Attention-Based Module for Concept Based Self-Interpretation and Knowledge Injection in Convolutional Neural Networks [2.8948274245812327]
We propose a new attention-based pooling layer, called Local Attention Pooling (LAP), that accomplishes self-interpretability. LAP is easily pluggable into any convolutional neural network, even already trained ones. LAP offers more valid, human-understandable, and faithful-to-the-model interpretations than the commonly used white-box explainer methods.
arXiv Detail & Related papers (2022-01-27T21:10:20Z)
Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules. Inputs to the model are routed through a sequence of functions in a way that is end-to-end learned. We show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferable to a new task in a sample-efficient manner.
arXiv Detail & Related papers (2021-10-12T23:22:45Z)
Explainability-aided Domain Generalization for Image Classification [0.0]
We show that applying methods and architectures from the explainability literature can achieve state-of-the-art performance for the challenging task of domain generalization. We develop a set of novel algorithms including DivCAM, an approach where the network receives guidance during training via gradient-based class activation maps to focus on a diverse set of discriminative features. Since these methods offer competitive performance on top of explainability, we argue that the proposed methods can be used as a tool to improve the robustness of deep neural network architectures.
arXiv Detail & Related papers (2021-04-05T02:27:01Z)
How Much Can I Trust You? -- Quantifying Uncertainties in Explaining Neural Networks [19.648814035399013]
Explainable AI (XAI) aims to provide interpretations for predictions made by learning machines, such as deep neural networks. We propose a new framework that converts any explanation method for neural networks into an explanation method for Bayesian neural networks. We demonstrate the effectiveness and usefulness of our approach extensively in various experiments.
arXiv Detail & Related papers (2020-06-16T08:54:42Z)
Plausible Counterfactuals: Auditing Deep Learning Classifiers with Realistic Adversarial Examples [84.8370546614042]
The black-box nature of Deep Learning models has posed unanswered questions about what they learn from data. A Generative Adversarial Network (GAN) and multi-objectives are used to furnish a plausible attack to the audited model. Its utility is showcased within a human face classification task, unveiling the enormous potential of the proposed framework.
arXiv Detail & Related papers (2020-03-25T11:08:56Z)

This list is automatically generated from the titles and abstracts of the papers on this site.