A Framework to Learn with Interpretation
- URL: http://arxiv.org/abs/2010.09345v4
- Date: Wed, 23 Feb 2022 13:29:44 GMT
- Title: A Framework to Learn with Interpretation
- Authors: Jayneel Parekh, Pavlo Mozharovskyi, Florence d'Alché-Buc
- Abstract summary: We present a novel framework to jointly learn a predictive model and its associated interpretation model.
We seek a small-size dictionary of high-level attribute functions that take as inputs the outputs of selected hidden layers.
A detailed pipeline to visualize the learnt features is also developed.
- Score: 2.3741312212138896
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To tackle interpretability in deep learning, we present a novel framework to
jointly learn a predictive model and its associated interpretation model. The
interpreter provides both local and global interpretability about the
predictive model in terms of human-understandable high-level attribute
functions, with minimal loss of accuracy. This is achieved by a dedicated
architecture and well-chosen regularization penalties. We seek a small-size
dictionary of high-level attribute functions that take as inputs the outputs of
selected hidden layers and whose outputs feed a linear classifier. We impose
strong conciseness on the attribute activations with an entropy-based
criterion while enforcing fidelity to both inputs and outputs of the predictive
model. A detailed pipeline to visualize the learnt features is also developed.
Moreover, besides generating interpretable models by design, our approach can
be specialized to provide post-hoc interpretations for a pre-trained neural
network. We validate our approach against several state-of-the-art methods on
multiple datasets and show its efficacy on both kinds of tasks.
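The abstract outlines a concrete training setup: a predictor exposes the output of a selected hidden layer, a small dictionary of attribute functions maps it to activations that feed a linear classifier, and the joint loss combines prediction, output fidelity, and an entropy-based conciseness penalty. The PyTorch sketch below illustrates this structure; the layer choices, attribute pooling, and loss weights are illustrative assumptions rather than the paper's exact architecture, and the input-fidelity (reconstruction) term is omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Predictor(nn.Module):
    """Predictive model f; the output of a selected hidden layer is exposed for the interpreter."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        self.head = nn.Linear(64 * 4 * 4, num_classes)

    def forward(self, x):
        h = self.features(x)                 # selected hidden-layer output
        return self.head(h.flatten(1)), h

class Interpreter(nn.Module):
    """Small dictionary of attribute functions over hidden activations, feeding a linear classifier."""
    def __init__(self, in_ch=64, num_attributes=24, num_classes=10):
        super().__init__()
        self.attributes = nn.Sequential(
            nn.Conv2d(in_ch, num_attributes, 1),
            nn.AdaptiveMaxPool2d(1), nn.Flatten(), nn.ReLU(),
        )
        self.linear = nn.Linear(num_attributes, num_classes)

    def forward(self, h):
        phi = self.attributes(h)             # attribute activations, shape (batch, num_attributes)
        return self.linear(phi), phi

def conciseness_loss(phi, eps=1e-8):
    """Entropy of the normalized attribute activations: low entropy means few attributes fire per sample."""
    p = phi / (phi.sum(dim=1, keepdim=True) + eps)
    return -(p * (p + eps).log()).sum(dim=1).mean()

def joint_loss(f, g, x, y, alpha=1.0, beta=0.1):
    """Prediction loss + output fidelity (interpreter mimics predictor) + conciseness penalty."""
    logits_f, h = f(x)
    logits_g, phi = g(h)
    pred = F.cross_entropy(logits_f, y)
    out_fid = F.kl_div(F.log_softmax(logits_g, dim=1),
                       F.softmax(logits_f, dim=1).detach(),
                       reduction="batchmean")
    return pred + alpha * out_fid + beta * conciseness_loss(phi)
```

For the post-hoc setting mentioned in the abstract, the predictor would presumably be kept frozen (pre-trained) and only the interpreter optimized.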
Related papers
- Prospector Heads: Generalized Feature Attribution for Large Models & Data [82.02696069543454]
We introduce prospector heads, an efficient and interpretable alternative to explanation-based attribution methods.
We demonstrate how prospector heads enable improved interpretation and discovery of class-specific patterns in input data.
arXiv Detail & Related papers (2024-02-18T23:01:28Z)
- Rethinking interpretation: Input-agnostic saliency mapping of deep visual classifiers [28.28834523468462]
Saliency methods provide post-hoc model interpretation by attributing input features to the model outputs.
We show that input-specific saliency mapping is intrinsically susceptible to misleading feature attribution.
We introduce a new perspective of input-agnostic saliency mapping that computationally estimates the high-level features attributed by the model to its outputs.
arXiv Detail & Related papers (2023-03-31T06:58:45Z)
- An Additive Instance-Wise Approach to Multi-class Model Interpretation [53.87578024052922]
Interpretable machine learning offers insights into what factors drive a certain prediction of a black-box system.
Existing methods mainly focus on selecting explanatory input features, which follow either locally additive or instance-wise approaches.
This work exploits the strengths of both methods and proposes a global framework for learning local explanations simultaneously for multiple target classes.
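One way to read this combination of locally additive and instance-wise views is an explainer that, per input, emits a weight vector for every class and scores each class as the additive combination of those weights with the input features, distilled from the black box. The sketch below is a hypothetical illustration of that idea, not the paper's exact model; the class names, the `black_box` callable, and the distillation objective are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveInstanceExplainer(nn.Module):
    """For each input, emit a per-class weight vector over the features; the explanation score
    for a class is the additive combination of those weights with the features (instance-wise
    because the weights depend on the input, additive because the score is a weighted sum)."""
    def __init__(self, num_features, num_classes, hidden=64):
        super().__init__()
        self.weight_net = nn.Sequential(
            nn.Linear(num_features, hidden), nn.ReLU(),
            nn.Linear(hidden, num_features * num_classes),
        )
        self.num_classes = num_classes

    def forward(self, x):
        w = self.weight_net(x).view(x.size(0), self.num_classes, -1)   # (batch, classes, features)
        logits = torch.einsum("bcd,bd->bc", w, x)                      # additive per-class scores
        return logits, w

def distillation_loss(explainer, black_box, x):
    """Fit the explainer so its additive scores track the black-box class probabilities."""
    logits, _ = explainer(x)
    with torch.no_grad():
        target = F.softmax(black_box(x), dim=1)
    return F.kl_div(F.log_softmax(logits, dim=1), target, reduction="batchmean")
```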
arXiv Detail & Related papers (2022-07-07T06:50:27Z)
- A Unified Understanding of Deep NLP Models for Text Classification [88.35418976241057]
We have developed a visual analysis tool, DeepNLPVis, to enable a unified understanding of NLP models for text classification.
The key idea is a mutual information-based measure, which provides quantitative explanations on how each layer of a model maintains the information of input words in a sample.
A multi-level visualization, which consists of a corpus-level, a sample-level, and a word-level visualization, supports the analysis from the overall training set to individual samples.
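The tool's measure is mutual-information based; as a rough, easier-to-run stand-in, the sketch below scores how much word identity a layer still carries by fitting a linear probe from per-token layer representations back to token ids. This probing proxy and the function name `layer_information_score` are assumptions, not the tool's actual computation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def layer_information_score(layer_reprs, word_ids, vocab_size, epochs=50, lr=1e-2):
    """Crude proxy for how much word identity a layer still carries: fit a linear probe from
    per-token layer representations (N, D) back to the token ids (N,) and report its accuracy."""
    probe = nn.Linear(layer_reprs.size(1), vocab_size)
    opt = torch.optim.Adam(probe.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        F.cross_entropy(probe(layer_reprs), word_ids).backward()
        opt.step()
    with torch.no_grad():
        return (probe(layer_reprs).argmax(dim=1) == word_ids).float().mean().item()

# Toy comparison of two layers' representations (random tensors stand in for real activations).
words = torch.randint(0, 100, (512,))
print(layer_information_score(torch.randn(512, 128), words, vocab_size=100))
print(layer_information_score(torch.randn(512, 128), words, vocab_size=100))
```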
arXiv Detail & Related papers (2022-06-19T08:55:07Z)
- Towards Open-World Feature Extrapolation: An Inductive Graph Learning Approach [80.8446673089281]
We propose a new learning paradigm with graph representation and learning.
Our framework contains two modules: 1) a backbone network (e.g., feedforward neural nets) as a lower model takes features as input and outputs predicted labels; 2) a graph neural network as an upper model learns to extrapolate embeddings for new features via message passing over a feature-data graph built from observed data.
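A minimal sketch of the feature-data graph idea follows: instances and features form a bipartite graph, and one round of mean-aggregation message passing rebuilds a feature's embedding from the instances that contain it, so features unseen during training still receive embeddings. The incidence-matrix representation, class name, and layer design are illustrative assumptions, not the paper's GNN.

```python
import torch
import torch.nn as nn

class FeatureDataGraphLayer(nn.Module):
    """One round of mean-aggregation message passing on a bipartite feature-data graph.
    X is a (num_instances, num_features) 0/1 incidence matrix: X[i, j] = 1 when instance i
    uses feature j. Feature embeddings are rebuilt from the instances that contain them, so a
    feature never seen during training still receives an embedding at test time."""
    def __init__(self, dim):
        super().__init__()
        self.feat_update = nn.Linear(dim, dim)
        self.inst_update = nn.Linear(dim, dim)

    def forward(self, X, inst_emb, feat_emb):
        deg_feat = X.sum(dim=0, keepdim=True).clamp(min=1)              # per-feature degree
        deg_inst = X.sum(dim=1, keepdim=True).clamp(min=1)              # per-instance degree
        new_feat = torch.relu(self.feat_update(X.t() @ inst_emb / deg_feat.t()))
        new_inst = torch.relu(self.inst_update(X @ feat_emb / deg_inst))
        return new_inst, new_feat

# Toy usage: 5 instances, 8 features (some columns may correspond to unseen test-time features).
X = (torch.rand(5, 8) > 0.5).float()
inst_emb, feat_emb = torch.randn(5, 16), torch.randn(8, 16)
layer = FeatureDataGraphLayer(16)
inst_emb, feat_emb = layer(X, inst_emb, feat_emb)
```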
arXiv Detail & Related papers (2021-10-09T09:02:45Z)
- Layer-wise Analysis of a Self-supervised Speech Representation Model [26.727775920272205]
Self-supervised learning approaches have been successful for pre-training speech representation models.
However, little has been studied about the type or extent of information encoded in the pre-trained representations themselves.
arXiv Detail & Related papers (2021-07-10T02:13:25Z)
- Generative Counterfactuals for Neural Networks via Attribute-Informed Perturbation [51.29486247405601]
We design a framework to generate counterfactuals for raw data instances with the proposed Attribute-Informed Perturbation (AIP).
By utilizing generative models conditioned with different attributes, counterfactuals with desired labels can be obtained effectively and efficiently.
Experimental results on real-world texts and images demonstrate the effectiveness, sample quality as well as efficiency of our designed framework.
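As a rough illustration of attribute-conditioned counterfactual search, the sketch below optimizes a generator latent so the classifier predicts the target label while the latent stays near the original encoding; the `generator` and `classifier` callables and their signatures are assumptions, and the paper's AIP procedure may differ.

```python
import torch
import torch.nn.functional as F

def attribute_informed_counterfactual(generator, classifier, z0, attrs, target_class,
                                      steps=200, lr=0.05, dist_weight=1.0):
    """Search the generator's latent space, keeping the chosen attributes fixed, until the
    classifier assigns the desired label. `generator(z, attrs)` returns an instance,
    `classifier(instance)` returns logits, `z0` encodes the original instance, and
    `target_class` is a LongTensor of desired labels (all of these are assumed interfaces)."""
    z = z0.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        x_cf = generator(z, attrs)                                     # attribute-conditioned generation
        loss = (F.cross_entropy(classifier(x_cf), target_class)        # push toward the desired label
                + dist_weight * (z - z0).pow(2).mean())                # stay close to the original
        loss.backward()
        opt.step()
    return generator(z.detach(), attrs)
```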
arXiv Detail & Related papers (2021-01-18T08:37:13Z)
- DoLFIn: Distributions over Latent Features for Interpretability [8.807587076209568]
We propose a novel strategy for achieving interpretability in neural network models.
Our approach builds on the success of using probability as the central quantity.
We show that DoLFIn not only provides interpretable solutions but also slightly outperforms classical CNN and BiLSTM text classifiers.
arXiv Detail & Related papers (2020-11-10T18:32:53Z)
- A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention [96.77554122595578]
We introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference.
Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost.
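A simplified entropic-OT pooling sketch in that spirit is below: a Sinkhorn plan transports the set elements onto a trainable reference, and the plan's columns define the aggregation weights that produce a fixed-size embedding. The cosine-similarity cost, normalization, and class name are simplifying assumptions, not the paper's exact embedding.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OTPoolingLayer(nn.Module):
    """Aggregate a variable-size set of d-dimensional elements into p reference slots by
    transporting the set onto a trainable reference with an entropic-OT (Sinkhorn) plan."""
    def __init__(self, d, p, eps=0.1, iters=50):
        super().__init__()
        self.reference = nn.Parameter(torch.randn(p, d))
        self.eps, self.iters = eps, iters

    def sinkhorn_plan(self, x):
        # Cost = negative cosine similarity between set elements (n, d) and references (p, d).
        C = -F.normalize(x, dim=1) @ F.normalize(self.reference, dim=1).t()
        K = torch.exp(-C / self.eps)                      # Gibbs kernel
        u = torch.full((x.size(0),), 1.0 / x.size(0), device=x.device)
        v = torch.full((self.reference.size(0),), 1.0 / self.reference.size(0), device=x.device)
        a, b = torch.ones_like(u), torch.ones_like(v)
        for _ in range(self.iters):                       # Sinkhorn iterations
            a = u / (K @ b)
            b = v / (K.t() @ a)
        return a[:, None] * K * b[None, :]                # transport plan, shape (n, p)

    def forward(self, x):
        P = self.sinkhorn_plan(x)                         # each column carries mass ~1/p
        pooled = self.reference.size(0) * (P.t() @ x)     # (p, d): per-slot weighted means
        return pooled.flatten()                           # fixed-size embedding of length p*d

# Toy usage: sets of different sizes map to embeddings of identical size.
layer = OTPoolingLayer(d=32, p=4)
print(layer(torch.randn(10, 32)).shape, layer(torch.randn(7, 32)).shape)
```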
arXiv Detail & Related papers (2020-06-22T08:35:58Z)
- Adversarial Infidelity Learning for Model Interpretation [43.37354056251584]
We propose a Model-agnostic Effective Efficient Direct (MEED) instance-wise feature selection (IFS) framework for model interpretation.
Our framework mitigates concerns about sanity, shortcuts, model identifiability, and information transmission.
Our AIL mechanism can help learn the desired conditional distribution between selected features and targets.
arXiv Detail & Related papers (2020-06-09T16:27:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents (including all content) and is not responsible for any consequences of its use.