A Framework to Learn with Interpretation
- URL: http://arxiv.org/abs/2010.09345v4
- Date: Wed, 23 Feb 2022 13:29:44 GMT
- Title: A Framework to Learn with Interpretation
- Authors: Jayneel Parekh, Pavlo Mozharovskyi, Florence d'Alché-Buc
- Abstract summary: We present a novel framework to jointly learn a predictive model and its associated interpretation model.
We seek a small-size dictionary of high-level attribute functions that take as inputs the outputs of selected hidden layers.
A detailed pipeline to visualize the learnt features is also developed.
- Score: 2.3741312212138896
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To tackle interpretability in deep learning, we present a novel framework to
jointly learn a predictive model and its associated interpretation model. The
interpreter provides both local and global interpretability about the
predictive model in terms of human-understandable high-level attribute
functions, with minimal loss of accuracy. This is achieved by a dedicated
architecture and well-chosen regularization penalties. We seek a small-size
dictionary of high-level attribute functions that take as inputs the outputs of
selected hidden layers and whose outputs feed a linear classifier. We impose
strong conciseness on the attribute activations with an entropy-based
criterion while enforcing fidelity to both inputs and outputs of the predictive
model. A detailed pipeline to visualize the learnt features is also developed.
Moreover, besides generating interpretable models by design, our approach can
be specialized to provide post-hoc interpretations for a pre-trained neural
network. We validate our approach against several state-of-the-art methods on
multiple datasets and show its efficacy on both kinds of tasks.
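The abstract outlines a concrete training setup: a predictor exposes the output of a selected hidden layer, a small dictionary of attribute functions maps it to activations that feed a linear classifier, and the joint loss combines prediction, output fidelity, and an entropy-based conciseness penalty. The PyTorch sketch below illustrates this structure; the layer choices, attribute pooling, and loss weights are illustrative assumptions rather than the paper's exact architecture, and the input-fidelity (reconstruction) term is omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Predictor(nn.Module):
    """Predictive model f; the output of a selected hidden layer is exposed for the interpreter."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        self.head = nn.Linear(64 * 4 * 4, num_classes)

    def forward(self, x):
        h = self.features(x)                 # selected hidden-layer output
        return self.head(h.flatten(1)), h

class Interpreter(nn.Module):
    """Small dictionary of attribute functions over hidden activations, feeding a linear classifier."""
    def __init__(self, in_ch=64, num_attributes=24, num_classes=10):
        super().__init__()
        self.attributes = nn.Sequential(
            nn.Conv2d(in_ch, num_attributes, 1),
            nn.AdaptiveMaxPool2d(1), nn.Flatten(), nn.ReLU(),
        )
        self.linear = nn.Linear(num_attributes, num_classes)

    def forward(self, h):
        phi = self.attributes(h)             # attribute activations, shape (batch, num_attributes)
        return self.linear(phi), phi

def conciseness_loss(phi, eps=1e-8):
    """Entropy of the normalized attribute activations: low entropy means few attributes fire per sample."""
    p = phi / (phi.sum(dim=1, keepdim=True) + eps)
    return -(p * (p + eps).log()).sum(dim=1).mean()

def joint_loss(f, g, x, y, alpha=1.0, beta=0.1):
    """Prediction loss + output fidelity (interpreter mimics predictor) + conciseness penalty."""
    logits_f, h = f(x)
    logits_g, phi = g(h)
    pred = F.cross_entropy(logits_f, y)
    out_fid = F.kl_div(F.log_softmax(logits_g, dim=1),
                       F.softmax(logits_f, dim=1).detach(),
                       reduction="batchmean")
    return pred + alpha * out_fid + beta * conciseness_loss(phi)
```

For the post-hoc setting mentioned in the abstract, the predictor would presumably be kept frozen (pre-trained) and only the interpreter optimized.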
Related papers
- Prospector Heads: Generalized Feature Attribution for Large Models & Data [82.02696069543454]
We introduce prospector heads, an efficient and interpretable alternative to explanation-based attribution methods.
We demonstrate how prospector heads enable improved interpretation and discovery of class-specific patterns in input data.
arXiv Detail & Related papers (2024-02-18T23:01:28Z)
- Rethinking interpretation: Input-agnostic saliency mapping of deep visual classifiers [28.28834523468462]
Saliency methods provide post-hoc model interpretation by attributing input features to the model outputs.
We show that input-specific saliency mapping is intrinsically susceptible to misleading feature attribution.
We introduce a new perspective of input-agnostic saliency mapping that computationally estimates the high-level features attributed by the model to its outputs.
arXiv Detail & Related papers (2023-03-31T06:58:45Z)
- An Additive Instance-Wise Approach to Multi-class Model Interpretation [53.87578024052922]
Interpretable machine learning offers insights into what factors drive a certain prediction of a black-box system.
Existing methods mainly focus on selecting explanatory input features, which follow either locally additive or instance-wise approaches.
This work exploits the strengths of both methods and proposes a global framework for learning local explanations simultaneously for multiple target classes.
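One way to read this combination of locally additive and instance-wise views is an explainer that, per input, emits a weight vector for every class and scores each class as the additive combination of those weights with the input features, distilled from the black box. The sketch below is a hypothetical illustration of that idea, not the paper's exact model; the class names, the `black_box` callable, and the distillation objective are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveInstanceExplainer(nn.Module):
    """For each input, emit a per-class weight vector over the features; the explanation score
    for a class is the additive combination of those weights with the features (instance-wise
    because the weights depend on the input, additive because the score is a weighted sum)."""
    def __init__(self, num_features, num_classes, hidden=64):
        super().__init__()
        self.weight_net = nn.Sequential(
            nn.Linear(num_features, hidden), nn.ReLU(),
            nn.Linear(hidden, num_features * num_classes),
        )
        self.num_classes = num_classes

    def forward(self, x):
        w = self.weight_net(x).view(x.size(0), self.num_classes, -1)   # (batch, classes, features)
        logits = torch.einsum("bcd,bd->bc", w, x)                      # additive per-class scores
        return logits, w

def distillation_loss(explainer, black_box, x):
    """Fit the explainer so its additive scores track the black-box class probabilities."""
    logits, _ = explainer(x)
    with torch.no_grad():
        target = F.softmax(black_box(x), dim=1)
    return F.kl_div(F.log_softmax(logits, dim=1), target, reduction="batchmean")
```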
arXiv Detail & Related papers (2022-07-07T06:50:27Z)
- A Unified Understanding of Deep NLP Models for Text Classification [88.35418976241057]
We have developed a visual analysis tool, DeepNLPVis, to enable a unified understanding of NLP models for text classification.
The key idea is a mutual information-based measure, which provides quantitative explanations on how each layer of a model maintains the information of input words in a sample.
A multi-level visualization, which consists of a corpus-level, a sample-level, and a word-level visualization, supports the analysis from the overall training set to individual samples.
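The tool's measure is mutual-information based; as a rough, easier-to-run stand-in, the sketch below scores how much word identity a layer still carries by fitting a linear probe from per-token layer representations back to token ids. This probing proxy and the function name `layer_information_score` are assumptions, not the tool's actual computation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def layer_information_score(layer_reprs, word_ids, vocab_size, epochs=50, lr=1e-2):
    """Crude proxy for how much word identity a layer still carries: fit a linear probe from
    per-token layer representations (N, D) back to the token ids (N,) and report its accuracy."""
    probe = nn.Linear(layer_reprs.size(1), vocab_size)
    opt = torch.optim.Adam(probe.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        F.cross_entropy(probe(layer_reprs), word_ids).backward()
        opt.step()
    with torch.no_grad():
        return (probe(layer_reprs).argmax(dim=1) == word_ids).float().mean().item()

# Toy comparison of two layers' representations (random tensors stand in for real activations).
words = torch.randint(0, 100, (512,))
print(layer_information_score(torch.randn(512, 128), words, vocab_size=100))
print(layer_information_score(torch.randn(512, 128), words, vocab_size=100))
```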
arXiv Detail & Related papers (2022-06-19T08:55:07Z)
- Towards Open-World Feature Extrapolation: An Inductive Graph Learning Approach [80.8446673089281]
We propose a new learning paradigm with graph representation and learning.
Our framework contains two modules: 1) a backbone network (e.g., feedforward neural nets) as a lower model takes features as input and outputs predicted labels; 2) a graph neural network as an upper model learns to extrapolate embeddings for new features via message passing over a feature-data graph built from observed data.
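A minimal sketch of the feature-data graph idea follows: instances and features form a bipartite graph, and one round of mean-aggregation message passing rebuilds a feature's embedding from the instances that contain it, so features unseen during training still receive embeddings. The incidence-matrix representation, class name, and layer design are illustrative assumptions, not the paper's GNN.

```python
import torch
import torch.nn as nn

class FeatureDataGraphLayer(nn.Module):
    """One round of mean-aggregation message passing on a bipartite feature-data graph.
    X is a (num_instances, num_features) 0/1 incidence matrix: X[i, j] = 1 when instance i
    uses feature j. Feature embeddings are rebuilt from the instances that contain them, so a
    feature never seen during training still receives an embedding at test time."""
    def __init__(self, dim):
        super().__init__()
        self.feat_update = nn.Linear(dim, dim)
        self.inst_update = nn.Linear(dim, dim)

    def forward(self, X, inst_emb, feat_emb):
        deg_feat = X.sum(dim=0, keepdim=True).clamp(min=1)              # per-feature degree
        deg_inst = X.sum(dim=1, keepdim=True).clamp(min=1)              # per-instance degree
        new_feat = torch.relu(self.feat_update(X.t() @ inst_emb / deg_feat.t()))
        new_inst = torch.relu(self.inst_update(X @ feat_emb / deg_inst))
        return new_inst, new_feat

# Toy usage: 5 instances, 8 features (some columns may correspond to unseen test-time features).
X = (torch.rand(5, 8) > 0.5).float()
inst_emb, feat_emb = torch.randn(5, 16), torch.randn(8, 16)
layer = FeatureDataGraphLayer(16)
inst_emb, feat_emb = layer(X, inst_emb, feat_emb)
```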
arXiv Detail & Related papers (2021-10-09T09:02:45Z)
- Layer-wise Analysis of a Self-supervised Speech Representation Model [26.727775920272205]
Self-supervised learning approaches have been successful for pre-training speech representation models.
However, little has been studied about the type or extent of information encoded in the pre-trained representations themselves.
arXiv Detail & Related papers (2021-07-10T02:13:25Z)
- Generative Counterfactuals for Neural Networks via Attribute-Informed Perturbation [51.29486247405601]
We design a framework to generate counterfactuals for raw data instances with the proposed Attribute-Informed Perturbation (AIP).
By utilizing generative models conditioned with different attributes, counterfactuals with desired labels can be obtained effectively and efficiently.
Experimental results on real-world texts and images demonstrate the effectiveness, sample quality as well as efficiency of our designed framework.
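As a rough illustration of attribute-conditioned counterfactual search, the sketch below optimizes a generator latent so the classifier predicts the target label while the latent stays near the original encoding; the `generator` and `classifier` callables and their signatures are assumptions, and the paper's AIP procedure may differ.

```python
import torch
import torch.nn.functional as F

def attribute_informed_counterfactual(generator, classifier, z0, attrs, target_class,
                                      steps=200, lr=0.05, dist_weight=1.0):
    """Search the generator's latent space, keeping the chosen attributes fixed, until the
    classifier assigns the desired label. `generator(z, attrs)` returns an instance,
    `classifier(instance)` returns logits, `z0` encodes the original instance, and
    `target_class` is a LongTensor of desired labels (all of these are assumed interfaces)."""
    z = z0.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        x_cf = generator(z, attrs)                                     # attribute-conditioned generation
        loss = (F.cross_entropy(classifier(x_cf), target_class)        # push toward the desired label
                + dist_weight * (z - z0).pow(2).mean())                # stay close to the original
        loss.backward()
        opt.step()
    return generator(z.detach(), attrs)
```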
arXiv Detail & Related papers (2021-01-18T08:37:13Z)
- DoLFIn: Distributions over Latent Features for Interpretability [8.807587076209568]
We propose a novel strategy for achieving interpretability in neural network models.
Our approach builds on the success of using probability as the central quantity.
We show that DoLFIn not only provides interpretable solutions but also slightly outperforms classical CNN and BiLSTM text classifiers.
arXiv Detail & Related papers (2020-11-10T18:32:53Z)
- A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention [96.77554122595578]
We introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference.
Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost.
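A simplified entropic-OT pooling sketch in that spirit is below: a Sinkhorn plan transports the set elements onto a trainable reference, and the plan's columns define the aggregation weights that produce a fixed-size embedding. The cosine-similarity cost, normalization, and class name are simplifying assumptions, not the paper's exact embedding.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OTPoolingLayer(nn.Module):
    """Aggregate a variable-size set of d-dimensional elements into p reference slots by
    transporting the set onto a trainable reference with an entropic-OT (Sinkhorn) plan."""
    def __init__(self, d, p, eps=0.1, iters=50):
        super().__init__()
        self.reference = nn.Parameter(torch.randn(p, d))
        self.eps, self.iters = eps, iters

    def sinkhorn_plan(self, x):
        # Cost = negative cosine similarity between set elements (n, d) and references (p, d).
        C = -F.normalize(x, dim=1) @ F.normalize(self.reference, dim=1).t()
        K = torch.exp(-C / self.eps)                      # Gibbs kernel
        u = torch.full((x.size(0),), 1.0 / x.size(0), device=x.device)
        v = torch.full((self.reference.size(0),), 1.0 / self.reference.size(0), device=x.device)
        a, b = torch.ones_like(u), torch.ones_like(v)
        for _ in range(self.iters):                       # Sinkhorn iterations
            a = u / (K @ b)
            b = v / (K.t() @ a)
        return a[:, None] * K * b[None, :]                # transport plan, shape (n, p)

    def forward(self, x):
        P = self.sinkhorn_plan(x)                         # each column carries mass ~1/p
        pooled = self.reference.size(0) * (P.t() @ x)     # (p, d): per-slot weighted means
        return pooled.flatten()                           # fixed-size embedding of length p*d

# Toy usage: sets of different sizes map to embeddings of identical size.
layer = OTPoolingLayer(d=32, p=4)
print(layer(torch.randn(10, 32)).shape, layer(torch.randn(7, 32)).shape)
```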
arXiv Detail & Related papers (2020-06-22T08:35:58Z)
- Adversarial Infidelity Learning for Model Interpretation [43.37354056251584]
We propose a Model-agnostic Effective Efficient Direct (MEED) instance-wise feature selection (IFS) framework for model interpretation.
Our framework mitigates concerns about sanity, shortcuts, model identifiability, and information transmission.
Our AIL mechanism can help learn the desired conditional distribution between selected features and targets.
arXiv Detail & Related papers (2020-06-09T16:27:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents (including all content) and is not responsible for any consequences of its use.