DoLFIn: Distributions over Latent Features for Interpretability
- URL: http://arxiv.org/abs/2011.05295v1
- Date: Tue, 10 Nov 2020 18:32:53 GMT
- Title: DoLFIn: Distributions over Latent Features for Interpretability
- Authors: Phong Le and Willem Zuidema
- Abstract summary: We propose a novel strategy for achieving interpretability in neural network models.
Our approach builds on the success of using probability as the central quantity.
We show that DoLFIn not only provides interpretable solutions, but even slightly outperforms the classical CNN and BiLSTM text classifiers.
- Score: 8.807587076209568
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Interpreting the inner workings of neural models is a key step in ensuring
the robustness and trustworthiness of the models, but work on neural network
interpretability typically faces a trade-off: either the models are too
constrained to be very useful, or the solutions found by the models are too
complex to interpret. We propose a novel strategy for achieving
interpretability that -- in our experiments -- avoids this trade-off. Our
approach builds on the success of using probability as the central quantity,
such as for instance within the attention mechanism. In our architecture,
DoLFIn (Distributions over Latent Features for Interpretability), we do not
determine beforehand what each feature represents; the features instead form
an unordered set. Each feature has an associated probability ranging from
0 to 1, weighing its importance for further processing. We show that, unlike
attention and saliency map approaches, this setup makes it straightforward to
compute the probability with which an input component supports the decision the
neural model makes. To demonstrate the usefulness of the approach, we apply
DoLFIn to text classification, and show that DoLFIn not only provides
interpretable solutions, but even slightly outperforms the classical CNN and
BiLSTM text classifiers on the SST2 and AG-news datasets.
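As a rough illustration of this setup (not the authors' code; all names and sizes below are made up), a classifier could hold an unordered set of latent feature vectors, assign each a probability in [0, 1], and aggregate the features weighted by those probabilities. The per-feature probabilities are the quantities one would inspect for interpretation.

```python
# Minimal sketch (not the authors' code): an unordered set of K latent
# features, each with a probability in [0, 1] that weighs its importance.
import torch
import torch.nn as nn

class DoLFInSketch(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, num_features=32, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Each latent feature is a learned vector; the set is unordered.
        self.features = nn.Parameter(torch.randn(num_features, emb_dim))
        self.classifier = nn.Linear(emb_dim, num_classes)

    def forward(self, token_ids):                      # (batch, seq_len)
        x = self.embed(token_ids)                      # (batch, seq, emb)
        # Probability that each latent feature is "present" in the input,
        # obtained by matching features against the token embeddings.
        scores = torch.einsum('bse,fe->bsf', x, self.features)
        p = torch.sigmoid(scores.max(dim=1).values)    # (batch, num_features)
        # Weigh each feature by its probability and aggregate.
        pooled = torch.einsum('bf,fe->be', p, self.features)
        return self.classifier(pooled), p              # p is the interpretable part

model = DoLFInSketch(vocab_size=10000)
logits, feature_probs = model(torch.randint(0, 10000, (4, 20)))
```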
Related papers
- CF-GO-Net: A Universal Distribution Learner via Characteristic Function Networks with Graph Optimizers [8.816637789605174]
We introduce an approach which employs the characteristic function (CF), a probabilistic descriptor that directly corresponds to the distribution.
Unlike the probability density function (pdf), the characteristic function not only always exists, but also provides an additional degree of freedom.
Our method allows the use of a pre-trained model, such as a well-trained autoencoder, and is capable of learning directly in its feature space.
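For intuition only, the empirical characteristic function of a sample can be estimated directly from data; this generic snippet is an illustration of that quantity, not the CF-GO-Net implementation.

```python
# Generic illustration of the empirical characteristic function
# phi(t) = E[exp(i * <t, x>)]; not the CF-GO-Net implementation.
import numpy as np

def empirical_cf(samples, t_points):
    """samples: (n, d) data matrix; t_points: (m, d) evaluation points."""
    # For each t, average exp(i * <t, x>) over the sample.
    phase = t_points @ samples.T            # (m, n)
    return np.exp(1j * phase).mean(axis=1)  # (m,) complex values

x = np.random.randn(1000, 2)                # sample from a 2-D Gaussian
t = np.random.randn(64, 2)                  # evaluation points
print(empirical_cf(x, t)[:3])
```

A CF-based learner can then compare the empirical characteristic functions of real and generated data at such evaluation points.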
arXiv Detail & Related papers (2024-09-19T09:33:12Z)
- InterpretCC: Intrinsic User-Centric Interpretability through Global Mixture of Experts [31.738009841932374]
Interpretability for neural networks is a trade-off between three key requirements.
We present InterpretCC, a family of interpretable-by-design neural networks that guarantee human-centric interpretability.
arXiv Detail & Related papers (2024-02-05T11:55:50Z)
- Equivariance with Learned Canonicalization Functions [77.32483958400282]
We show that learning a small neural network to perform canonicalization is better than using predefined heuristics.
Our experiments show that learning the canonicalization function is competitive with existing techniques for learning equivariant functions across many tasks.
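A toy sketch of the idea, assuming 2-D point clouds and rotations (illustrative only, not the paper's implementation): a small network computes a rotation-equivariant direction, and the input is rotated into a canonical pose before the predictor sees it.

```python
# Toy learned canonicalization for 2-D point clouds under rotation.
import torch
import torch.nn as nn

class Canonicalizer(nn.Module):
    """Learned, rotation-equivariant direction: weights depend only on norms."""
    def __init__(self):
        super().__init__()
        self.w = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))

    def forward(self, points):                      # (batch, n_points, 2)
        norms = points.norm(dim=-1, keepdim=True)   # rotation-invariant weights
        v = (self.w(norms) * points).sum(dim=1)     # rotates with the input
        angle = torch.atan2(v[:, 1], v[:, 0])
        c, s = torch.cos(-angle), torch.sin(-angle)
        rot = torch.stack([torch.stack([c, -s], -1),
                           torch.stack([s,  c], -1)], -2)  # (batch, 2, 2)
        return points @ rot.transpose(-1, -2)       # canonical pose

canon = Canonicalizer()
predictor = nn.Sequential(nn.Flatten(), nn.Linear(2 * 16, 3))
x = torch.randn(8, 16, 2)
logits = predictor(canon(x))                        # predictor sees canonical input
```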
arXiv Detail & Related papers (2022-11-11T21:58:15Z)
- Towards Open-World Feature Extrapolation: An Inductive Graph Learning Approach [80.8446673089281]
We propose a new learning paradigm with graph representation and learning.
Our framework contains two modules: 1) a backbone network (e.g., feedforward neural nets) as a lower model takes features as input and outputs predicted labels; 2) a graph neural network as an upper model learns to extrapolate embeddings for new features via message passing over a feature-data graph built from observed data.
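A simplified sketch of the second module's idea (not the paper's GNN; the aggregation here is plain mean pooling): the embedding of an unseen feature is extrapolated by passing messages over a feature-data graph built from observed data.

```python
# Simplified illustration: extrapolate an embedding for an unseen feature
# by two rounds of aggregation over a feature-data graph.
import torch

def extrapolate_feature(X_obs, feat_emb, new_col):
    """X_obs: (n, d) observed binary features; feat_emb: (d, h) learned
    feature embeddings; new_col: (n,) binary column for the unseen feature."""
    # Step 1 (feature -> data): each instance aggregates the embeddings
    # of the observed features it activates.
    deg = X_obs.sum(dim=1, keepdim=True).clamp(min=1)
    inst_emb = (X_obs @ feat_emb) / deg              # (n, h)
    # Step 2 (data -> feature): the new feature aggregates the embeddings
    # of the instances in which it occurs.
    mask = new_col.unsqueeze(1)                      # (n, 1)
    return (mask * inst_emb).sum(dim=0) / mask.sum().clamp(min=1)

X = (torch.rand(100, 8) > 0.5).float()
emb = torch.randn(8, 16)
new = (torch.rand(100) > 0.5).float()
print(extrapolate_feature(X, emb, new).shape)        # torch.Size([16])
```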
arXiv Detail & Related papers (2021-10-09T09:02:45Z)
- Combining Discrete Choice Models and Neural Networks through Embeddings: Formulation, Interpretability and Performance [10.57079240576682]
This study proposes a novel approach that combines theory and data-driven choice models using Artificial Neural Networks (ANNs).
In particular, we use continuous vector representations, called embeddings, for encoding categorical or discrete explanatory variables.
Our models deliver state-of-the-art predictive performance, outperforming existing ANN-based models while drastically reducing the number of required network parameters.
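A minimal, hypothetical sketch of the embedding idea (names and sizes are illustrative): a categorical explanatory variable is mapped to a learned low-dimensional vector instead of a one-hot code before entering the choice model.

```python
# Hypothetical sketch: a learned embedding for a categorical explanatory
# variable feeding a simple utility / choice model.
import torch
import torch.nn as nn

class EmbeddingChoiceModel(nn.Module):
    def __init__(self, n_categories, emb_dim=4, n_numeric=3, n_alternatives=3):
        super().__init__()
        self.cat_emb = nn.Embedding(n_categories, emb_dim)   # replaces one-hot
        self.utility = nn.Linear(emb_dim + n_numeric, n_alternatives)

    def forward(self, cat_ids, numeric):
        z = torch.cat([self.cat_emb(cat_ids), numeric], dim=-1)
        return torch.softmax(self.utility(z), dim=-1)        # choice probabilities

model = EmbeddingChoiceModel(n_categories=50)
probs = model(torch.randint(0, 50, (16,)), torch.randn(16, 3))
```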
arXiv Detail & Related papers (2021-09-24T15:55:31Z)
- It's FLAN time! Summing feature-wise latent representations for interpretability [0.0]
We propose a novel class of structurally-constrained neural networks, which we call FLANs (Feature-wise Latent Additive Networks).
FLANs process each input feature separately, computing for each of them a representation in a common latent space.
These feature-wise latent representations are then simply summed, and the aggregated representation is used for prediction.
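A minimal sketch of this structure (illustrative, not the authors' code): one small network per input feature maps it into a shared latent space, the latent vectors are summed, and a single head predicts from the sum.

```python
# Illustrative FLAN-style model: per-feature networks into a shared latent
# space, followed by a simple sum and a prediction head.
import torch
import torch.nn as nn

class FLANSketch(nn.Module):
    def __init__(self, n_features, latent_dim=16, n_classes=2):
        super().__init__()
        self.feature_nets = nn.ModuleList(
            nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, latent_dim))
            for _ in range(n_features)
        )
        self.head = nn.Linear(latent_dim, n_classes)

    def forward(self, x):                                  # (batch, n_features)
        latents = [net(x[:, i:i + 1]) for i, net in enumerate(self.feature_nets)]
        return self.head(torch.stack(latents).sum(dim=0))  # sum, then predict

model = FLANSketch(n_features=10)
logits = model(torch.randn(8, 10))
```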
arXiv Detail & Related papers (2021-06-18T12:19:33Z)
- Improving Uncertainty Calibration via Prior Augmented Data [56.88185136509654]
Neural networks have proven successful at learning from complex data distributions by acting as universal function approximators.
However, they are often overconfident, which leads to inaccurate and miscalibrated probabilistic predictions.
We propose a solution by seeking out regions of feature space where the model is unjustifiably overconfident, and conditionally raising the entropy of those predictions towards that of the prior distribution of the labels.
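As a simplified illustration of the entropy-raising step (not the paper's exact procedure, which searches feature space for such regions), overconfident predictions can be interpolated toward the prior label distribution.

```python
# Simplified illustration: raise the entropy of overconfident predictions
# by mixing them with the prior label distribution.
import torch

def temper_toward_prior(probs, prior, conf_threshold=0.95, alpha=0.5):
    """probs: (n, k) predicted class probabilities; prior: (k,) label prior."""
    overconfident = probs.max(dim=1).values > conf_threshold
    mixed = (1 - alpha) * probs + alpha * prior          # higher-entropy mixture
    return torch.where(overconfident.unsqueeze(1), mixed, probs)

probs = torch.tensor([[0.98, 0.01, 0.01], [0.40, 0.35, 0.25]])
prior = torch.tensor([1 / 3, 1 / 3, 1 / 3])
print(temper_toward_prior(probs, prior))
```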
arXiv Detail & Related papers (2021-02-22T07:02:37Z)
- Generative Counterfactuals for Neural Networks via Attribute-Informed Perturbation [51.29486247405601]
We design a framework to generate counterfactuals for raw data instances with the proposed Attribute-Informed Perturbation (AIP).
By utilizing generative models conditioned with different attributes, counterfactuals with desired labels can be obtained effectively and efficiently.
Experimental results on real-world texts and images demonstrate the effectiveness, sample quality as well as efficiency of our designed framework.
arXiv Detail & Related papers (2021-01-18T08:37:13Z)
- A Framework to Learn with Interpretation [2.3741312212138896]
We present a novel framework to jointly learn a predictive model and its associated interpretation model.
We seek a small dictionary of high-level attribute functions that take as inputs the outputs of selected hidden layers.
A detailed pipeline to visualize the learnt features is also developed.
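A hedged sketch of such a dictionary (sizes and names are assumptions, not the paper's architecture): a few small attribute functions read selected hidden activations, and an interpretable linear readout predicts from the resulting attribute values.

```python
# Hedged sketch: attribute functions over hidden activations feeding an
# interpretable linear readout; all names and sizes are illustrative.
import torch
import torch.nn as nn

class AttributeInterpreter(nn.Module):
    def __init__(self, hidden_dim=64, n_attributes=8, n_classes=2):
        super().__init__()
        self.attributes = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden_dim, 16), nn.ReLU(), nn.Linear(16, 1))
            for _ in range(n_attributes)
        )
        self.linear = nn.Linear(n_attributes, n_classes)   # interpretable readout

    def forward(self, hidden):                             # (batch, hidden_dim)
        a = torch.cat([f(hidden) for f in self.attributes], dim=1)
        return self.linear(a), a                           # a = attribute values

interp = AttributeInterpreter()
logits, attributes = interp(torch.randn(4, 64))
```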
arXiv Detail & Related papers (2020-10-19T09:26:28Z)
- Explaining and Improving Model Behavior with k Nearest Neighbor Representations [107.24850861390196]
We propose using k nearest neighbor representations to identify training examples responsible for a model's predictions.
We show that kNN representations are effective at uncovering learned spurious associations.
Our results indicate that the kNN approach makes the finetuned model more robust to adversarial inputs.
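A minimal illustration of the retrieval step (not the paper's full method): find the training examples closest to a test input in the model's representation space and treat them as the examples responsible for the prediction.

```python
# Illustration: k nearest training examples in representation space.
import numpy as np

def nearest_training_examples(train_reps, test_rep, k=5):
    """train_reps: (n, h) hidden representations of training data;
    test_rep: (h,) representation of the test input."""
    dists = np.linalg.norm(train_reps - test_rep, axis=1)
    return np.argsort(dists)[:k]          # indices of the k closest examples

train_reps = np.random.randn(1000, 128)
test_rep = np.random.randn(128)
print(nearest_training_examples(train_reps, test_rep))
```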
arXiv Detail & Related papers (2020-10-18T16:55:25Z)
- Interpreting Graph Neural Networks for NLP With Differentiable Edge Masking [63.49779304362376]
Graph neural networks (GNNs) have become a popular approach to integrating structural inductive biases into NLP models.
We introduce a post-hoc method for interpreting the predictions of GNNs which identifies unnecessary edges.
We show that we can drop a large proportion of edges without deteriorating the performance of the model.
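A rough sketch of differentiable edge masking (illustrative, not the paper's exact formulation): a learnable logit per edge yields a soft mask that scales messages, and adding the mask's sum to the loss as a sparsity penalty pushes unnecessary edges toward zero.

```python
# Rough sketch: soft, learnable mask over edges in a message-passing step.
import torch

def masked_aggregate(node_feats, edge_index, edge_logits):
    """node_feats: (n, h); edge_index: (2, e) source/target ids;
    edge_logits: (e,) learnable parameters."""
    mask = torch.sigmoid(edge_logits)                     # soft edge mask in (0, 1)
    src, dst = edge_index
    messages = node_feats[src] * mask.unsqueeze(1)        # scale messages by mask
    out = torch.zeros_like(node_feats)
    out.index_add_(0, dst, messages)                      # sum messages per target
    return out, mask.sum()                                # sparsity term for the loss

x = torch.randn(5, 8)
edges = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 4]])
logits = torch.zeros(4, requires_grad=True)
agg, sparsity = masked_aggregate(x, edges, logits)
```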
arXiv Detail & Related papers (2020-10-01T17:51:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.