COCKATIEL: COntinuous Concept ranKed ATtribution with Interpretable
ELements for explaining neural net classifiers on NLP tasks
- URL: http://arxiv.org/abs/2305.06754v2
- Date: Sun, 14 May 2023 14:38:41 GMT
- Title: COCKATIEL: COntinuous Concept ranKed ATtribution with Interpretable
ELements for explaining neural net classifiers on NLP tasks
- Authors: Fanny Jourdan, Agustin Picard, Thomas Fel, Laurent Risser, Jean Michel
Loubes, Nicholas Asher
- Abstract summary: COCKATIEL is a novel, post-hoc, concept-based, model-agnostic XAI technique.
It generates meaningful explanations from the last layer of a neural net model trained on an NLP classification task.
It does so without compromising the accuracy of the underlying model or requiring a new one to be trained.
- Score: 3.475906200620518
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformer architectures are complex, and while their use in NLP has
engendered many successes, it makes their interpretability or explainability
challenging. Recent debates have shown that attention maps and attribution
methods are unreliable (Pruthi et al., 2019; Brunner et al., 2019). In this
paper, we present some of their limitations and introduce COCKATIEL, which
successfully addresses some of them. COCKATIEL is a novel, post-hoc,
concept-based, model-agnostic XAI technique that generates meaningful
explanations from the last layer of a neural net model trained on an NLP
classification task: it uses Non-Negative Matrix Factorization (NMF) to
discover the concepts the model leverages to make predictions, and a
Sensitivity Analysis to accurately estimate the importance of each of these
concepts for the model. It does so without compromising the accuracy of the
underlying model or requiring a new one to be trained. We conduct experiments
on single- and multi-aspect sentiment analysis tasks and show COCKATIEL's
superior ability to discover concepts that align with those of humans on
Transformer models without any supervision; we objectively verify the
faithfulness of its explanations through fidelity metrics, and we showcase its
ability to provide meaningful explanations on two different datasets.
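
To make the pipeline concrete, below is a minimal, illustrative sketch of a
COCKATIEL-style analysis in Python; it is not the authors' implementation. It
assumes a matrix of non-negative last-layer activations (`activations`) and a
hypothetical `predict_from_activations` callable that maps reconstructed
activations to a class score per example, and it replaces the paper's
Sobol-based Sensitivity Analysis with a simple occlusion-style proxy for
concept importance.

```python
# A minimal sketch of a COCKATIEL-style analysis (illustrative, not the authors' code).
# Assumptions: `activations` is an (n_examples, d) matrix of non-negative last-layer
# activations; `predict_from_activations` is a hypothetical callable returning one
# class score per example. Concept importance below is an occlusion-style proxy,
# not the paper's Sobol-based estimator.
import numpy as np
from sklearn.decomposition import NMF


def cockatiel_sketch(activations, predict_from_activations, n_concepts=10):
    # 1) Concept discovery: factorize A ~= U @ W with U, W >= 0.
    #    Each row of W is a "concept" direction; U[i, k] measures how much
    #    concept k is present in example i.
    nmf = NMF(n_components=n_concepts, init="nndsvda", max_iter=500)
    U = nmf.fit_transform(activations)   # (n_examples, n_concepts)
    W = nmf.components_                  # (n_concepts, d)

    # 2) Concept ranking: remove one concept at a time from the reconstruction
    #    and measure the average change in the model's output.
    base_scores = predict_from_activations(U @ W)
    importance = np.zeros(n_concepts)
    for k in range(n_concepts):
        U_masked = U.copy()
        U_masked[:, k] = 0.0
        masked_scores = predict_from_activations(U_masked @ W)
        importance[k] = np.abs(base_scores - masked_scores).mean()

    return U, W, importance
```

Sorting the concepts by `importance` gives the concept ranking; the paper
additionally explains each concept through interpretable elements (e.g., the
words or clauses that activate it most), which this sketch omits.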
Related papers
- CAT: Interpretable Concept-based Taylor Additive Models [17.73885202930879] (2024-06-25)
Generalized Additive Models (GAMs) can explain deep neural networks (DNNs) at the feature level.
GAMs require large numbers of model parameters and are prone to overfitting, making them hard to train and scale.
We propose CAT, a novel interpretable Concept-bAsed Taylor additive model, to simplify this process.
- Manipulating Feature Visualizations with Gradient Slingshots [54.31109240020007] (2024-01-11)
We introduce a novel method for manipulating Feature Visualization (FV) without significantly impacting the model's decision-making process.
We evaluate the effectiveness of our method on several neural network models and demonstrate its capabilities to hide the functionality of arbitrarily chosen neurons.
- Interpreting Pretrained Language Models via Concept Bottlenecks [55.47515772358389] (2023-11-08)
Pretrained language models (PLMs) have made significant strides in various natural language processing tasks.
The lack of interpretability due to their "black-box" nature poses challenges for responsible implementation.
We propose a novel approach to interpreting PLMs by employing high-level, meaningful concepts that are easily understandable for humans.
- NxPlain: Web-based Tool for Discovery of Latent Concepts [16.446370662629555] (2023-03-06)
We present NxPlain, a web application that provides an explanation of a model's prediction using latent concepts.
NxPlain discovers latent concepts learned in a deep NLP model, provides an interpretation of the knowledge learned in the model, and explains its predictions based on the used concepts.
- Provable concept learning for interpretable predictions using variational inference [7.0349768355860895] (2022-04-01)
In safety critical applications, practitioners are reluctant to trust neural networks when no interpretable explanations are available.
We propose a probabilistic modeling framework to derive (C)oncept (L)earning and (P)rediction (CLAP).
We prove that our method is able to identify such concepts while attaining optimal classification accuracy.
- Correcting Classification: A Bayesian Framework Using Explanation Feedback to Improve Classification Abilities [2.0931163605360115] (2021-04-29)
Explanations are social, meaning they are a transfer of knowledge through interactions.
We overcome these difficulties by training a Bayesian convolutional neural network (CNN) that uses explanation feedback.
Our proposed method utilizes this feedback for fine-tuning to correct the model such that the explanations and classifications improve.
- Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations [64.85696493596821] (2021-03-18)
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
- Explaining and Improving Model Behavior with k Nearest Neighbor Representations [107.24850861390196] (2020-10-18)
We propose using k nearest neighbor representations to identify training examples responsible for a model's predictions (a minimal retrieval sketch appears after this list).
We show that kNN representations are effective at uncovering learned spurious associations.
Our results indicate that the kNN approach makes the finetuned model more robust to adversarial inputs.
- Modeling Token-level Uncertainty to Learn Unknown Concepts in SLU via Calibrated Dirichlet Prior RNN [98.4713940310056] (2020-10-16)
One major task of spoken language understanding (SLU) in modern personal assistants is to extract semantic concepts from an utterance.
Recent research collected question and answer annotated data to learn what is unknown and should be asked.
We incorporate softmax-based slot filling neural architectures to model the sequence uncertainty without question supervision.
- Interpreting Graph Neural Networks for NLP With Differentiable Edge Masking [63.49779304362376] (2020-10-01)
Graph neural networks (GNNs) have become a popular approach to integrating structural inductive biases into NLP models.
We introduce a post-hoc method for interpreting the predictions of GNNs which identifies unnecessary edges.
We show that we can drop a large proportion of edges without deteriorating the performance of the model.
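
As referenced in the k-nearest-neighbor entry above, the following is a small,
hedged sketch of the retrieval step behind explaining a prediction with kNN
representations. The names `train_reps`, `train_texts`, and `test_rep` are
hypothetical placeholders (training-set hidden representations, their source
texts, and one test example's representation), and the use of cosine distance
with scikit-learn's NearestNeighbors is an illustrative choice rather than a
detail taken from that paper.

```python
# Hedged sketch: explain a prediction by retrieving the k training examples whose
# hidden representations are closest to the test example's representation.
import numpy as np
from sklearn.neighbors import NearestNeighbors


def nearest_training_examples(train_reps, train_texts, test_rep, k=5):
    # Index the training representations; cosine distance is one reasonable
    # choice for comparing sentence embeddings (an assumption of this sketch).
    index = NearestNeighbors(n_neighbors=k, metric="cosine").fit(train_reps)
    distances, indices = index.kneighbors(np.asarray(test_rep).reshape(1, -1))
    # The retrieved texts serve as candidate "responsible" training examples.
    return [(train_texts[i], float(d)) for i, d in zip(indices[0], distances[0])]
```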
This list is automatically generated from the titles and abstracts of the papers on this site.