Attention vs non-attention for a Shapley-based explanation method
- URL: http://arxiv.org/abs/2104.12424v1
- Date: Mon, 26 Apr 2021 09:33:18 GMT
- Title: Attention vs non-attention for a Shapley-based explanation method
- Authors: Tom Kersten, Hugh Mee Wong, Jaap Jumelet, Dieuwke Hupkes
- Abstract summary: We consider Contextual Decomposition (CD) -- a Shapley-based input feature attribution method that has been shown to work well for recurrent NLP models.
We show that the English and Dutch models demonstrate similar processing behaviour, but that under the hood there are consistent differences between our attention and non-attention models.
- Score: 6.386917828177479
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The field of explainable AI has recently seen an explosion in the number of
explanation methods for highly non-linear deep neural networks. The extent to
which such methods -- that are often proposed and tested in the domain of
computer vision -- are appropriate to address the explainability challenges in
NLP remains relatively unexplored. In this work, we consider Contextual
Decomposition (CD) -- a Shapley-based input feature attribution method that has
been shown to work well for recurrent NLP models -- and we test the extent to
which it is useful for models that contain attention operations. To this end,
we extend CD to cover the operations necessary for attention-based models. We
then compare how long-distance subject-verb relationships are processed by
models with and without attention, considering a number of different syntactic
structures in two different languages: English and Dutch. Our experiments
confirm that CD can successfully be applied to attention-based models as well,
providing an alternative Shapley-based attribution method for modern neural
networks. In particular, using CD, we show that the English and Dutch models
demonstrate similar processing behaviour, but that under the hood there are
consistent differences between our attention and non-attention models.
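To make the CD bookkeeping concrete, here is a minimal NumPy sketch of the two ideas the abstract relies on: linear layers decompose exactly into a relevant stream (beta, the phrase in focus) and an irrelevant stream (gamma, everything else), while nonlinearities are split with the two-player Shapley average. The softmax handling at the end is an assumed illustration of the attention extension, not necessarily the exact scheme the paper derives; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def shapley_split(f, beta, gamma):
    """Two-player Shapley split of f(beta + gamma): the relevant share of a
    nonlinearity, averaged over both orders in which beta can be added in."""
    zero = np.zeros_like(beta)
    rel = 0.5 * ((f(beta) - f(zero)) + (f(beta + gamma) - f(gamma)))
    return rel, f(beta + gamma) - rel

def cd_linear(W, b, beta, gamma):
    """Linear layers decompose exactly; the bias joins the irrelevant stream."""
    return W @ beta, W @ gamma + b

# Toy setup: three token embeddings, the first token is the phrase in focus.
d = 4
x = rng.normal(size=(3, d))
beta_in, gamma_in = x[0], x[1] + x[2]      # relevant vs. irrelevant input mass

# One feed-forward step, h = tanh(W x + b), decomposed into two streams.
W, b = rng.normal(size=(d, d)), rng.normal(size=d)
zb, zg = cd_linear(W, b, beta_in, gamma_in)
hb, hg = shapley_split(np.tanh, zb, zg)

# Invariant: the two streams always sum to the ordinary forward pass.
assert np.allclose(hb + hg, np.tanh(W @ (beta_in + gamma_in) + b))

# Attention needs the same split pushed through the softmax over scores.
# Reusing shapley_split here is an assumed illustration, not the paper's
# exact treatment of the bilinear query-key interaction.
softmax = lambda s: np.exp(s - s.max()) / np.exp(s - s.max()).sum()
sb, sg = rng.normal(size=3), rng.normal(size=3)  # decomposed attention scores
ab, ag = shapley_split(softmax, sb, sg)
assert np.allclose(ab + ag, softmax(sb + sg))
print("relevant share of attention mass:", ab.sum())
```

The invariant to keep in mind is that the two streams always sum to the ordinary forward pass, so beta can be read as the contribution of the phrase in focus to any downstream quantity, e.g. the verb's logit in a subject-verb agreement task.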
Related papers
- Sparks of Explainability: Recent Advancements in Explaining Large Vision Models [6.1642231492615345]
This thesis explores advanced approaches to improve explainability in computer vision by analyzing and modeling the features exploited by deep neural networks.
It evaluates attribution methods, notably saliency maps, by introducing a metric based on algorithmic stability and an approach utilizing Sobol indices.
Two hypotheses are examined: aligning models with human reasoning and adopting a conceptual explainability approach.
arXiv Detail & Related papers (2025-02-03T04:49:32Z)
- Attention Entropy is a Key Factor: An Analysis of Parallel Context Encoding with Full-attention-based Pre-trained Language Models [49.84163262868945]
Large language models have shown remarkable performance across a wide range of language tasks, owing to their exceptional capabilities in context modeling.
The most commonly used method of context modeling is full self-attention, as seen in standard decoder-only Transformers.
We propose parallel context encoding, which splits the context into sub-pieces and encodes them in parallel (a minimal attention-entropy sketch follows this list).
arXiv Detail & Related papers (2024-12-21T09:04:51Z)
- DISCO: DISCovering Overfittings as Causal Rules for Text Classification Models [6.369258625916601]
Post-hoc interpretability methods fail to capture the models' decision-making process fully.
Our paper introduces DISCO, a novel method for discovering global, rule-based explanations.
DISCO supports interactive explanations, enabling human inspectors to distinguish spurious causes in the rule-based output.
arXiv Detail & Related papers (2024-11-07T12:12:44Z)
- A Survey of Low-shot Vision-Language Model Adaptation via Representer Theorem [38.84662767814454]
The key challenge under limited training data is how to fine-tune pre-trained vision-language models in a parameter-efficient manner.
This paper proposes a unified computational framework that integrates existing methods, identifies their nature, and supports in-depth comparison.
As a demonstration, we extend existing methods by modeling inter-class correlation between representers in reproducing kernel Hilbert space (RKHS).
arXiv Detail & Related papers (2024-10-15T15:22:30Z)
- Unsupervised Model Diagnosis [49.36194740479798]
This paper proposes Unsupervised Model Diagnosis (UMO) to produce semantic counterfactual explanations without any user guidance.
Our approach identifies and visualizes changes in semantics, and then matches these changes to attributes from wide-ranging text sources.
arXiv Detail & Related papers (2024-10-08T17:59:03Z)
- Automatic Discovery of Visual Circuits [66.99553804855931]
We explore scalable methods for extracting the subgraph of a vision model's computational graph that underlies recognition of a specific visual concept.
We find that our approach extracts circuits that causally affect model output, and that editing these circuits can defend large pretrained models from adversarial attacks.
arXiv Detail & Related papers (2024-04-22T17:00:57Z)
- Manipulating Feature Visualizations with Gradient Slingshots [54.31109240020007]
We introduce a novel method for manipulating Feature Visualization (FV) without significantly impacting the model's decision-making process.
We evaluate the effectiveness of our method on several neural network models and demonstrate its capabilities to hide the functionality of arbitrarily chosen neurons.
arXiv Detail & Related papers (2024-01-11T18:57:17Z)
- Feature Interactions Reveal Linguistic Structure in Language Models [2.0178765779788495]
We study feature interactions in the context of feature attribution methods for post-hoc interpretability.
We work out a grey box methodology, in which we train models to perfection on a formal language classification task.
We show that under specific configurations, some methods are indeed able to uncover the grammatical rules acquired by a model.
arXiv Detail & Related papers (2023-06-21T11:24:41Z)
- Entity-Conditioned Question Generation for Robust Attention Distribution in Neural Information Retrieval [51.53892300802014]
We show that supervised neural information retrieval models are prone to learning sparse attention patterns over passage tokens.
Using a novel targeted synthetic data generation method, we teach neural IR to attend more uniformly and robustly to all entities in a given passage.
arXiv Detail & Related papers (2022-04-24T22:36:48Z)
- This looks more like that: Enhancing Self-Explaining Models by Prototypical Relevance Propagation [17.485732906337507]
We present a case study of the self-explaining network, ProtoPNet, in the presence of a spectrum of artifacts.
We introduce a novel method for generating more precise model-aware explanations.
In order to obtain a clean dataset, we propose to use multi-view clustering strategies for segregating the artifact images.
arXiv Detail & Related papers (2021-08-27T09:55:53Z)
- Bayesian Attention Belief Networks [59.183311769616466]
Attention-based neural networks have achieved state-of-the-art results on a wide range of tasks.
This paper introduces Bayesian attention belief networks, which construct a decoder network by modeling unnormalized attention weights.
We show that our method outperforms deterministic attention and state-of-the-art attention in accuracy, uncertainty estimation, generalization across domains, and robustness to adversarial attacks.
arXiv Detail & Related papers (2021-06-09T17:46:22Z)
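As noted in the Attention Entropy entry above, that paper's analysis revolves around a simple diagnostic. A minimal sketch, assuming the standard formulation (Shannon entropy of each softmax-normalized attention row; illustrative, not taken from that paper):

```python
import numpy as np

def attention_entropy(scores):
    """Shannon entropy of each softmax-normalized attention row:
    low entropy = sharply peaked attention, high entropy = diffuse."""
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerically stable
    probs = np.exp(scores)
    probs /= probs.sum(axis=-1, keepdims=True)
    return -(probs * np.log(probs + 1e-12)).sum(axis=-1)

# A peaked row vs. a uniform row over 8 positions (max entropy = ln 8 ~ 2.08).
rows = np.array([[8.0, 0, 0, 0, 0, 0, 0, 0],
                 [0.0, 0, 0, 0, 0, 0, 0, 0]])
print(attention_entropy(rows))
```

Peaked rows score near zero; a uniform row over n positions scores ln n, the maximum.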
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the generated content (including all information) and is not responsible for any consequences arising from its use.