GlobEnc: Quantifying Global Token Attribution by Incorporating the Whole
Encoder Layer in Transformers
- URL: http://arxiv.org/abs/2205.03286v1
- Date: Fri, 6 May 2022 15:13:34 GMT
- Authors: Ali Modarressi, Mohsen Fayyaz, Yadollah Yaghoobzadeh, Mohammad Taher
Pilehvar
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There has been a growing interest in interpreting the underlying dynamics of
Transformers. While self-attention patterns were initially deemed as the
primary option, recent studies have shown that integrating other components can
yield more accurate explanations. This paper introduces a novel token
attribution analysis method that incorporates all the components in the encoder
block and aggregates this throughout layers. Through extensive quantitative and
qualitative experiments, we demonstrate that our method can produce faithful
and meaningful global token attributions. Our experiments reveal that
incorporating almost every encoder component yields progressively more
accurate analyses in both local (single-layer) and global (whole-model)
settings. Our global attribution analysis significantly outperforms previous
methods on various tasks in terms of correlation with gradient-based saliency
scores. Our code is freely available at
https://github.com/mohsenfayyaz/GlobEnc.
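The abstract describes aggregating per-layer token attributions into a global, whole-model attribution map. GlobEnc's actual computation (a norm-based decomposition covering attention, the residual connection, and layer normalization) is in the linked repository; the sketch below only illustrates the generic aggregation step such methods build on, in the style of attention rollout. The random per-layer matrices and the 0.5 residual weight are illustrative assumptions, not the paper's values.

```python
import numpy as np

def aggregate_rollout(layer_matrices):
    """Compose per-layer token-to-token attribution matrices into a
    global (input -> output) attribution map, attention-rollout style.

    Each matrix is (seq_len, seq_len): row i says how much output token i
    of that layer draws from each token entering the layer. The residual
    path is modeled by mixing in the identity; rows are renormalized so
    each remains a distribution over tokens.
    """
    n = layer_matrices[0].shape[0]
    rollout = np.eye(n)
    for a in layer_matrices:
        a = 0.5 * a + 0.5 * np.eye(n)          # assumed residual weight
        a = a / a.sum(axis=-1, keepdims=True)  # keep rows row-stochastic
        rollout = a @ rollout                  # compose with lower layers
    return rollout

# Stand-in per-layer maps: random row-stochastic matrices, 4 layers, 6 tokens.
rng = np.random.default_rng(0)
layers = []
for _ in range(4):
    raw = rng.random((6, 6))
    layers.append(raw / raw.sum(axis=-1, keepdims=True))

global_attr = aggregate_rollout(layers)
# Each row of global_attr is a distribution over the 6 input tokens.
print(np.allclose(global_attr.sum(axis=-1), 1.0))
```

Because row-stochastic matrices are closed under this mixing and multiplication, every row of the aggregated map stays a valid distribution over input tokens, which is what makes the result readable as a global attribution score.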
Related papers
- A Theoretical Understanding of Shallow Vision Transformers: Learning, Generalization, and Sample Complexity (2023-02-12)
  ViTs with self-attention modules have recently achieved great empirical success in many tasks. However, theoretical analysis of their learning and generalization remains largely elusive. This paper provides the first theoretical analysis of a shallow ViT for a classification task.
- Quantifying Context Mixing in Transformers (2023-01-30)
  Self-attention weights and their transformed variants have been the main source of information for analyzing token-to-token interactions in Transformer-based models. We propose Value Zeroing, a novel context mixing score customized for Transformers that provides a deeper understanding of how information is mixed at each encoder layer.
- Synthetic-to-Real Domain Generalized Semantic Segmentation for 3D Indoor Point Clouds (2022-12-09)
  This paper introduces the synthetic-to-real domain generalization setting for this task. The domain gap between synthetic and real-world point cloud data mainly lies in the different layouts and point patterns. Experiments on the synthetic-to-real benchmark demonstrate that both CINMix and multi-prototypes can narrow the distribution gap.
- Coalescing Global and Local Information for Procedural Text Understanding (2022-08-26)
  A complete procedural understanding solution should combine three core aspects: local and global views of the inputs, and a global view of the outputs. In this paper, we propose Coalescing Global and Local Information (CGLI), a new model that builds entity and time representations. Experiments on a popular procedural text understanding dataset show that our model achieves state-of-the-art results.
- Exploring and Exploiting Multi-Granularity Representations for Machine Reading Comprehension (2022-08-18)
  We propose a novel approach called Adaptive Bidirectional Attention-Capsule Network (ABA-Net), which adaptively exploits source representations at different levels for the predictor. We set a new state-of-the-art performance on the SQuAD 1.0 dataset.
- Effective and Interpretable Information Aggregation with Capacity Networks (2022-07-25)
  Capacity networks generate multiple interpretable intermediate results, which can be aggregated in a semantically meaningful space. Our experiments show that implementing this simple inductive bias leads to improvements over different encoder-decoder architectures.
- Measuring the Mixing of Contextual Information in the Transformer (2022-03-08)
  We consider the whole attention block (multi-head attention, residual connection, and layer normalization) and define a metric to measure token-to-token interactions. We then aggregate layer-wise interpretations to provide input attribution scores for model predictions. Experimentally, we show that our method, ALTI, provides faithful explanations and outperforms similar aggregation methods.
- Unifying Global-Local Representations in Salient Object Detection with Transformer (2021-08-05)
  We introduce a new attention-based encoder, the vision transformer, into salient object detection. With a global view even in very shallow layers, the transformer encoder preserves more local representations. Our method significantly outperforms other FCN-based and transformer-based methods on five benchmarks.
- Rethinking Global Context in Crowd Counting (2021-05-23)
  A pure transformer is used to extract features with global information from overlapping image patches. Inspired by classification, we add a context token to the input sequence to facilitate information exchange with the tokens corresponding to image patches.
- UniT: Unified Knowledge Transfer for Any-shot Object Detection and Segmentation (2020-06-12)
  Methods for object detection and segmentation rely on large-scale instance-level annotations for training. We propose an intuitive and unified semi-supervised model that is applicable to a range of supervision levels.