Self-Attention Attribution: Interpreting Information Interactions Inside Transformer
- URL: http://arxiv.org/abs/2004.11207v2
- Date: Thu, 25 Feb 2021 10:53:13 GMT
- Title: Self-Attention Attribution: Interpreting Information Interactions Inside Transformer
- Authors: Yaru Hao, Li Dong, Furu Wei, Ke Xu
- Abstract summary: We propose a self-attention attribution method to interpret the information interactions inside Transformer.
We show that the attribution results can be used as adversarial patterns to implement non-targeted attacks against BERT.
- Score: 89.21584915290319
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The great success of Transformer-based models benefits from the powerful
multi-head self-attention mechanism, which learns token dependencies and
encodes contextual information from the input. Prior work strives to attribute
model decisions to individual input features with different saliency measures,
but fails to explain how these input features interact with each other to
reach predictions. In this paper, we propose a self-attention attribution
method to interpret the information interactions inside Transformer. We take
BERT as an example and conduct extensive studies. First, we apply
self-attention attribution to identify the important attention heads; the
remaining heads can be pruned with only marginal performance degradation.
Furthermore, we extract the most salient dependencies in each layer to
construct an attribution tree, which reveals the hierarchical interactions
inside Transformer. Finally, we show that the attribution results can be used
as adversarial patterns to implement non-targeted attacks against BERT.
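To make the method concrete, here is a minimal PyTorch sketch of integrated-gradients-style attention attribution as the abstract describes it: each head's attention matrix is scaled from zero up to its actual value, gradients of the model output with respect to the scaled attention are accumulated, and the elementwise product of the attention and the averaged gradient gives the attribution score. The `attention_override` keyword is a hypothetical hook, not a real BERT API; actual code would patch the model's attention modules.

```python
import torch

def attention_attribution(model, inputs, attention, steps=20):
    """Integrated-gradients-style attribution over attention scores (sketch).

    attention: (num_heads, seq_len, seq_len) attention matrix of one layer.
    Returns a tensor of the same shape; large entries mark the
    token-to-token interactions the prediction relies on.
    """
    total_grad = torch.zeros_like(attention)
    for alpha in torch.linspace(0.0, 1.0, steps):
        scaled = (alpha * attention).detach().requires_grad_(True)
        # Hypothetical hook: make the model use `scaled` in place of its
        # own attention scores; assumed to return a scalar score
        # (e.g., the logit of the predicted class).
        score = model(inputs, attention_override=scaled)
        score.backward()
        total_grad += scaled.grad
    # Elementwise product of attention and the averaged gradient
    # approximates the integrated-gradients integral.
    return attention * total_grad / steps
```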
Related papers
- DAPE V2: Process Attention Score as Feature Map for Length Extrapolation [63.87956583202729]
We conceptualize attention as a feature map and apply the convolution operator to mimic the processing methods in computer vision.
This insight, which can be adapted to various attention-related models, suggests that the current Transformer architecture has the potential for further evolution.
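As a rough illustration of the "attention as feature map" idea, the sketch below treats the heads of a raw attention-score tensor as channels and refines the (query, key) score map with a 2-D convolution before the softmax. The kernel size and the residual wiring are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class ConvAttentionRefiner(nn.Module):
    """Treat per-head attention scores as feature maps (sketch)."""

    def __init__(self, num_heads, kernel_size=3):
        super().__init__()
        # Heads act as channels; padding keeps the map size unchanged.
        self.conv = nn.Conv2d(num_heads, num_heads,
                              kernel_size, padding=kernel_size // 2)

    def forward(self, scores):
        # scores: (batch, num_heads, q_len, k_len), pre-softmax logits
        return scores + self.conv(scores)  # residual refinement
```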
arXiv Detail & Related papers (2024-10-07T07:21:49Z)
- RAT: Retrieval-Augmented Transformer for Click-Through Rate Prediction [68.34355552090103]
This paper develops a Retrieval-Augmented Transformer (RAT), aiming to acquire fine-grained feature interactions within and across samples.
We then build Transformer layers with cascaded attention to capture both intra- and cross-sample feature interactions.
Experiments on real-world datasets substantiate the effectiveness of RAT and suggest its advantage in long-tail scenarios.
arXiv Detail & Related papers (2024-04-02T19:14:23Z)
- ExpPoint-MAE: Better interpretability and performance for self-supervised point cloud transformers [7.725095281624494]
We evaluate the effectiveness of Masked Autoencoding as a pretraining scheme, and explore Momentum Contrast as an alternative.
We observe that the transformer learns to attend to semantically meaningful regions, indicating that pretraining leads to a better understanding of the underlying geometry.
arXiv Detail & Related papers (2023-06-19T09:38:21Z)
- Learning Instance-Specific Augmentations by Capturing Local Invariances [62.70897571389785]
InstaAug is a method for automatically learning input-specific augmentations from data.
We empirically demonstrate that InstaAug learns meaningful input-dependent augmentations for a wide range of transformation classes.
arXiv Detail & Related papers (2022-05-31T18:38:06Z)
- Measuring the Mixing of Contextual Information in the Transformer [0.19116784879310028]
We consider the whole attention block (multi-head attention, residual connections, and layer normalization) and define a metric to measure token-to-token interactions.
Then, we aggregate layer-wise interpretations to provide input attribution scores for model predictions.
Experimentally, we show that our method, ALTI, provides faithful explanations and outperforms similar aggregation methods.
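The aggregation step can be illustrated with a short sketch: given one row-stochastic token-to-token contribution matrix per layer (however the per-layer metric is computed), composing the matrices by multiplication yields input-level attribution scores. This mirrors rollout-style aggregation; ALTI's own per-layer metric is defined in the paper and not reproduced here.

```python
import torch

def aggregate_layerwise(contributions):
    """Compose layer-wise token-to-token matrices into input attributions.

    contributions: list of (seq_len, seq_len) row-stochastic tensors,
    one per layer, where entry [i, j] is the contribution of token j
    to token i at that layer.
    """
    rolled = contributions[0]
    for layer_matrix in contributions[1:]:
        rolled = layer_matrix @ rolled  # compose mixing layer by layer
    return rolled  # [i, j]: contribution of input token j to output token i
```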
arXiv Detail & Related papers (2022-03-08T17:21:27Z)
- Enjoy the Salience: Towards Better Transformer-based Faithful Explanations with Word Salience [9.147707153504117]
We propose SaLoss, an auxiliary loss function for guiding the multi-head attention mechanism during training to be close to salient information extracted using TextRank.
Experiments on explanation faithfulness across five datasets show that models trained with SaLoss consistently provide more faithful explanations.
We further show that these models also achieve higher predictive performance on downstream tasks.
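A minimal sketch of such an auxiliary objective, assuming per-token TextRank scores as the salience target and KL divergence as the distance (the paper's exact formulation may differ); the total training loss would be the task loss plus a weighted version of this term.

```python
import torch
import torch.nn.functional as F

def salience_loss(attention, salience, eps=1e-9):
    """Push an attention distribution toward word-salience scores (sketch).

    attention: (batch, seq_len) attention mass per token, e.g. the
    [CLS] row averaged over heads; assumed to sum to one per row.
    salience: (batch, seq_len) raw TextRank scores.
    """
    # Normalize salience scores into a target distribution.
    salience = salience / (salience.sum(dim=-1, keepdim=True) + eps)
    # kl_div expects log-probabilities as input and probabilities as target.
    return F.kl_div((attention + eps).log(), salience, reduction="batchmean")
```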
arXiv Detail & Related papers (2021-08-31T11:21:30Z)
- Transformers with Competitive Ensembles of Independent Mechanisms [97.93090139318294]
We propose a new Transformer layer, TIM, that divides the hidden representation and parameters into multiple mechanisms, which exchange information only through attention.
We study TIM on a large-scale BERT model, on the Image Transformer, and on speech enhancement, and find evidence of semantically meaningful specialization as well as improved performance.
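As a simplified sketch of the "multiple mechanisms" idea: the layer below splits the hidden state into independent slices, gives each its own feed-forward network, and weights them with a learned competition softmax. The inter-mechanism attention that lets mechanisms exchange information is omitted for brevity, so this is only a partial illustration.

```python
import torch
import torch.nn as nn

class CompetitiveMechanisms(nn.Module):
    """Independent per-mechanism FFNs with soft competition (sketch)."""

    def __init__(self, d_model, n_mech=4):
        super().__init__()
        assert d_model % n_mech == 0
        self.n_mech = n_mech
        d_slice = d_model // n_mech
        self.ffns = nn.ModuleList(
            nn.Sequential(nn.Linear(d_slice, 4 * d_slice),
                          nn.GELU(),
                          nn.Linear(4 * d_slice, d_slice))
            for _ in range(n_mech))
        self.score = nn.Linear(d_model, n_mech)  # competition logits

    def forward(self, x):
        # x: (batch, seq_len, d_model)
        weights = self.score(x).softmax(dim=-1)  # (batch, seq, n_mech)
        slices = x.chunk(self.n_mech, dim=-1)    # one slice per mechanism
        outs = [w.unsqueeze(-1) * ffn(s)
                for ffn, s, w in zip(self.ffns, slices,
                                     weights.unbind(dim=-1))]
        return torch.cat(outs, dim=-1)
```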
arXiv Detail & Related papers (2021-02-27T21:48:46Z)
- SparseBERT: Rethinking the Importance Analysis in Self-attention [107.68072039537311]
Transformer-based models are popular for natural language processing (NLP) tasks due to their powerful capacity.
Attention map visualization of a pre-trained model is one direct method for understanding the self-attention mechanism.
We propose a Differentiable Attention Mask (DAM) algorithm, which can also be applied to guide the design of SparseBERT.
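One plausible reading of a differentiable attention mask, sketched below: a trainable logit per (query, key) position is squashed through a sigmoid and multiplied into the attention probabilities, with a sparsity penalty that drives unneeded positions toward zero. The mask granularity and relaxation in the paper may differ from this guess.

```python
import torch
import torch.nn as nn

class DifferentiableAttentionMask(nn.Module):
    """Learnable soft mask over attention positions (sketch)."""

    def __init__(self, max_len):
        super().__init__()
        self.mask_logits = nn.Parameter(torch.zeros(max_len, max_len))

    def forward(self, attn_probs):
        # attn_probs: (batch, num_heads, q_len, k_len)
        q_len, k_len = attn_probs.shape[-2:]
        gate = torch.sigmoid(self.mask_logits[:q_len, :k_len])
        return attn_probs * gate  # softly masked attention

    def sparsity_penalty(self):
        # L1-style penalty on the (positive) gate values.
        return torch.sigmoid(self.mask_logits).sum()
```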
arXiv Detail & Related papers (2021-02-25T14:13:44Z)
- Inserting Information Bottlenecks for Attribution in Transformers [46.77580577396633]
We apply information bottlenecks to analyze the attribution of each feature to the prediction of a black-box model.
We show the effectiveness of our method in terms of attribution and the ability to provide insight into how information flows through layers.
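The sketch below follows the general information-bottleneck-attribution recipe (not necessarily this paper's exact construction): per-feature gates mix a layer's detached hidden states with Gaussian noise, and the gates are optimized to preserve the prediction while admitting as much noise as possible; the gates that stay open serve as attributions. `predict_fn` is a hypothetical differentiable task loss on the perturbed hidden states.

```python
import torch

def bottleneck_attribution(hidden, predict_fn, steps=100, beta=10.0):
    """Information-bottleneck attribution over one hidden layer (sketch).

    hidden: detached activations of the layer under analysis.
    predict_fn: maps perturbed activations to a scalar task loss.
    """
    lam_logits = torch.zeros_like(hidden, requires_grad=True)
    optimizer = torch.optim.Adam([lam_logits], lr=0.1)
    mu, std = hidden.mean(), hidden.std()
    for _ in range(steps):
        lam = torch.sigmoid(lam_logits)       # per-feature gate in (0, 1)
        noise = mu + std * torch.randn_like(hidden)
        z = lam * hidden + (1 - lam) * noise  # noisy bottleneck
        info = lam.mean()                     # crude proxy for information kept
        loss = predict_fn(z) + beta * info
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return torch.sigmoid(lam_logits).detach()  # attribution per feature
```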
arXiv Detail & Related papers (2020-12-27T00:35:43Z)
- Do Syntax Trees Help Pre-trained Transformers Extract Information? [8.133145094593502]
We study the utility of incorporating dependency trees into pre-trained transformers on information extraction tasks.
We propose and investigate two distinct strategies for incorporating dependency structure.
We find that their performance gains are highly contingent on the availability of human-annotated dependency parses.
arXiv Detail & Related papers (2020-08-20T17:17:38Z)