Generic Attention-model Explainability by Weighted Relevance Accumulation
- URL: http://arxiv.org/abs/2308.10240v1
- Date: Sun, 20 Aug 2023 12:02:30 GMT
- Title: Generic Attention-model Explainability by Weighted Relevance Accumulation
- Authors: Yiming Huang, Aozhe Jia, Xiaodan Zhang, Jiawei Zhang
- Abstract summary: We propose a weighted relevancy strategy, which takes the importance of token values into consideration, to reduce distortion when equally accumulating relevance.
To evaluate our method, we propose a unified CLIP-based two-stage model, named CLIPmapper, to process Vision-and-Language tasks.
- Score: 9.816810016935541
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Attention-based transformer models have achieved remarkable progress in
multi-modal tasks, such as visual question answering. The explainability of
attention-based methods has recently attracted wide interest as it can explain
the inner changes of attention tokens by accumulating relevancy across
attention layers. Current methods update relevancy by simply accumulating, with
equal weight, the token relevancy before and after each attention operation.
However, the importance of token values usually differs during relevance
accumulation. In this paper, we propose a weighted relevancy strategy that
takes the importance of token values into account, reducing the distortion
introduced by equal accumulation. To evaluate our method, we propose a unified
CLIP-based two-stage model, named CLIPmapper, which processes
Vision-and-Language tasks through a CLIP encoder and a following mapper.
CLIPmapper combines self-attention, cross-attention, single-modality, and
cross-modality attention, making it well suited for evaluating our generic
explainability method. Extensive perturbation tests on visual question
answering and image captioning validate that our explainability method
outperforms existing methods.
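A minimal sketch of how such a weighted accumulation could look, assuming the common gradient-weighted relevancy-update rule from prior generic-explainability work and a hypothetical token weight derived from value-vector norms (the abstract does not specify the paper's exact weighting):

```python
import numpy as np

def weighted_relevance_update(R, attn, grad, V):
    """One relevancy-accumulation step for a single attention layer.

    R    : (n, n) relevancy accumulated so far (initialized to identity)
    attn : (h, n, n) attention probabilities of this layer
    grad : (h, n, n) gradients of the target score w.r.t. `attn`
    V    : (h, n, d) value vectors of this layer
    """
    # Gradient-weighted attention, averaged over heads (generic relevancy rule).
    A_bar = np.clip(grad * attn, 0, None).mean(axis=0)      # (n, n)

    # Hypothetical token-importance weights from value-vector norms,
    # normalized to sum to 1 -- an assumption standing in for the
    # paper's weighted relevancy strategy.
    w = np.linalg.norm(V, axis=-1).mean(axis=0)              # (n,)
    w = w / (w.sum() + 1e-8)

    # Weighted accumulation instead of adding old and new relevance equally.
    return R + (w[None, :] * A_bar) @ R

# Usage: start from identity and fold in each layer in order.
n, h, d = 6, 4, 16
R = np.eye(n)
for _ in range(3):  # three dummy layers
    attn = np.random.rand(h, n, n); attn /= attn.sum(-1, keepdims=True)
    grad = np.random.randn(h, n, n)
    V = np.random.randn(h, n, d)
    R = weighted_relevance_update(R, attn, grad, V)
```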
Related papers
- Recycled Attention: Efficient inference for long-context language models [54.00118604124301]
We propose Recycled Attention, an inference-time method which alternates between full context attention and attention over a subset of input tokens.
When performing partial attention, we recycle the attention pattern of a previous token that has performed full attention and attend only to the top K most attended tokens.
Compared to previously proposed inference-time acceleration methods, which attend only to the local context or to tokens with high cumulative attention scores, our approach flexibly chooses tokens that are relevant to the current decoding step.
arXiv Detail & Related papers (2024-11-08T18:57:07Z)
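A rough sketch of the partial-attention step described above, assuming a cached, head-averaged attention pattern from the last full-attention step; the function and argument names are illustrative, not the paper's API:

```python
import numpy as np

def recycled_attention_step(q, K, V, cached_attn, k=8):
    """Partial-attention step: attend only to the top-k tokens that were
    most attended in the cached pattern from the last full-attention step.

    q           : (d,)   query of the current decoding step
    K, V        : (n, d) keys / values of the full context
    cached_attn : (n,)   attention weights from the previous full-attention step
    """
    # Recycle the cached pattern: pick the k most-attended token positions.
    top = np.argsort(cached_attn)[-k:]

    # Attend only over that subset of the context.
    scores = K[top] @ q / np.sqrt(K.shape[-1])
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    return probs @ V[top]
```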
- Elliptical Attention [1.7597562616011944]
Pairwise dot-product self-attention is key to the success of transformers that achieve state-of-the-art performance across a variety of applications in language and vision.
We propose using a Mahalanobis distance metric for computing the attention weights to stretch the underlying feature space in directions of high contextual relevance.
arXiv Detail & Related papers (2024-06-19T18:38:11Z)
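A small sketch of attention weights computed from a Mahalanobis distance instead of a dot product, assuming a hypothetical diagonal metric `m` that stretches the more relevant feature directions (the summary does not describe how the metric is estimated):

```python
import numpy as np

def mahalanobis_attention(Q, K, V, m):
    """Attention weights from negative squared Mahalanobis distances.

    Q, K : (n, d) queries / keys
    V    : (n, d) values
    m    : (d,)   positive weights defining a diagonal metric M = diag(m);
                  larger entries stretch more contextually relevant directions.
    """
    diff = Q[:, None, :] - K[None, :, :]               # (n, n, d) pairwise differences
    dist2 = (diff ** 2 * m).sum(-1)                    # squared Mahalanobis distances
    scores = -dist2 / np.sqrt(Q.shape[-1])             # closer keys get higher scores
    probs = np.exp(scores - scores.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)
    return probs @ V
```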
- Fortify the Shortest Stave in Attention: Enhancing Context Awareness of Large Language Models for Effective Tool Use [74.72150542395487]
An inherent waveform pattern in the attention allocation of large language models (LLMs) significantly affects their performance in tasks demanding a high degree of context awareness.
To address this issue, we propose a novel inference method named Attention Buckets.
arXiv Detail & Related papers (2023-12-07T17:24:51Z)
- Alignment Attention by Matching Key and Query Distributions [48.93793773929006]
This paper introduces alignment attention that explicitly encourages self-attention to match the distributions of the key and query within each head.
It is simple to convert any model with self-attention, including pre-trained ones, to the proposed alignment attention.
On a variety of language understanding tasks, we show the effectiveness of our method in accuracy, uncertainty estimation, generalization across domains, and robustness to adversarial attacks.
arXiv Detail & Related papers (2021-10-25T00:54:57Z)
- Counterfactual Attention Learning for Fine-Grained Visual Categorization and Re-identification [101.49122450005869]
We present a counterfactual attention learning method to learn more effective attention based on causal inference.
Specifically, we analyze the effect of the learned visual attention on network prediction.
We evaluate our method on a wide range of fine-grained recognition tasks.
arXiv Detail & Related papers (2021-08-19T14:53:40Z)
- More Than Just Attention: Learning Cross-Modal Attentions with Contrastive Constraints [63.08768589044052]
We propose Contrastive Content Re-sourcing (CCR) and Contrastive Content Swapping (CCS) constraints to address this limitation.
CCR and CCS constraints supervise the training of attention models in a contrastive learning manner without requiring explicit attention annotations.
Experiments on both Flickr30k and MS-COCO datasets demonstrate that integrating these attention constraints into two state-of-the-art attention-based models improves the model performance.
arXiv Detail & Related papers (2021-05-20T08:48:10Z)
- Revisiting The Evaluation of Class Activation Mapping for Explainability: A Novel Metric and Experimental Analysis [54.94682858474711]
Class Activation Mapping (CAM) approaches provide an effective visualization by taking weighted averages of the activation maps.
We propose a novel set of metrics to quantify explanation maps, which show better effectiveness and simplify comparisons between approaches.
arXiv Detail & Related papers (2021-04-20T21:34:24Z)
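For reference, the weighted-average step that CAM-style methods share, in a few lines (the class weights `w` are assumed to come from the target-class classifier row; upsampling to image resolution is omitted):

```python
import numpy as np

def class_activation_map(feature_maps, w):
    """CAM as a weighted average of activation maps.

    feature_maps : (K, H, W) activation maps of the last conv layer
    w            : (K,)      classifier weights for the target class
    """
    cam = np.tensordot(w, feature_maps, axes=1)   # sum_k w_k * A_k -> (H, W)
    cam = np.maximum(cam, 0)                      # keep positive evidence only
    return cam / (cam.max() + 1e-8)               # normalize to [0, 1]
```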
- SparseBERT: Rethinking the Importance Analysis in Self-attention [107.68072039537311]
Transformer-based models are popular for natural language processing (NLP) tasks due to their powerful capacity.
Attention-map visualization of a pre-trained model is one direct method for understanding the self-attention mechanism.
We propose a Differentiable Attention Mask (DAM) algorithm, which can also be applied to guide the design of SparseBERT.
arXiv Detail & Related papers (2021-02-25T14:13:44Z)
- Quantifying Attention Flow in Transformers [12.197250533100283]
"self-attention" combines information from attended embeddings into the representation of the focal embedding in the next layer.
This makes attention weights unreliable as explanations probes.
We propose two methods for approximating the attention to input tokens given attention weights, attention rollout and attention flow.
arXiv Detail & Related papers (2020-05-02T21:45:27Z)
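A minimal sketch of attention rollout as it is commonly implemented: average attention over heads, mix in the residual connection, re-normalize, and compose across layers.

```python
import numpy as np

def attention_rollout(attentions):
    """Approximate token-to-input attention by rolling attention out across layers.

    attentions : list of (h, n, n) per-layer attention matrices, input -> output order
    """
    n = attentions[0].shape[-1]
    rollout = np.eye(n)
    for A in attentions:
        A = A.mean(axis=0)                        # average over heads -> (n, n)
        A = 0.5 * A + 0.5 * np.eye(n)             # account for residual connections
        A = A / A.sum(axis=-1, keepdims=True)     # re-normalize rows
        rollout = A @ rollout                     # compose with earlier layers
    return rollout
```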