Improving Attention Mechanism with Query-Value Interaction
- URL: http://arxiv.org/abs/2010.03766v1
- Date: Thu, 8 Oct 2020 05:12:52 GMT
- Title: Improving Attention Mechanism with Query-Value Interaction
- Authors: Chuhan Wu, Fangzhao Wu, Tao Qi, Yongfeng Huang
- Abstract summary: We propose a query-value interaction function which can learn query-aware attention values.
Our approach can consistently improve the performance of many attention-based models.
- Score: 92.67156911466397
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The attention mechanism plays a critical role in various
state-of-the-art NLP models such as Transformer and BERT. It can be
formulated as a ternary function that maps input queries, keys, and values
to an output: a summation of the values, weighted by attention weights
derived from the interactions between queries and keys. Similar to
query-key interactions, there is also inherent relatedness between queries
and values, and incorporating query-value interactions has the potential to
enhance the output by learning values customized to the characteristics of
the queries. However, existing attention methods ignore query-value
interactions, which may not be optimal. In this paper, we propose to
improve the existing attention mechanism by incorporating query-value
interactions. We propose a query-value interaction function that learns
query-aware values and combines them with the original values and attention
weights to form the final output. Extensive experiments on four datasets
for different tasks show that our approach consistently improves the
performance of many attention-based models.
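To make the formulation concrete, here is a minimal sketch of scaled dot-product attention with a query-value interaction added on top. The interaction function below (conditioning values on a pooled query context and mixing them with the original values) is an illustrative assumption, since the abstract does not specify the exact form; `W_qv` is a hypothetical learned projection.

```python
import torch
import torch.nn.functional as F

def qv_attention(Q, K, V, W_qv):
    """Scaled dot-product attention plus a query-value interaction.

    Q, K, V: (batch, n, d) queries, keys, and values.
    W_qv:    (2 * d, d) learned projection for the interaction (assumed).
    """
    d = Q.size(-1)

    # Standard attention: weights from query-key interactions.
    weights = F.softmax(Q @ K.transpose(-2, -1) / d ** 0.5, dim=-1)

    # Hypothetical query-value interaction: make each value "query-aware"
    # by conditioning it on a pooled query representation (kept linear in
    # sequence length for simplicity; the paper's function may differ).
    q_ctx = Q.mean(dim=1, keepdim=True).expand_as(V)
    v_aware = torch.tanh(torch.cat([V, q_ctx], dim=-1) @ W_qv)

    # Combine the query-aware values with the original values and the
    # attention weights to form the final output.
    return weights @ (V + v_aware)

# Example: batch of 2 sequences, length 5, dimension 16.
Q = K = V = torch.randn(2, 5, 16)
out = qv_attention(Q, K, V, W_qv=torch.randn(32, 16))  # shape (2, 5, 16)
```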
Related papers
- Interactive Multi-Head Self-Attention with Linear Complexity [60.112941134420204]
We show that cross-head interactions in the attention matrix enhance the information flow of the attention operation.
We propose an effective method to decompose the attention operation into query- and key-less components.
arXiv Detail & Related papers (2024-02-27T13:47:23Z)
- JPAVE: A Generation and Classification-based Model for Joint Product Attribute Prediction and Value Extraction [59.94977231327573]
We propose a multi-task learning model with value generation/classification and attribute prediction called JPAVE.
Two variants of our model are designed for open-world and closed-world scenarios.
Experimental results on a public dataset demonstrate the superiority of our model compared with strong baselines.
arXiv Detail & Related papers (2023-11-07T18:36:16Z)
- A Question-Answering Approach to Key Value Pair Extraction from Form-like Document Images [8.73248722579337]
We present a new question-answering (QA) based key-value pair extraction approach, called KVPFormer.
We propose a coarse-to-fine answer prediction approach to achieve higher answer prediction accuracy.
Our proposed KVPFormer achieves state-of-the-art results on the FUNSD and XFUND datasets, outperforming the previous best-performing method by 7.2% and 13.2% in F1 score, respectively.
arXiv Detail & Related papers (2023-04-17T02:55:31Z)
- Query-Utterance Attention with Joint modeling for Query-Focused Meeting Summarization [4.763356598070365]
We propose a query-aware framework that jointly models tokens and utterances based on Query-Utterance Attention.
We show that the query relevance of different granularities contributes to generating a summary more related to the query.
arXiv Detail & Related papers (2023-03-08T10:21:45Z)
- Compositional Attention: Disentangling Search and Retrieval [66.7108739597771]
Multi-head, key-value attention is the backbone of the Transformer model and its variants.
Standard attention heads learn a rigid mapping between search and retrieval.
We propose a novel attention mechanism, called Compositional Attention, that replaces the standard head structure.
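One way to read this disentangling in code: keep a single "search" (query-key attention) but let each position softly select among several candidate "retrievals" (value projections), instead of hard-wiring one value projection per head. The sketch below illustrates the idea under that assumption; it is not the paper's exact equations, and `sel_q` is a hypothetical selection query.

```python
import torch
import torch.nn.functional as F

def compositional_attention(Q, K, values, sel_q):
    """Sketch: separate search (where to attend) from retrieval (what to
    read out). `values` holds R candidate value projections; a soft
    selection replaces the fixed head-to-value pairing.
    """
    d = Q.size(-1)
    # Search: ordinary query-key attention weights.
    attn = F.softmax(Q @ K.transpose(-2, -1) / d ** 0.5, dim=-1)
    # Apply the same search to every candidate retrieval: (b, n, R, d).
    retrieved = torch.stack([attn @ V for V in values], dim=2)
    # Retrieval selection: attend over the R candidates at each position.
    scores = (retrieved * sel_q.unsqueeze(2)).sum(-1) / d ** 0.5  # (b, n, R)
    sel = F.softmax(scores, dim=-1).unsqueeze(-1)
    return (sel * retrieved).sum(dim=2)  # (b, n, d)
```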
arXiv Detail & Related papers (2021-10-18T15:47:38Z)
- Relation-aware Heterogeneous Graph for User Profiling [24.076585294260816]
We propose to leverage the relation-aware heterogeneous graph method for user profiling.
We adopt the query, key, and value mechanism in a transformer fashion for heterogeneous message passing.
We conduct experiments on two real-world e-commerce datasets and observe a significant performance boost of our approach.
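As a rough illustration of transformer-style query/key/value message passing on a heterogeneous graph: queries come from the destination node type, keys and values from the source node type, and attention is restricted to graph neighbors. The relation-specific projections here are assumptions; the summary above does not give the paper's exact parameterization.

```python
import torch
import torch.nn.functional as F

def relation_message_passing(h_dst, h_src, adj, W_q, W_k, W_v):
    """Attention-based message passing for one relation type (assumed form).

    h_dst: (n_dst, d) destination-node features (e.g. users).
    h_src: (n_src, d) source-node features (e.g. items).
    adj:   (n_dst, n_src) 0/1 adjacency; each destination is assumed to
           have at least one neighbor, so no softmax row is all -inf.
    """
    d = h_dst.size(-1)
    Q, K, V = h_dst @ W_q, h_src @ W_k, h_src @ W_v
    scores = (Q @ K.T) / d ** 0.5
    scores = scores.masked_fill(adj == 0, float("-inf"))  # neighbors only
    return F.softmax(scores, dim=-1) @ V  # aggregated messages per node
```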
arXiv Detail & Related papers (2021-10-14T06:59:30Z)
- Neural Graph Matching based Collaborative Filtering [13.086302251856756]
We identify two different types of attribute interactions, inner and cross interactions.
Existing models do not distinguish these two types of attribute interactions.
We propose a neural Graph Matching based Collaborative Filtering model (GMCF).
Our model outperforms state-of-the-art models.
arXiv Detail & Related papers (2021-05-10T01:51:46Z)
- DCR-Net: A Deep Co-Interactive Relation Network for Joint Dialog Act Recognition and Sentiment Classification [77.59549450705384]
In dialog systems, dialog act recognition and sentiment classification are two correlated tasks.
Most existing systems either treat them as separate tasks or jointly model them without explicitly capturing their interaction.
We propose a Deep Co-Interactive Relation Network (DCR-Net) to explicitly consider the cross-impact and model the interaction between the two tasks.
arXiv Detail & Related papers (2020-08-16T14:13:32Z)
- Self-Attention Attribution: Interpreting Information Interactions Inside Transformer [89.21584915290319]
We propose a self-attention attribution method to interpret the information interactions inside Transformer.
We show that the attribution results can be used as adversarial patterns to implement non-targeted attacks towards BERT.
arXiv Detail & Related papers (2020-04-23T14:58:22Z)