Compositional Attention: Disentangling Search and Retrieval
- URL: http://arxiv.org/abs/2110.09419v1
- Date: Mon, 18 Oct 2021 15:47:38 GMT
- Title: Compositional Attention: Disentangling Search and Retrieval
- Authors: Sarthak Mittal, Sharath Chandra Raparthy, Irina Rish, Yoshua Bengio and Guillaume Lajoie
- Abstract summary: Multi-head, key-value attention is the backbone of the Transformer model and its variants.
Standard attention heads learn a rigid mapping between search and retrieval.
We propose a novel attention mechanism, called Compositional Attention, that replaces the standard head structure.
- Score: 66.7108739597771
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-head, key-value attention is the backbone of the widely successful
Transformer model and its variants. This attention mechanism uses multiple
parallel key-value attention blocks (called heads), each performing two
fundamental computations: (1) search - selection of a relevant entity from a
set via query-key interactions, and (2) retrieval - extraction of relevant
features from the selected entity via a value matrix. Importantly, standard
attention heads learn a rigid mapping between search and retrieval. In this
work, we first highlight how this static nature of the pairing can potentially:
(a) lead to learning of redundant parameters in certain tasks, and (b) hinder
generalization. To alleviate this problem, we propose a novel attention
mechanism, called Compositional Attention, that replaces the standard head
structure. The proposed mechanism disentangles search and retrieval and
composes them in a dynamic, flexible and context-dependent manner through an
additional soft competition stage between the query-key combination and value
pairing. Through a series of numerical experiments, we show that it outperforms
standard multi-head attention on a variety of tasks, including some
out-of-distribution settings. Through our qualitative analysis, we demonstrate
that Compositional Attention leads to dynamic specialization based on the type
of retrieval needed. Our proposed mechanism generalizes multi-head attention,
allows independent scaling of search and retrieval, and can easily be
implemented in lieu of standard attention heads in any network architecture.
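To make the search/retrieval separation concrete, here is a minimal PyTorch sketch of a compositional-attention-style layer, reconstructed only from the description above: search heads produce query-key attention patterns, retrievals are value projections, and a soft competition decides, per search head and position, which retrieval to use. The competition is implemented here as a learned retrieval-query/retrieval-key dot product; the class name, dimensions, and normalization details are illustrative assumptions rather than the paper's exact formulation.

```python
# Minimal sketch of a compositional-attention-style layer, reconstructed from the
# abstract only: S search heads (query-key), R retrievals (value projections), and
# a soft competition that pairs them dynamically. The retrieval-query/retrieval-key
# form of the competition, the dimensions, and the class name are assumptions.
import math
import torch
import torch.nn as nn


class CompositionalAttentionSketch(nn.Module):
    def __init__(self, d_model, n_search=4, n_retrieve=4, d_head=32):
        super().__init__()
        self.S, self.R, self.dh = n_search, n_retrieve, d_head
        self.q = nn.Linear(d_model, n_search * d_head)     # search queries
        self.k = nn.Linear(d_model, n_search * d_head)     # search keys
        self.v = nn.Linear(d_model, n_retrieve * d_head)   # retrieval values
        self.rq = nn.Linear(d_model, n_search * d_head)    # retrieval queries (assumed form)
        self.rk = nn.Linear(d_head, d_head)                # retrieval keys (assumed form)
        self.out = nn.Linear(n_search * d_head, d_model)

    def forward(self, x):                                   # x: (B, T, d_model)
        B, T, _ = x.shape
        q = self.q(x).view(B, T, self.S, self.dh).transpose(1, 2)  # (B, S, T, dh)
        k = self.k(x).view(B, T, self.S, self.dh).transpose(1, 2)  # (B, S, T, dh)
        v = self.v(x).view(B, T, self.R, self.dh).transpose(1, 2)  # (B, R, T, dh)

        # (1) search: one query-key attention pattern per search head
        attn = torch.softmax(q @ k.transpose(-1, -2) / math.sqrt(self.dh), dim=-1)  # (B, S, T, T)

        # (2) retrieval: apply every retrieval's values to every search pattern
        retrieved = torch.einsum("bstu,brud->bsrtd", attn, v)  # (B, S, R, T, dh)

        # (3) soft competition: each search head softly selects among the R retrievals
        rq = self.rq(x).view(B, T, self.S, self.dh).permute(0, 2, 1, 3)       # (B, S, T, dh)
        scores = torch.einsum("bstd,bsrtd->bsrt", rq, self.rk(retrieved))     # (B, S, R, T)
        w = torch.softmax(scores / math.sqrt(self.dh), dim=2).unsqueeze(-1)   # weights over R
        out = (w * retrieved).sum(dim=2)                                      # (B, S, T, dh)

        return self.out(out.transpose(1, 2).reshape(B, T, self.S * self.dh))
```

Fixing a one-to-one pairing between searches and retrievals (with n_retrieve equal to n_search) recovers standard multi-head attention, consistent with the abstract's claims that the mechanism generalizes multi-head attention and allows search and retrieval to be scaled independently.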
Related papers
- GSSF: Generalized Structural Sparse Function for Deep Cross-modal Metric Learning [51.677086019209554]
We propose a Generalized Structural Sparse function to capture powerful relationships across modalities for pair-wise similarity learning.
The distance metric delicately encapsulates two formats of diagonal and block-diagonal terms.
Experiments on cross-modal and two extra uni-modal retrieval tasks have validated its superiority and flexibility.
arXiv Detail & Related papers (2024-10-20T03:45:50Z)
- Aspect-Oriented Summarization through Query-Focused Extraction [23.62412515574206]
Real users' needs often align more closely with aspects, broad topics in a dataset that the user is interested in, rather than with specific queries.
We benchmark extractive query-focused training schemes, and propose a contrastive augmentation approach to train the model.
We evaluate on two aspect-oriented datasets and find that this approach yields focused summaries that are better than those from a generic summarization system.
arXiv Detail & Related papers (2021-10-15T18:06:21Z)
- Exposing Query Identification for Search Transparency [69.06545074617685]
We explore the feasibility of approximate exposing query identification (EQI) as a retrieval task by reversing the role of queries and documents in two classes of search systems.
We derive an evaluation metric to measure the quality of a ranking of exposing queries and conduct an empirical analysis focusing on various practical aspects of approximate EQI.
arXiv Detail & Related papers (2021-10-14T20:19:27Z)
- Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks [34.32609892928909]
We propose a novel attention mechanism which we call external attention, based on two external, small, learnable, and shared memories.
Our method provides comparable or superior performance to the self-attention mechanism and some of its variants, with much lower computational and memory costs (a minimal sketch of the idea follows this list).
arXiv Detail & Related papers (2021-05-05T22:29:52Z)
- Improving Attention Mechanism with Query-Value Interaction [92.67156911466397]
We propose a query-value interaction function which can learn query-aware attention values.
Our approach can consistently improve the performance of many attention-based models.
arXiv Detail & Related papers (2020-10-08T05:12:52Z)
- Learning Hard Retrieval Decoder Attention for Transformers [69.40942736249397]
The Transformer translation model is based on the multi-head attention mechanism, which can be parallelized easily.
We show that our hard retrieval attention mechanism is 1.43 times faster in decoding.
arXiv Detail & Related papers (2020-09-30T13:18:57Z)
- Tasks Integrated Networks: Joint Detection and Retrieval for Image Search [99.49021025124405]
In many real-world searching scenarios (e.g., video surveillance), the objects are seldom accurately detected or annotated.
We first introduce an end-to-end Integrated Net (I-Net), which has three merits.
We further propose an improved I-Net, called DC-I-Net, which makes two new contributions.
arXiv Detail & Related papers (2020-09-03T03:57:50Z)
- Self-Segregating and Coordinated-Segregating Transformer for Focused Deep Multi-Modular Network for Visual Question Answering [9.89901717499058]
We define segregating strategies that can prioritize contents for an application in order to enhance performance.
We define two strategies: the Self-Segregating Transformer (SST) and the Coordinated-Segregating Transformer (CST).
This work can easily be used in many other applications that involve repetition and multiple frames of features.
arXiv Detail & Related papers (2020-06-25T09:17:03Z)
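For contrast with the query-key/value attention discussed above, here is a minimal sketch of the external-attention idea from "Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks": the attention map is computed against two small, learnable, input-independent memories (implemented as linear layers) instead of the input's own keys and values, making the cost linear in sequence length. The memory size and the double-normalization step are assumptions on my part, not details given in the summary above.

```python
# Minimal sketch of external attention: attention against two small, learnable,
# input-independent memory units (linear layers) rather than the input's own
# keys and values. Memory size and the double normalization are assumptions.
import torch
import torch.nn as nn


class ExternalAttentionSketch(nn.Module):
    def __init__(self, d_model, n_memory=64):
        super().__init__()
        self.mk = nn.Linear(d_model, n_memory, bias=False)  # external "key" memory
        self.mv = nn.Linear(n_memory, d_model, bias=False)  # external "value" memory

    def forward(self, x):                                      # x: (B, T, d_model)
        attn = self.mk(x)                                      # (B, T, n_memory): affinity to memory slots
        attn = torch.softmax(attn, dim=1)                      # normalize over tokens
        attn = attn / (attn.sum(dim=-1, keepdim=True) + 1e-9)  # then over memory slots
        return self.mv(attn)                                   # (B, T, d_model); cost is linear in T
```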