Compositional Attention: Disentangling Search and Retrieval
- URL: http://arxiv.org/abs/2110.09419v1
- Date: Mon, 18 Oct 2021 15:47:38 GMT
- Title: Compositional Attention: Disentangling Search and Retrieval
- Authors: Sarthak Mittal, Sharath Chandra Raparthy, Irina Rish, Yoshua Bengio and Guillaume Lajoie
- Abstract summary: Multi-head, key-value attention is the backbone of the Transformer model and its variants.
Standard attention heads learn a rigid mapping between search and retrieval.
We propose a novel attention mechanism, called Compositional Attention, that replaces the standard head structure.
- Score: 66.7108739597771
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-head, key-value attention is the backbone of the widely successful
Transformer model and its variants. This attention mechanism uses multiple
parallel key-value attention blocks (called heads), each performing two
fundamental computations: (1) search - selection of a relevant entity from a
set via query-key interactions, and (2) retrieval - extraction of relevant
features from the selected entity via a value matrix. Importantly, standard
attention heads learn a rigid mapping between search and retrieval. In this
work, we first highlight how this static nature of the pairing can potentially:
(a) lead to learning of redundant parameters in certain tasks, and (b) hinder
generalization. To alleviate this problem, we propose a novel attention
mechanism, called Compositional Attention, that replaces the standard head
structure. The proposed mechanism disentangles search and retrieval and
composes them in a dynamic, flexible and context-dependent manner through an
additional soft competition stage between the query-key combination and value
pairing. Through a series of numerical experiments, we show that it outperforms
standard multi-head attention on a variety of tasks, including some
out-of-distribution settings. Through our qualitative analysis, we demonstrate
that Compositional Attention leads to dynamic specialization based on the type
of retrieval needed. Our proposed mechanism generalizes multi-head attention,
allows independent scaling of search and retrieval, and can easily be
implemented in lieu of standard attention heads in any network architecture.
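To make the search/retrieval separation concrete, here is a minimal PyTorch sketch of a compositional-attention-style layer, reconstructed only from the description above: search heads produce query-key attention patterns, retrievals are value projections, and a soft competition decides, per search head and position, which retrieval to use. The competition is implemented here as a learned retrieval-query/retrieval-key dot product; the class name, dimensions, and normalization details are illustrative assumptions rather than the paper's exact formulation.

```python
# Minimal sketch of a compositional-attention-style layer, reconstructed from the
# abstract only: S search heads (query-key), R retrievals (value projections), and
# a soft competition that pairs them dynamically. The retrieval-query/retrieval-key
# form of the competition, the dimensions, and the class name are assumptions.
import math
import torch
import torch.nn as nn


class CompositionalAttentionSketch(nn.Module):
    def __init__(self, d_model, n_search=4, n_retrieve=4, d_head=32):
        super().__init__()
        self.S, self.R, self.dh = n_search, n_retrieve, d_head
        self.q = nn.Linear(d_model, n_search * d_head)     # search queries
        self.k = nn.Linear(d_model, n_search * d_head)     # search keys
        self.v = nn.Linear(d_model, n_retrieve * d_head)   # retrieval values
        self.rq = nn.Linear(d_model, n_search * d_head)    # retrieval queries (assumed form)
        self.rk = nn.Linear(d_head, d_head)                # retrieval keys (assumed form)
        self.out = nn.Linear(n_search * d_head, d_model)

    def forward(self, x):                                   # x: (B, T, d_model)
        B, T, _ = x.shape
        q = self.q(x).view(B, T, self.S, self.dh).transpose(1, 2)  # (B, S, T, dh)
        k = self.k(x).view(B, T, self.S, self.dh).transpose(1, 2)  # (B, S, T, dh)
        v = self.v(x).view(B, T, self.R, self.dh).transpose(1, 2)  # (B, R, T, dh)

        # (1) search: one query-key attention pattern per search head
        attn = torch.softmax(q @ k.transpose(-1, -2) / math.sqrt(self.dh), dim=-1)  # (B, S, T, T)

        # (2) retrieval: apply every retrieval's values to every search pattern
        retrieved = torch.einsum("bstu,brud->bsrtd", attn, v)  # (B, S, R, T, dh)

        # (3) soft competition: each search head softly selects among the R retrievals
        rq = self.rq(x).view(B, T, self.S, self.dh).permute(0, 2, 1, 3)       # (B, S, T, dh)
        scores = torch.einsum("bstd,bsrtd->bsrt", rq, self.rk(retrieved))     # (B, S, R, T)
        w = torch.softmax(scores / math.sqrt(self.dh), dim=2).unsqueeze(-1)   # weights over R
        out = (w * retrieved).sum(dim=2)                                      # (B, S, T, dh)

        return self.out(out.transpose(1, 2).reshape(B, T, self.S * self.dh))
```

Fixing a one-to-one pairing between searches and retrievals (with n_retrieve equal to n_search) recovers standard multi-head attention, consistent with the abstract's claims that the mechanism generalizes multi-head attention and allows search and retrieval to be scaled independently.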
Related papers
- GSSF: Generalized Structural Sparse Function for Deep Cross-modal Metric Learning [51.677086019209554]
We propose a Generalized Structural Sparse function to capture powerful relationships across modalities for pair-wise similarity learning.
The distance metric delicately encapsulates two formats of diagonal and block-diagonal terms.
Experiments on cross-modal and two extra uni-modal retrieval tasks have validated its superiority and flexibility.
arXiv Detail & Related papers (2024-10-20T03:45:50Z)
- Aspect-Oriented Summarization through Query-Focused Extraction [23.62412515574206]
Real users' needs often align more closely with aspects, broad topics in a dataset that the user is interested in, rather than with specific queries.
We benchmark extractive query-focused training schemes, and propose a contrastive augmentation approach to train the model.
We evaluate on two aspect-oriented datasets and find that this approach yields focused summaries that are better than those from a generic summarization system.
arXiv Detail & Related papers (2021-10-15T18:06:21Z)
- Exposing Query Identification for Search Transparency [69.06545074617685]
We explore the feasibility of approximate exposing query identification (EQI) as a retrieval task by reversing the role of queries and documents in two classes of search systems.
We derive an evaluation metric to measure the quality of a ranking of exposing queries and conduct an empirical analysis focusing on various practical aspects of approximate EQI.
arXiv Detail & Related papers (2021-10-14T20:19:27Z)
- Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks [34.32609892928909]
We propose a novel attention mechanism which we call external attention, based on two external, small, learnable, and shared memories.
Our method provides comparable or superior performance to the self-attention mechanism and some of its variants, with much lower computational and memory costs (a minimal sketch of the idea follows this list).
arXiv Detail & Related papers (2021-05-05T22:29:52Z)
- Improving Attention Mechanism with Query-Value Interaction [92.67156911466397]
We propose a query-value interaction function which can learn query-aware attention values.
Our approach can consistently improve the performance of many attention-based models.
arXiv Detail & Related papers (2020-10-08T05:12:52Z)
- Learning Hard Retrieval Decoder Attention for Transformers [69.40942736249397]
The Transformer translation model is based on the multi-head attention mechanism, which can be parallelized easily.
We show that our hard retrieval attention mechanism is 1.43 times faster in decoding.
arXiv Detail & Related papers (2020-09-30T13:18:57Z)
- Tasks Integrated Networks: Joint Detection and Retrieval for Image Search [99.49021025124405]
In many real-world searching scenarios (e.g., video surveillance), the objects are seldom accurately detected or annotated.
We first introduce an end-to-end Integrated Net (I-Net), which has three merits.
We further propose an improved I-Net, called DC-I-Net, which makes two new contributions.
arXiv Detail & Related papers (2020-09-03T03:57:50Z)
- Self-Segregating and Coordinated-Segregating Transformer for Focused Deep Multi-Modular Network for Visual Question Answering [9.89901717499058]
We define segregating strategies that can prioritize contents for an application in order to enhance performance.
We define two strategies: the Self-Segregating Transformer (SST) and the Coordinated-Segregating Transformer (CST).
This work can easily be used in many other applications that involve repetition and multiple frames of features.
arXiv Detail & Related papers (2020-06-25T09:17:03Z)
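For contrast with the query-key/value attention discussed above, here is a minimal sketch of the external-attention idea from "Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks": the attention map is computed against two small, learnable, input-independent memories (implemented as linear layers) instead of the input's own keys and values, making the cost linear in sequence length. The memory size and the double-normalization step are assumptions on my part, not details given in the summary above.

```python
# Minimal sketch of external attention: attention against two small, learnable,
# input-independent memory units (linear layers) rather than the input's own
# keys and values. Memory size and the double normalization are assumptions.
import torch
import torch.nn as nn


class ExternalAttentionSketch(nn.Module):
    def __init__(self, d_model, n_memory=64):
        super().__init__()
        self.mk = nn.Linear(d_model, n_memory, bias=False)  # external "key" memory
        self.mv = nn.Linear(n_memory, d_model, bias=False)  # external "value" memory

    def forward(self, x):                                      # x: (B, T, d_model)
        attn = self.mk(x)                                      # (B, T, n_memory): affinity to memory slots
        attn = torch.softmax(attn, dim=1)                      # normalize over tokens
        attn = attn / (attn.sum(dim=-1, keepdim=True) + 1e-9)  # then over memory slots
        return self.mv(attn)                                   # (B, T, d_model); cost is linear in T
```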