Large Dual Encoders Are Generalizable Retrievers
- URL: http://arxiv.org/abs/2112.07899v1
- Date: Wed, 15 Dec 2021 05:33:27 GMT
- Title: Large Dual Encoders Are Generalizable Retrievers
- Authors: Jianmo Ni, Chen Qu, Jing Lu, Zhuyun Dai, Gustavo Hernández Ábrego,
Ji Ma, Vincent Y. Zhao, Yi Luan, Keith B. Hall, Ming-Wei Chang, Yinfei Yang
- Abstract summary: We show that scaling up the model size brings significant improvement on a variety of retrieval tasks.
Our dual encoders, Generalizable T5-based dense Retrievers (GTR), outperform existing sparse and dense retrievers.
- Score: 26.42937314291077
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It has been shown that dual encoders trained on one domain often fail to
generalize to other domains for retrieval tasks. One widespread belief is that
the bottleneck layer of a dual encoder, where the final score is simply a
dot-product between a query vector and a passage vector, is too limited to make
dual encoders an effective retrieval model for out-of-domain generalization. In
this paper, we challenge this belief by scaling up the size of the dual encoder
model while keeping the bottleneck embedding size fixed. With multi-stage
training, surprisingly, scaling up the model size brings significant
improvement on a variety of retrieval tasks, especially for out-of-domain
generalization. Experimental results show that our dual encoders,
Generalizable T5-based dense Retrievers (GTR), outperform existing sparse and dense
retrievers on the BEIR dataset [Thakur et al., 2021] significantly. Most
surprisingly, our ablation study finds that GTR is very data efficient, as it
only needs 10% of MS Marco supervised data to achieve the best out-of-domain
performance. All the GTR models are released at
https://tfhub.dev/google/collections/gtr/1.
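The dot-product bottleneck the abstract refers to is easy to make concrete. The snippet below is a minimal Python sketch of dual-encoder scoring, not the released GTR code: `encode_query` and `encode_passage` are placeholder towers (GTR actually shares a T5 encoder between them), and the only query-passage interaction is a single dot product over a fixed-size embedding, no matter how large the towers are scaled.

```python
import numpy as np

EMBED_DIM = 768  # fixed bottleneck size; scaling the towers does not change this

def encode_query(query: str) -> np.ndarray:
    """Placeholder for the query tower (in GTR, a T5 encoder with mean pooling)."""
    rng = np.random.default_rng(abs(hash(query)) % (2**32))
    v = rng.standard_normal(EMBED_DIM)
    return v / np.linalg.norm(v)  # unit-normalized query embedding

def encode_passage(passage: str) -> np.ndarray:
    """Placeholder for the passage tower (shares weights with the query tower in GTR)."""
    rng = np.random.default_rng(abs(hash(passage)) % (2**32))
    v = rng.standard_normal(EMBED_DIM)
    return v / np.linalg.norm(v)

def score(query: str, passages: list[str]) -> np.ndarray:
    """The entire query-passage interaction is one dot product per passage."""
    q = encode_query(query)
    P = np.stack([encode_passage(p) for p in passages])  # (num_passages, EMBED_DIM)
    return P @ q                                          # (num_passages,) similarity scores

ranking = np.argsort(-score("what is a dual encoder?",
                            ["A dual encoder embeds queries and passages separately.",
                             "An unrelated passage about cooking."]))
```

Because the passage embeddings do not depend on the query, they can be pre-computed and indexed offline; the paper's finding is that this fixed-size bottleneck stops limiting out-of-domain retrieval quality once the encoder towers themselves are scaled up.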
Related papers
- You Only Cache Once: Decoder-Decoder Architectures for Language Models [132.4064488592704]
We introduce a decoder-decoder architecture, YOCO, for large language models.
YOCO only caches key-value pairs once.
The overall model behaves like a decoder-only Transformer, although YOCO only caches once.
arXiv Detail & Related papers (2024-05-08T17:57:39Z)
- Improving Dual-Encoder Training through Dynamic Indexes for Negative Mining [61.09807522366773]
We introduce an algorithm that approximates the softmax with provable bounds and dynamically maintains a tree-based index.
In a study on datasets with over twenty million targets, our approach halves the error relative to oracle brute-force negative mining.
arXiv Detail & Related papers (2023-03-27T15:18:32Z)
- In Defense of Cross-Encoders for Zero-Shot Retrieval [4.712097135437801]
Bi-encoders and cross-encoders are widely used in many state-of-the-art retrieval pipelines.
We find that the number of parameters and early query-document interactions of cross-encoders play a significant role in the generalization ability of retrieval models.
arXiv Detail & Related papers (2022-12-12T18:50:03Z)
- Salient Object Detection via Dynamic Scale Routing [62.26677215668959]
This paper introduces "dynamic" scale routing as a brand-new idea.
It results in a generic plug-in that can directly fit existing feature backbones.
We provide a self-adaptive bidirectional decoder design to best accommodate the DPConv-based encoder.
arXiv Detail & Related papers (2022-10-25T08:01:27Z)
- Efficient Nearest Neighbor Search for Cross-Encoder Models using Matrix Factorization [60.91600465922932]
We present an approach that avoids the use of a dual-encoder for retrieval, relying solely on the cross-encoder.
Our approach provides test-time recall-vs-computational cost trade-offs superior to the current widely-used methods.
arXiv Detail & Related papers (2022-10-23T00:32:04Z)
- Task-Aware Specialization for Efficient and Robust Dense Retrieval for Open-Domain Question Answering [85.08146789409354]
We propose a new architecture, Task-Aware Specialization for dense Retrieval (TASER).
TASER enables parameter sharing by interleaving shared and specialized blocks in a single encoder.
Our experiments show that TASER can achieve superior accuracy, surpassing BM25, while using only about 60% of the parameters of bi-encoder dense retrievers.
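For illustration, here is a rough PyTorch sketch of the shared/specialized interleaving described above, under the assumption that a specialized block is picked by an explicit integer route (e.g. one specialist for queries, one for passages); the block internals and the routing rule are placeholders, not the TASER implementation.

```python
import torch
import torch.nn as nn

class InterleavedEncoder(nn.Module):
    """Toy encoder alternating shared blocks with routed, specialized blocks."""
    def __init__(self, dim=256, depth=4, num_specialists=2, nhead=4):
        super().__init__()
        make_block = lambda: nn.TransformerEncoderLayer(dim, nhead, batch_first=True)
        self.shared = nn.ModuleList(make_block() for _ in range(depth))
        # One bank of specialist blocks per interleaved position.
        self.specialized = nn.ModuleList(
            nn.ModuleList(make_block() for _ in range(num_specialists))
            for _ in range(depth))

    def forward(self, x, route: int):
        # `route` chooses which specialist handles this input at every position.
        for shared_block, bank in zip(self.shared, self.specialized):
            x = shared_block(x)   # parameters shared by all routes
            x = bank[route](x)    # parameters specific to the chosen route
        return x

enc = InterleavedEncoder()
tokens = torch.randn(2, 16, 256)      # (batch, seq_len, dim) dummy token embeddings
query_repr = enc(tokens, route=0)     # e.g. query-specialized path
passage_repr = enc(tokens, route=1)   # e.g. passage-specialized path
```

Sharing the alternating blocks across routes is what keeps the parameter count well below that of two fully separate encoders.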
arXiv Detail & Related papers (2022-10-11T05:33:25Z)
- Revisiting Code Search in a Two-Stage Paradigm [67.02322603435628]
TOSS is a two-stage fusion code search framework.
It first uses IR-based and bi-encoder models to efficiently recall a small number of top-k code candidates.
It then uses fine-grained cross-encoders for finer ranking.
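The recall-then-rerank flow can be sketched generically; the function below is an illustrative pipeline, not the TOSS code, and `recallers` / `rerank_score` are hypothetical stand-ins for the first-stage models (IR, bi-encoder) and the cross-encoder.

```python
from typing import Callable, Sequence

def two_stage_search(query: str,
                     corpus: Sequence[str],
                     recallers: Sequence[Callable[[str, Sequence[str], int], list[int]]],
                     rerank_score: Callable[[str, str], float],
                     k: int = 100,
                     top_n: int = 10) -> list[int]:
    """Stage 1: cheap recallers each return top-k candidate indices.
    Stage 2: an expensive cross-encoder re-scores only the fused candidate pool."""
    candidates = sorted({i for recall in recallers for i in recall(query, corpus, k)})
    reranked = sorted(candidates, key=lambda i: rerank_score(query, corpus[i]), reverse=True)
    return reranked[:top_n]

# Dummy stand-ins so the sketch runs end to end.
corpus = ["def add(a, b): return a + b", "def read_file(path): ...", "class Stack: ..."]
keyword_recall = lambda q, docs, k: [i for i, d in enumerate(docs)
                                     if any(w in d for w in q.split())][:k]
overlap_score = lambda q, d: float(len(set(q.split()) & set(d.split())))
print(two_stage_search("add two numbers", corpus, [keyword_recall], overlap_score, top_n=2))
```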
arXiv Detail & Related papers (2022-08-24T02:34:27Z)
- Exploring and Exploiting Multi-Granularity Representations for Machine Reading Comprehension [13.191437539419681]
We propose a novel approach called Adaptive Bidirectional Attention-Capsule Network (ABA-Net).
ABA-Net adaptively feeds source representations from different levels of granularity to the predictor.
We set new state-of-the-art performance on the SQuAD 1.0 dataset.
arXiv Detail & Related papers (2022-08-18T10:14:32Z)
- On Pursuit of Designing Multi-modal Transformer for Video Grounding [35.25323276744999]
Video grounding aims to localize the temporal segment corresponding to a sentence query from an untrimmed video.
We propose a novel end-to-end multi-modal Transformer model, dubbed GTR. Specifically, GTR has two encoders for video and language encoding, and a cross-modal decoder for grounding prediction.
All three typical GTR variants achieve record-breaking performance on all datasets and metrics, with several times faster inference speed.
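As a rough illustration only (not the paper's implementation), the PyTorch sketch below follows the described layout: separate video and language encoders, a cross-modal decoder in which query slots attend to the fused memory, and a head that predicts one temporal span per slot. The concatenation-based fusion and the span head are assumptions made for the sketch.

```python
import torch
import torch.nn as nn

class CrossModalGrounder(nn.Module):
    """Toy two-encoder / cross-modal-decoder layout for temporal grounding."""
    def __init__(self, dim=256, nhead=4, depth=2):
        super().__init__()
        enc_layer = lambda: nn.TransformerEncoderLayer(dim, nhead, batch_first=True)
        self.video_encoder = nn.TransformerEncoder(enc_layer(), num_layers=depth)
        self.text_encoder = nn.TransformerEncoder(enc_layer(), num_layers=depth)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(dim, nhead, batch_first=True), num_layers=depth)
        self.span_head = nn.Linear(dim, 2)  # normalized (center, width) per query slot

    def forward(self, video_feats, text_feats, query_slots):
        v = self.video_encoder(video_feats)          # (B, T_video, dim)
        t = self.text_encoder(text_feats)            # (B, T_text, dim)
        memory = torch.cat([v, t], dim=1)            # fused cross-modal memory (an assumption)
        h = self.decoder(query_slots, memory)        # slots attend to both modalities
        return self.span_head(h).sigmoid()           # (B, num_slots, 2) temporal spans

model = CrossModalGrounder()
video = torch.randn(1, 64, 256)    # 64 clip-level video features
text = torch.randn(1, 12, 256)     # 12 token embeddings of the sentence query
slots = torch.randn(1, 4, 256)     # stand-in for learned grounding queries
spans = model(video, text, slots)  # predicted segments for the sentence query
```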
arXiv Detail & Related papers (2021-09-13T16:01:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.