In Defense of Cross-Encoders for Zero-Shot Retrieval
- URL: http://arxiv.org/abs/2212.06121v1
- Date: Mon, 12 Dec 2022 18:50:03 GMT
- Title: In Defense of Cross-Encoders for Zero-Shot Retrieval
- Authors: Guilherme Rosa and Luiz Bonifacio and Vitor Jeronymo and Hugo Abonizio
and Marzieh Fadaee and Roberto Lotufo and Rodrigo Nogueira
- Abstract summary: Bi-encoders and cross-encoders are widely used in many state-of-the-art retrieval pipelines.
We find that the number of parameters and early query-document interactions of cross-encoders play a significant role in the generalization ability of retrieval models.
- Score: 4.712097135437801
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bi-encoders and cross-encoders are widely used in many state-of-the-art
retrieval pipelines. In this work we study the generalization ability of these
two types of architectures on a wide range of parameter count on both in-domain
and out-of-domain scenarios. We find that the number of parameters and early
query-document interactions of cross-encoders play a significant role in the
generalization ability of retrieval models. Our experiments show that
increasing model size results in marginal gains on in-domain test sets, but
much larger gains in new domains never seen during fine-tuning. Furthermore, we
show that cross-encoders largely outperform bi-encoders of similar size in
several tasks. In the BEIR benchmark, our largest cross-encoder surpasses a
state-of-the-art bi-encoder by more than 4 average points. Finally, we show
that using bi-encoders as first-stage retrievers provides no gains in
comparison to a simpler retriever such as BM25 on out-of-domain tasks. The code
is available at
https://github.com/guilhermemr04/scaling-zero-shot-retrieval.git
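As a concrete illustration of the pipeline the abstract argues for, the sketch below uses BM25 as the first-stage retriever and a cross-encoder as the re-ranker; the corpus, query, and checkpoint name are placeholders, not the paper's exact setup.
```python
# Sketch of the two-stage setup discussed above: BM25 as the first-stage retriever,
# a cross-encoder as the re-ranker. Corpus, query, and checkpoint are placeholders.
from rank_bm25 import BM25Okapi
from sentence_transformers import CrossEncoder

corpus = [
    "BM25 is a classic lexical ranking function.",
    "Cross-encoders attend jointly over the query and the document.",
    "Bi-encoders embed queries and documents independently.",
]
query = "how do cross-encoders score documents?"

# First stage: cheap lexical retrieval.
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])
bm25_scores = bm25.get_scores(query.lower().split())
candidates = sorted(range(len(corpus)), key=lambda i: -bm25_scores[i])[:100]

# Second stage: the cross-encoder re-scores each (query, document) pair jointly,
# which is the early query-document interaction the paper credits for generalization.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # assumed checkpoint
pair_scores = reranker.predict([(query, corpus[i]) for i in candidates])
ranking = [i for _, i in sorted(zip(pair_scores, candidates), reverse=True)]
print([corpus[i] for i in ranking])
```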
Related papers
- You Only Cache Once: Decoder-Decoder Architectures for Language Models [132.4064488592704]
We introduce a decoder-decoder architecture, YOCO, for large language models.
YOCO only caches key-value pairs once.
The overall model behaves like a decoder-only Transformer, although YOCO only caches once.
arXiv Detail & Related papers (2024-05-08T17:57:39Z)
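A toy PyTorch sketch of the decoder-decoder idea summarized above: a self-decoder block builds a single global key-value cache, and the remaining layers only cross-attend to it. Layer counts and sizes are arbitrary, and causal masking of the cross-attention is omitted for brevity.
```python
# Toy sketch of a YOCO-style decoder-decoder: a self-decoder block produces one global
# key-value cache, and the cross-decoder layers only attend to that cache.
# Sizes are arbitrary; causal masking of the cross-attention is omitted for brevity.
import torch
import torch.nn as nn

class CrossDecoderLayer(nn.Module):
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                nn.Linear(4 * d_model, d_model))
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)

    def forward(self, x, kv_cache):
        attn_out, _ = self.cross_attn(self.norm1(x), kv_cache, kv_cache)
        x = x + attn_out
        return x + self.ff(self.norm2(x))

class ToyYOCO(nn.Module):
    def __init__(self, vocab=1000, d_model=64, n_heads=4, n_cross_layers=3):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        # Self-decoder: a causal block whose output is cached once and reused below.
        self.self_decoder = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True)
        self.cross_decoder = nn.ModuleList(
            CrossDecoderLayer(d_model, n_heads) for _ in range(n_cross_layers))
        self.lm_head = nn.Linear(d_model, vocab)

    def forward(self, token_ids):
        seq_len = token_ids.size(1)
        causal = torch.full((seq_len, seq_len), float("-inf")).triu(diagonal=1)
        x = self.embed(token_ids)
        kv_cache = self.self_decoder(x, src_mask=causal)   # the single shared KV cache
        for layer in self.cross_decoder:
            x = layer(x, kv_cache)
        return self.lm_head(x)

logits = ToyYOCO()(torch.randint(0, 1000, (2, 16)))
print(logits.shape)  # torch.Size([2, 16, 1000])
```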
- Task-Aware Specialization for Efficient and Robust Dense Retrieval for Open-Domain Question Answering [85.08146789409354]
We propose a new architecture, Task-Aware Specialization for dense Retrieval (TASER).
TASER enables parameter sharing by interleaving shared and specialized blocks in a single encoder.
Our experiments show that TASER can achieve superior accuracy, surpassing BM25, while using about 60% of the parameters of bi-encoder dense retrievers.
arXiv Detail & Related papers (2022-10-11T05:33:25Z)
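A toy sketch of the interleaving described above: shared transformer blocks alternate with task-specialized blocks selected by a task id. The routing scheme and sizes are illustrative assumptions, not the paper's exact design.
```python
# Toy sketch of interleaving shared and task-specialized blocks in one encoder,
# in the spirit of the TASER summary above. Sizes and routing by task id are
# illustrative assumptions.
import torch
import torch.nn as nn

class ToyTaskAwareEncoder(nn.Module):
    def __init__(self, d_model=64, n_heads=4, n_pairs=2, n_tasks=3):
        super().__init__()
        block = lambda: nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True)
        # Shared blocks are reused for every task; specialized blocks are per-task.
        self.shared = nn.ModuleList(block() for _ in range(n_pairs))
        self.specialized = nn.ModuleList(
            nn.ModuleList(block() for _ in range(n_tasks)) for _ in range(n_pairs))

    def forward(self, x, task_id):
        # Interleave: a shared block, then the specialized block routed by task_id.
        for shared, experts in zip(self.shared, self.specialized):
            x = shared(x)
            x = experts[task_id](x)
        return x

enc = ToyTaskAwareEncoder()
tokens = torch.randn(2, 16, 64)          # (batch, seq_len, d_model)
print(enc(tokens, task_id=1).shape)      # torch.Size([2, 16, 64])
```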
- ED2LM: Encoder-Decoder to Language Model for Faster Document Re-ranking Inference [70.36083572306839]
This paper proposes a new training and inference paradigm for re-ranking.
We finetune a pretrained encoder-decoder model on document-to-query generation.
We show that this encoder-decoder architecture can be decomposed into a decoder-only language model during inference.
arXiv Detail & Related papers (2022-04-25T06:26:29Z)
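The document-to-query objective above can be used for re-ranking by scoring each document with the likelihood it assigns to the query. A hedged sketch with a generic T5 checkpoint follows; the decoder-only decomposition ED2LM applies at inference time is not shown.
```python
# Sketch of document-to-query likelihood scoring with a seq2seq model, approximating
# the re-ranking objective above. "t5-small" is an illustrative checkpoint; the
# decoder-only decomposition used for fast inference in ED2LM is omitted here.
import torch
from transformers import T5ForConditionalGeneration, T5TokenizerFast

tokenizer = T5TokenizerFast.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small").eval()

def query_log_likelihood(document: str, query: str) -> float:
    """Return log p(query | document) under the seq2seq model (higher = better)."""
    enc = tokenizer(document, return_tensors="pt", truncation=True)
    labels = tokenizer(query, return_tensors="pt", truncation=True).input_ids
    with torch.no_grad():
        out = model(**enc, labels=labels)
    # out.loss is the mean token-level cross-entropy of the query given the document.
    return -out.loss.item() * labels.size(1)

docs = ["Transformers use attention to mix information across tokens.",
        "BM25 ranks documents with term frequencies and document length."]
query = "how does attention work in transformers?"
ranked = sorted(docs, key=lambda d: query_log_likelihood(d, query), reverse=True)
print(ranked[0])
```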
- LoopITR: Combining Dual and Cross Encoder Architectures for Image-Text Retrieval [117.15862403330121]
We propose LoopITR, which combines dual encoders and cross encoders in the same network for joint learning.
Specifically, we let the dual encoder provide hard negatives to the cross encoder, and use the more discriminative cross encoder to distill its predictions back to the dual encoder.
arXiv Detail & Related papers (2022-03-10T16:41:12Z)
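Read as a single training step, the loop above amounts to: mine hard negatives from the dual encoder's similarity matrix, score them with the cross encoder, and distill the cross encoder's scores back into the dual encoder. A toy sketch with placeholder encoders over precomputed features:
```python
# Toy sketch of the dual/cross-encoder loop summarized above. Both "encoders" here are
# placeholder MLPs over precomputed features; the real model encodes raw images and text.
import torch
import torch.nn as nn
import torch.nn.functional as F

dual_img = nn.Linear(128, 64)     # placeholder image tower
dual_txt = nn.Linear(128, 64)     # placeholder text tower
cross_enc = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 1))

img_feats, txt_feats = torch.randn(8, 128), torch.randn(8, 128)   # aligned pairs

# Dual encoder: in-batch similarity matrix, also used for hard-negative mining.
img_emb = F.normalize(dual_img(img_feats), dim=-1)
txt_emb = F.normalize(dual_txt(txt_feats), dim=-1)
sim = img_emb @ txt_emb.T                                          # (8, 8)

# Hard negatives: for each image, the highest-scoring non-matching text.
hard_neg = sim.masked_fill(torch.eye(8, dtype=torch.bool), -1e4).argmax(dim=1)

# Cross encoder scores positives and mined hard negatives jointly.
pos_in = torch.cat([img_feats, txt_feats], dim=-1)
neg_in = torch.cat([img_feats, txt_feats[hard_neg]], dim=-1)
ce_scores = torch.cat([cross_enc(pos_in), cross_enc(neg_in)], dim=-1)   # (8, 2)

# Distill: the dual encoder's pos/hard-neg scores should match the cross encoder's.
dual_scores = torch.stack([sim.diag(), sim[torch.arange(8), hard_neg]], dim=-1)
distill_loss = F.kl_div(F.log_softmax(dual_scores, dim=-1),
                        F.softmax(ce_scores.detach(), dim=-1), reduction="batchmean")
distill_loss.backward()
print(float(distill_loss))
```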
- Large Dual Encoders Are Generalizable Retrievers [26.42937314291077]
We show that scaling up the model size brings significant improvement on a variety of retrieval tasks.
Our dual encoders, Generalizable T5-based dense Retrievers (GTR), outperform existing sparse and dense retrievers.
arXiv Detail & Related papers (2021-12-15T05:33:27Z)
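The GTR models are, to our knowledge, available as sentence-transformers checkpoints; a minimal dual-encoder retrieval example, with the checkpoint name and toy corpus assumed for illustration:
```python
# Minimal dual-encoder retrieval example in the spirit of GTR. The checkpoint name and
# toy corpus are illustrative assumptions.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/gtr-t5-base")
corpus = ["Dense retrievers embed queries and documents into one vector space.",
          "BM25 is a strong lexical baseline for zero-shot retrieval."]
corpus_emb = model.encode(corpus, convert_to_tensor=True, normalize_embeddings=True)
query_emb = model.encode("what is a dense retriever?",
                         convert_to_tensor=True, normalize_embeddings=True)

hits = util.semantic_search(query_emb, corpus_emb, top_k=2)[0]
for hit in hits:
    print(round(hit["score"], 3), corpus[hit["corpus_id"]])
```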
- Trans-Encoder: Unsupervised sentence-pair modelling through self- and mutual-distillations [22.40667024030858]
Bi-encoders produce fixed-dimensional sentence representations and are computationally efficient.
Cross-encoders can leverage their attention heads to exploit inter-sentence interactions for better performance.
Trans-Encoder combines the two learning paradigms into an iterative joint framework to simultaneously learn enhanced bi- and cross-encoders.
arXiv Detail & Related papers (2021-09-27T14:06:47Z)
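One round of the self-/mutual-distillation loop described above, sketched with placeholder encoders: the bi-encoder's cosine similarities pseudo-label the pairs for the cross encoder, and the cross encoder's scores are then distilled back. The real method iterates this with pretrained language models.
```python
# Toy sketch of one Trans-Encoder-style distillation round. Encoders are placeholder
# MLPs over precomputed features; labels here are soft pseudo-labels, not gold data.
import torch
import torch.nn as nn
import torch.nn.functional as F

bi_encoder = nn.Linear(128, 64)                          # placeholder sentence tower
cross_encoder = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 1))
opt_cross = torch.optim.Adam(cross_encoder.parameters(), lr=1e-3)
opt_bi = torch.optim.Adam(bi_encoder.parameters(), lr=1e-3)

sent_a, sent_b = torch.randn(32, 128), torch.randn(32, 128)   # unlabeled pairs

# Step 1 (bi -> cross): cosine similarities of the bi-encoder act as soft labels.
with torch.no_grad():
    labels = F.cosine_similarity(bi_encoder(sent_a), bi_encoder(sent_b))
pred = cross_encoder(torch.cat([sent_a, sent_b], dim=-1)).squeeze(-1)
loss_cross = F.mse_loss(torch.tanh(pred), labels)
loss_cross.backward()
opt_cross.step()
opt_cross.zero_grad()

# Step 2 (cross -> bi): the (now stronger) cross encoder re-labels the pairs.
with torch.no_grad():
    labels = torch.tanh(cross_encoder(torch.cat([sent_a, sent_b], dim=-1))).squeeze(-1)
pred = F.cosine_similarity(bi_encoder(sent_a), bi_encoder(sent_b))
loss_bi = F.mse_loss(pred, labels)
loss_bi.backward()
opt_bi.step()
opt_bi.zero_grad()
print(float(loss_cross), float(loss_bi))
```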
- Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers [149.78470371525754]
We treat semantic segmentation as a sequence-to-sequence prediction task. Specifically, we deploy a pure transformer to encode an image as a sequence of patches.
With the global context modeled in every layer of the transformer, this encoder can be combined with a simple decoder to provide a powerful segmentation model, termed SEgmentation TRansformer (SETR)
SETR achieves new state of the art on ADE20K (50.28% mIoU), Pascal Context (55.83% mIoU) and competitive results on Cityscapes.
arXiv Detail & Related papers (2020-12-31T18:55:57Z)
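A toy version of the layout above: patchify the image, run a plain transformer encoder over the patch tokens, reshape back to a grid, and decode with a simple upsampling head. All sizes are toy values, not the paper's configuration.
```python
# Toy sketch of the SETR-style pipeline: patchify -> pure transformer encoder ->
# reshape tokens back to a grid -> simple decoder that upsamples to per-pixel classes.
import torch
import torch.nn as nn

class ToySETR(nn.Module):
    def __init__(self, n_classes=21, d_model=64, patch=16, img=128, depth=2):
        super().__init__()
        self.grid = img // patch                       # patch tokens per side
        self.patchify = nn.Conv2d(3, d_model, kernel_size=patch, stride=patch)
        self.pos = nn.Parameter(torch.zeros(1, self.grid ** 2, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead=4,
                                           dim_feedforward=4 * d_model, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.decoder = nn.Sequential(                  # "simple decoder": 1x1 conv + upsample
            nn.Conv2d(d_model, n_classes, kernel_size=1),
            nn.Upsample(scale_factor=patch, mode="bilinear", align_corners=False))

    def forward(self, images):
        tokens = self.patchify(images).flatten(2).transpose(1, 2)  # (B, N, d_model)
        tokens = self.encoder(tokens + self.pos)
        grid = tokens.transpose(1, 2).reshape(-1, tokens.size(-1), self.grid, self.grid)
        return self.decoder(grid)                       # (B, n_classes, H, W)

masks = ToySETR()(torch.randn(2, 3, 128, 128))
print(masks.shape)  # torch.Size([2, 21, 128, 128])
```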
- Augmented SBERT: Data Augmentation Method for Improving Bi-Encoders for Pairwise Sentence Scoring Tasks [59.13635174016506]
We present a simple yet efficient data augmentation strategy called Augmented SBERT.
We use the cross-encoder to label a larger set of input pairs to augment the training data for the bi-encoder.
We show that, in this process, selecting the sentence pairs is non-trivial and crucial for the success of the method.
arXiv Detail & Related papers (2020-10-16T08:43:27Z)
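A hedged sketch of the recipe above using the sentence-transformers training API: a cross-encoder silver-labels unlabeled pairs, and the bi-encoder is then fine-tuned on those scores. Checkpoint names and the tiny pair list are placeholders.
```python
# Sketch of the Augmented SBERT recipe: a cross-encoder silver-labels unlabeled
# sentence pairs, and the bi-encoder is trained on those soft scores. Checkpoint
# names and the pair list are illustrative placeholders.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, CrossEncoder, InputExample, losses

cross_encoder = CrossEncoder("cross-encoder/stsb-roberta-base")     # assumed labeler
bi_encoder = SentenceTransformer("distilroberta-base")              # assumed student

unlabeled_pairs = [
    ("A man is playing a guitar.", "Someone plays an instrument."),
    ("A dog runs on the beach.", "The stock market fell sharply."),
]

# Silver labeling: cross-encoder scores become regression targets for the bi-encoder.
silver_scores = cross_encoder.predict(unlabeled_pairs)
train_examples = [InputExample(texts=list(pair), label=float(score))
                  for pair, score in zip(unlabeled_pairs, silver_scores)]

train_loader = DataLoader(train_examples, shuffle=True, batch_size=2)
train_loss = losses.CosineSimilarityLoss(bi_encoder)
bi_encoder.fit(train_objectives=[(train_loader, train_loss)], epochs=1, warmup_steps=10)
```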