In Defense of Cross-Encoders for Zero-Shot Retrieval
- URL: http://arxiv.org/abs/2212.06121v1
- Date: Mon, 12 Dec 2022 18:50:03 GMT
- Title: In Defense of Cross-Encoders for Zero-Shot Retrieval
- Authors: Guilherme Rosa and Luiz Bonifacio and Vitor Jeronymo and Hugo Abonizio
and Marzieh Fadaee and Roberto Lotufo and Rodrigo Nogueira
- Abstract summary: Bi-encoders and cross-encoders are widely used in many state-of-the-art retrieval pipelines.
We find that the number of parameters and early query-document interactions of cross-encoders play a significant role in the generalization ability of retrieval models.
- Score: 4.712097135437801
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bi-encoders and cross-encoders are widely used in many state-of-the-art
retrieval pipelines. In this work we study the generalization ability of these
two types of architectures on a wide range of parameter count on both in-domain
and out-of-domain scenarios. We find that the number of parameters and early
query-document interactions of cross-encoders play a significant role in the
generalization ability of retrieval models. Our experiments show that
increasing model size results in marginal gains on in-domain test sets, but
much larger gains in new domains never seen during fine-tuning. Furthermore, we
show that cross-encoders largely outperform bi-encoders of similar size in
several tasks. In the BEIR benchmark, our largest cross-encoder surpasses a
state-of-the-art bi-encoder by more than 4 average points. Finally, we show
that using bi-encoders as first-stage retrievers provides no gains in
comparison to a simpler retriever such as BM25 on out-of-domain tasks. The code
is available at
https://github.com/guilhermemr04/scaling-zero-shot-retrieval.git
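As a concrete illustration of the pipeline the abstract argues for, the sketch below uses BM25 as the first-stage retriever and a cross-encoder as the re-ranker; the corpus, query, and checkpoint name are placeholders, not the paper's exact setup.
```python
# Sketch of the two-stage setup discussed above: BM25 as the first-stage retriever,
# a cross-encoder as the re-ranker. Corpus, query, and checkpoint are placeholders.
from rank_bm25 import BM25Okapi
from sentence_transformers import CrossEncoder

corpus = [
    "BM25 is a classic lexical ranking function.",
    "Cross-encoders attend jointly over the query and the document.",
    "Bi-encoders embed queries and documents independently.",
]
query = "how do cross-encoders score documents?"

# First stage: cheap lexical retrieval.
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])
bm25_scores = bm25.get_scores(query.lower().split())
candidates = sorted(range(len(corpus)), key=lambda i: -bm25_scores[i])[:100]

# Second stage: the cross-encoder re-scores each (query, document) pair jointly,
# which is the early query-document interaction the paper credits for generalization.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # assumed checkpoint
pair_scores = reranker.predict([(query, corpus[i]) for i in candidates])
ranking = [i for _, i in sorted(zip(pair_scores, candidates), reverse=True)]
print([corpus[i] for i in ranking])
```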
Related papers
- You Only Cache Once: Decoder-Decoder Architectures for Language Models [132.4064488592704]
We introduce a decoder-decoder architecture, YOCO, for large language models.
YOCO only caches key-value pairs once.
The overall model behaves like a decoder-only Transformer, although YOCO only caches once.
arXiv Detail & Related papers (2024-05-08T17:57:39Z)
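A toy PyTorch sketch of the decoder-decoder idea summarized above: a self-decoder block builds a single global key-value cache, and the remaining layers only cross-attend to it. Layer counts and sizes are arbitrary, and causal masking of the cross-attention is omitted for brevity.
```python
# Toy sketch of a YOCO-style decoder-decoder: a self-decoder block produces one global
# key-value cache, and the cross-decoder layers only attend to that cache.
# Sizes are arbitrary; causal masking of the cross-attention is omitted for brevity.
import torch
import torch.nn as nn

class CrossDecoderLayer(nn.Module):
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                nn.Linear(4 * d_model, d_model))
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)

    def forward(self, x, kv_cache):
        attn_out, _ = self.cross_attn(self.norm1(x), kv_cache, kv_cache)
        x = x + attn_out
        return x + self.ff(self.norm2(x))

class ToyYOCO(nn.Module):
    def __init__(self, vocab=1000, d_model=64, n_heads=4, n_cross_layers=3):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        # Self-decoder: a causal block whose output is cached once and reused below.
        self.self_decoder = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True)
        self.cross_decoder = nn.ModuleList(
            CrossDecoderLayer(d_model, n_heads) for _ in range(n_cross_layers))
        self.lm_head = nn.Linear(d_model, vocab)

    def forward(self, token_ids):
        seq_len = token_ids.size(1)
        causal = torch.full((seq_len, seq_len), float("-inf")).triu(diagonal=1)
        x = self.embed(token_ids)
        kv_cache = self.self_decoder(x, src_mask=causal)   # the single shared KV cache
        for layer in self.cross_decoder:
            x = layer(x, kv_cache)
        return self.lm_head(x)

logits = ToyYOCO()(torch.randint(0, 1000, (2, 16)))
print(logits.shape)  # torch.Size([2, 16, 1000])
```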
- Task-Aware Specialization for Efficient and Robust Dense Retrieval for Open-Domain Question Answering [85.08146789409354]
We propose a new architecture, Task-Aware Specialization for dense Retrieval (TASER).
TASER enables parameter sharing by interleaving shared and specialized blocks in a single encoder.
Our experiments show that TASER can achieve superior accuracy, surpassing BM25, while using about 60% of the parameters of bi-encoder dense retrievers.
arXiv Detail & Related papers (2022-10-11T05:33:25Z)
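A toy sketch of the interleaving described above: shared transformer blocks alternate with task-specialized blocks selected by a task id. The routing scheme and sizes are illustrative assumptions, not the paper's exact design.
```python
# Toy sketch of interleaving shared and task-specialized blocks in one encoder,
# in the spirit of the TASER summary above. Sizes and routing by task id are
# illustrative assumptions.
import torch
import torch.nn as nn

class ToyTaskAwareEncoder(nn.Module):
    def __init__(self, d_model=64, n_heads=4, n_pairs=2, n_tasks=3):
        super().__init__()
        block = lambda: nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True)
        # Shared blocks are reused for every task; specialized blocks are per-task.
        self.shared = nn.ModuleList(block() for _ in range(n_pairs))
        self.specialized = nn.ModuleList(
            nn.ModuleList(block() for _ in range(n_tasks)) for _ in range(n_pairs))

    def forward(self, x, task_id):
        # Interleave: a shared block, then the specialized block routed by task_id.
        for shared, experts in zip(self.shared, self.specialized):
            x = shared(x)
            x = experts[task_id](x)
        return x

enc = ToyTaskAwareEncoder()
tokens = torch.randn(2, 16, 64)          # (batch, seq_len, d_model)
print(enc(tokens, task_id=1).shape)      # torch.Size([2, 16, 64])
```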
- ED2LM: Encoder-Decoder to Language Model for Faster Document Re-ranking Inference [70.36083572306839]
This paper proposes a new training and inference paradigm for re-ranking.
We finetune a pretrained encoder-decoder model on document-to-query generation.
We show that this encoder-decoder architecture can be decomposed into a decoder-only language model during inference.
arXiv Detail & Related papers (2022-04-25T06:26:29Z)
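The document-to-query objective above can be used for re-ranking by scoring each document with the likelihood it assigns to the query. A hedged sketch with a generic T5 checkpoint follows; the decoder-only decomposition ED2LM applies at inference time is not shown.
```python
# Sketch of document-to-query likelihood scoring with a seq2seq model, approximating
# the re-ranking objective above. "t5-small" is an illustrative checkpoint; the
# decoder-only decomposition used for fast inference in ED2LM is omitted here.
import torch
from transformers import T5ForConditionalGeneration, T5TokenizerFast

tokenizer = T5TokenizerFast.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small").eval()

def query_log_likelihood(document: str, query: str) -> float:
    """Return log p(query | document) under the seq2seq model (higher = better)."""
    enc = tokenizer(document, return_tensors="pt", truncation=True)
    labels = tokenizer(query, return_tensors="pt", truncation=True).input_ids
    with torch.no_grad():
        out = model(**enc, labels=labels)
    # out.loss is the mean token-level cross-entropy of the query given the document.
    return -out.loss.item() * labels.size(1)

docs = ["Transformers use attention to mix information across tokens.",
        "BM25 ranks documents with term frequencies and document length."]
query = "how does attention work in transformers?"
ranked = sorted(docs, key=lambda d: query_log_likelihood(d, query), reverse=True)
print(ranked[0])
```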
- LoopITR: Combining Dual and Cross Encoder Architectures for Image-Text Retrieval [117.15862403330121]
We propose LoopITR, which combines dual encoders and cross encoders in the same network for joint learning.
Specifically, we let the dual encoder provide hard negatives to the cross encoder, and use the more discriminative cross encoder to distill its predictions back to the dual encoder.
arXiv Detail & Related papers (2022-03-10T16:41:12Z)
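Read as a single training step, the loop above amounts to: mine hard negatives from the dual encoder's similarity matrix, score them with the cross encoder, and distill the cross encoder's scores back into the dual encoder. A toy sketch with placeholder encoders over precomputed features:
```python
# Toy sketch of the dual/cross-encoder loop summarized above. Both "encoders" here are
# placeholder MLPs over precomputed features; the real model encodes raw images and text.
import torch
import torch.nn as nn
import torch.nn.functional as F

dual_img = nn.Linear(128, 64)     # placeholder image tower
dual_txt = nn.Linear(128, 64)     # placeholder text tower
cross_enc = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 1))

img_feats, txt_feats = torch.randn(8, 128), torch.randn(8, 128)   # aligned pairs

# Dual encoder: in-batch similarity matrix, also used for hard-negative mining.
img_emb = F.normalize(dual_img(img_feats), dim=-1)
txt_emb = F.normalize(dual_txt(txt_feats), dim=-1)
sim = img_emb @ txt_emb.T                                          # (8, 8)

# Hard negatives: for each image, the highest-scoring non-matching text.
hard_neg = sim.masked_fill(torch.eye(8, dtype=torch.bool), -1e4).argmax(dim=1)

# Cross encoder scores positives and mined hard negatives jointly.
pos_in = torch.cat([img_feats, txt_feats], dim=-1)
neg_in = torch.cat([img_feats, txt_feats[hard_neg]], dim=-1)
ce_scores = torch.cat([cross_enc(pos_in), cross_enc(neg_in)], dim=-1)   # (8, 2)

# Distill: the dual encoder's pos/hard-neg scores should match the cross encoder's.
dual_scores = torch.stack([sim.diag(), sim[torch.arange(8), hard_neg]], dim=-1)
distill_loss = F.kl_div(F.log_softmax(dual_scores, dim=-1),
                        F.softmax(ce_scores.detach(), dim=-1), reduction="batchmean")
distill_loss.backward()
print(float(distill_loss))
```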
- Large Dual Encoders Are Generalizable Retrievers [26.42937314291077]
We show that scaling up the model size brings significant improvement on a variety of retrieval tasks.
Our dual encoders, Generalizable T5-based dense Retrievers (GTR), outperform existing sparse and dense retrievers.
arXiv Detail & Related papers (2021-12-15T05:33:27Z)
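The GTR models are, to our knowledge, available as sentence-transformers checkpoints; a minimal dual-encoder retrieval example, with the checkpoint name and toy corpus assumed for illustration:
```python
# Minimal dual-encoder retrieval example in the spirit of GTR. The checkpoint name and
# toy corpus are illustrative assumptions.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/gtr-t5-base")
corpus = ["Dense retrievers embed queries and documents into one vector space.",
          "BM25 is a strong lexical baseline for zero-shot retrieval."]
corpus_emb = model.encode(corpus, convert_to_tensor=True, normalize_embeddings=True)
query_emb = model.encode("what is a dense retriever?",
                         convert_to_tensor=True, normalize_embeddings=True)

hits = util.semantic_search(query_emb, corpus_emb, top_k=2)[0]
for hit in hits:
    print(round(hit["score"], 3), corpus[hit["corpus_id"]])
```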
- Trans-Encoder: Unsupervised sentence-pair modelling through self- and mutual-distillations [22.40667024030858]
Bi-encoders produce fixed-dimensional sentence representations and are computationally efficient.
Cross-encoders can leverage their attention heads to exploit inter-sentence interactions for better performance.
Trans-Encoder combines the two learning paradigms into an iterative joint framework to simultaneously learn enhanced bi- and cross-encoders.
arXiv Detail & Related papers (2021-09-27T14:06:47Z)
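One round of the self-/mutual-distillation loop described above, sketched with placeholder encoders: the bi-encoder's cosine similarities pseudo-label the pairs for the cross encoder, and the cross encoder's scores are then distilled back. The real method iterates this with pretrained language models.
```python
# Toy sketch of one Trans-Encoder-style distillation round. Encoders are placeholder
# MLPs over precomputed features; labels here are soft pseudo-labels, not gold data.
import torch
import torch.nn as nn
import torch.nn.functional as F

bi_encoder = nn.Linear(128, 64)                          # placeholder sentence tower
cross_encoder = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 1))
opt_cross = torch.optim.Adam(cross_encoder.parameters(), lr=1e-3)
opt_bi = torch.optim.Adam(bi_encoder.parameters(), lr=1e-3)

sent_a, sent_b = torch.randn(32, 128), torch.randn(32, 128)   # unlabeled pairs

# Step 1 (bi -> cross): cosine similarities of the bi-encoder act as soft labels.
with torch.no_grad():
    labels = F.cosine_similarity(bi_encoder(sent_a), bi_encoder(sent_b))
pred = cross_encoder(torch.cat([sent_a, sent_b], dim=-1)).squeeze(-1)
loss_cross = F.mse_loss(torch.tanh(pred), labels)
loss_cross.backward()
opt_cross.step()
opt_cross.zero_grad()

# Step 2 (cross -> bi): the (now stronger) cross encoder re-labels the pairs.
with torch.no_grad():
    labels = torch.tanh(cross_encoder(torch.cat([sent_a, sent_b], dim=-1))).squeeze(-1)
pred = F.cosine_similarity(bi_encoder(sent_a), bi_encoder(sent_b))
loss_bi = F.mse_loss(pred, labels)
loss_bi.backward()
opt_bi.step()
opt_bi.zero_grad()
print(float(loss_cross), float(loss_bi))
```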
- Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers [149.78470371525754]
We treat semantic segmentation as a sequence-to-sequence prediction task. Specifically, we deploy a pure transformer to encode an image as a sequence of patches.
With the global context modeled in every layer of the transformer, this encoder can be combined with a simple decoder to provide a powerful segmentation model, termed SEgmentation TRansformer (SETR)
SETR achieves new state of the art on ADE20K (50.28% mIoU), Pascal Context (55.83% mIoU) and competitive results on Cityscapes.
arXiv Detail & Related papers (2020-12-31T18:55:57Z)
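A toy version of the layout above: patchify the image, run a plain transformer encoder over the patch tokens, reshape back to a grid, and decode with a simple upsampling head. All sizes are toy values, not the paper's configuration.
```python
# Toy sketch of the SETR-style pipeline: patchify -> pure transformer encoder ->
# reshape tokens back to a grid -> simple decoder that upsamples to per-pixel classes.
import torch
import torch.nn as nn

class ToySETR(nn.Module):
    def __init__(self, n_classes=21, d_model=64, patch=16, img=128, depth=2):
        super().__init__()
        self.grid = img // patch                       # patch tokens per side
        self.patchify = nn.Conv2d(3, d_model, kernel_size=patch, stride=patch)
        self.pos = nn.Parameter(torch.zeros(1, self.grid ** 2, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead=4,
                                           dim_feedforward=4 * d_model, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.decoder = nn.Sequential(                  # "simple decoder": 1x1 conv + upsample
            nn.Conv2d(d_model, n_classes, kernel_size=1),
            nn.Upsample(scale_factor=patch, mode="bilinear", align_corners=False))

    def forward(self, images):
        tokens = self.patchify(images).flatten(2).transpose(1, 2)  # (B, N, d_model)
        tokens = self.encoder(tokens + self.pos)
        grid = tokens.transpose(1, 2).reshape(-1, tokens.size(-1), self.grid, self.grid)
        return self.decoder(grid)                       # (B, n_classes, H, W)

masks = ToySETR()(torch.randn(2, 3, 128, 128))
print(masks.shape)  # torch.Size([2, 21, 128, 128])
```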
- Augmented SBERT: Data Augmentation Method for Improving Bi-Encoders for Pairwise Sentence Scoring Tasks [59.13635174016506]
We present a simple yet efficient data augmentation strategy called Augmented SBERT.
We use the cross-encoder to label a larger set of input pairs to augment the training data for the bi-encoder.
We show that, in this process, selecting the sentence pairs is non-trivial and crucial for the success of the method.
arXiv Detail & Related papers (2020-10-16T08:43:27Z)
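A hedged sketch of the recipe above using the sentence-transformers training API: a cross-encoder silver-labels unlabeled pairs, and the bi-encoder is then fine-tuned on those scores. Checkpoint names and the tiny pair list are placeholders.
```python
# Sketch of the Augmented SBERT recipe: a cross-encoder silver-labels unlabeled
# sentence pairs, and the bi-encoder is trained on those soft scores. Checkpoint
# names and the pair list are illustrative placeholders.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, CrossEncoder, InputExample, losses

cross_encoder = CrossEncoder("cross-encoder/stsb-roberta-base")     # assumed labeler
bi_encoder = SentenceTransformer("distilroberta-base")              # assumed student

unlabeled_pairs = [
    ("A man is playing a guitar.", "Someone plays an instrument."),
    ("A dog runs on the beach.", "The stock market fell sharply."),
]

# Silver labeling: cross-encoder scores become regression targets for the bi-encoder.
silver_scores = cross_encoder.predict(unlabeled_pairs)
train_examples = [InputExample(texts=list(pair), label=float(score))
                  for pair, score in zip(unlabeled_pairs, silver_scores)]

train_loader = DataLoader(train_examples, shuffle=True, batch_size=2)
train_loss = losses.CosineSimilarityLoss(bi_encoder)
bi_encoder.fit(train_objectives=[(train_loader, train_loss)], epochs=1, warmup_steps=10)
```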