A Thorough Examination on Zero-shot Dense Retrieval
- URL: http://arxiv.org/abs/2204.12755v2
- Date: Sun, 23 Apr 2023 17:11:13 GMT
- Title: A Thorough Examination on Zero-shot Dense Retrieval
- Authors: Ruiyang Ren, Yingqi Qu, Jing Liu, Wayne Xin Zhao, Qifei Wu, Yuchen
Ding, Hua Wu, Haifeng Wang, Ji-Rong Wen
- Abstract summary: We present the first thorough examination of the zero-shot capability of dense retrieval (DR) models.
We discuss the effect of several key factors related to the source training set, analyze the potential bias from the target dataset, and review and compare existing zero-shot DR models.
- Score: 84.70868940598143
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent years have witnessed significant advances in dense
retrieval (DR) based on powerful pre-trained language models (PLMs). DR
models have achieved excellent performance on several benchmark datasets,
yet they have been shown to be less competitive than traditional sparse
retrieval models (e.g., BM25) in the zero-shot retrieval setting. However,
the literature still lacks a detailed and comprehensive study of zero-shot
retrieval. In this paper, we present the first thorough examination of the
zero-shot capability of DR models. We aim to identify the key factors and
analyze how they affect zero-shot retrieval performance. In particular, we
discuss the effect of several key factors related to the source training
set, analyze the potential bias from the target dataset, and review and
compare existing zero-shot DR models. Our findings provide important
evidence for better understanding and developing zero-shot DR models.
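To ground the sparse-vs-dense contrast in the abstract, below is a minimal sketch (illustrative, not from the paper) that scores one query against a toy corpus with BM25 and with an MS MARCO-trained dense encoder; the rank_bm25 and sentence-transformers packages, the model checkpoint, and the corpus are all assumptions made for the example.

```python
# A toy comparison of sparse (BM25) and dense retrieval scoring; the corpus,
# query, and model checkpoint are illustrative assumptions, not the paper's
# experimental setup.
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, util

corpus = [
    "BM25 is a classic sparse retrieval model based on term statistics.",
    "Dense retrieval encodes queries and documents as embedding vectors.",
    "Zero-shot retrieval evaluates a model on domains unseen in training.",
]
query = "how do dense retrievers differ from BM25?"

# Sparse baseline: lexical matching via BM25 term weighting.
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])
sparse_scores = bm25.get_scores(query.lower().split())

# Dense retriever: dot product of PLM-based embeddings. The encoder was
# trained on MS MARCO, so any other corpus is a zero-shot target for it.
encoder = SentenceTransformer("msmarco-distilbert-base-tas-b")
doc_emb = encoder.encode(corpus, convert_to_tensor=True)
query_emb = encoder.encode(query, convert_to_tensor=True)
dense_scores = util.dot_score(query_emb, doc_emb)[0].tolist()

for doc, s_sparse, s_dense in zip(corpus, sparse_scores, dense_scores):
    print(f"BM25={s_sparse:6.2f}  dense={s_dense:6.2f}  {doc}")
```

On in-domain data the dense scores tend to dominate; on out-of-domain text, lexical BM25 often remains the stronger baseline, which is exactly the gap the paper examines.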
Related papers
- RDBE: Reasoning Distillation-Based Evaluation Enhances Automatic Essay Scoring [0.0]
Reasoning Distillation-Based Evaluation (RDBE) integrates interpretability to elucidate the rationale behind model scores.
Our experimental results demonstrate the efficacy of RDBE across all scoring rubrics considered in the dataset.
arXiv Detail & Related papers (2024-07-03T05:49:01Z)
- No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance [68.18779562801762]
Multimodal models require exponentially more pretraining data to achieve linear improvements in downstream "zero-shot" performance.
Our study reveals an exponential need for training data, which implies that the key to "zero-shot" generalization under large-scale training paradigms remains to be found (a numeric illustration of this log-linear trend follows this entry).
arXiv Detail & Related papers (2024-04-04T17:58:02Z)
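To unpack the "exponential data for linear gains" claim with made-up numbers (assuming the log-linear trend the abstract implies, not the paper's actual fit):

```python
# Illustrative numbers only (not from the paper): if zero-shot accuracy
# grows roughly as a * log10(concept frequency) + b, then each fixed gain
# in accuracy costs a multiplicative (10x) increase in pretraining data.
import math

a, b = 5.0, 10.0  # assumed fit coefficients, chosen for illustration
for freq in (1e3, 1e4, 1e5, 1e6):
    acc = a * math.log10(freq) + b
    print(f"concept frequency {freq:>9,.0f} -> accuracy {acc:.1f}")
# Each 10x increase in frequency buys the same +5.0 accuracy points.
```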
- Benchmarking Zero-Shot Robustness of Multimodal Foundation Models: A Pilot Study [61.65123150513683]
Multimodal foundation models, such as CLIP, produce state-of-the-art zero-shot results.
These models are reported to close the robustness gap by matching the performance of supervised models trained on ImageNet.
We show that CLIP suffers a significant robustness drop compared to supervised ImageNet models on our benchmark (a sketch of CLIP-style zero-shot classification follows this entry).
arXiv Detail & Related papers (2024-03-15T17:33:49Z)
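As background for that finding, here is a sketch of standard CLIP zero-shot classification (the mechanism such benchmarks probe), using the public openai/clip-vit-base-patch32 checkpoint via Hugging Face transformers; the label set and image path are placeholders.

```python
# CLIP zero-shot classification: rank candidate text labels by image-text
# similarity, with no task-specific training. Image path is a placeholder.
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

labels = ["a photo of a dog", "a photo of a cat", "a photo of a car"]
image = Image.open("example.jpg")  # placeholder input image

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)
# logits_per_image holds image-text similarities; softmax gives class probs.
probs = outputs.logits_per_image.softmax(dim=-1)
print({label: float(p) for label, p in zip(labels, probs[0])})
```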
- Zero-shot Retrieval: Augmenting Pre-trained Models with Search Engines [83.65380507372483]
Large pre-trained models can dramatically reduce the amount of task-specific data required to solve a problem, but they often fail to capture domain-specific nuances out of the box.
This paper shows how to leverage recent advances in NLP and multi-modal learning to augment a pre-trained model with search engine retrieval.
arXiv Detail & Related papers (2023-11-29T05:33:28Z)
- Black-box Adversarial Attacks against Dense Retrieval Models: A Multi-view Contrastive Learning Method [115.29382166356478]
We introduce the adversarial retrieval attack (AREA) task.
It is meant to trick DR models into retrieving a target document that is outside the initial set of candidate documents retrieved by the DR model.
We find that the promising results previously reported for attacking neural ranking models (NRMs) do not generalize to DR models.
We propose to formalize attacks on DR models as a contrastive learning problem in a multi-view representation space (a generic contrastive-loss sketch follows this entry).
arXiv Detail & Related papers (2023-08-19T00:24:59Z)
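Since the entry above names multi-view contrastive learning, here is a generic InfoNCE-style loss over multiple views in PyTorch; this is a sketch of the loss family, not the paper's actual attack objective, and all shapes and names are assumptions.

```python
# Generic multi-positive InfoNCE: pull the anchor toward its positive views
# and away from negatives in a shared representation space.
import torch
import torch.nn.functional as F

def info_nce(anchor: torch.Tensor, positives: torch.Tensor,
             negatives: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """anchor: [d]; positives: [p, d]; negatives: [n, d]."""
    anchor = F.normalize(anchor, dim=-1)
    pos_sim = F.normalize(positives, dim=-1) @ anchor / temperature  # [p]
    neg_sim = F.normalize(negatives, dim=-1) @ anchor / temperature  # [n]
    # -log( sum over positives / sum over all candidates )
    all_sim = torch.cat([pos_sim, neg_sim])
    return torch.logsumexp(all_sim, dim=0) - torch.logsumexp(pos_sim, dim=0)

loss = info_nce(torch.randn(128), torch.randn(4, 128), torch.randn(16, 128))
print(float(loss))
```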
- A Comprehensive Evaluation and Analysis Study for Chinese Spelling Check [53.152011258252315]
We show that reasonable use of phonetic and graphic information is effective for Chinese Spelling Check.
Models are sensitive to the error distribution of the test set, which reflects their shortcomings.
The commonly used benchmark, SIGHAN, cannot reliably evaluate models' performance.
arXiv Detail & Related papers (2023-07-25T17:02:38Z)
- Rethinking Dense Retrieval's Few-Shot Ability [24.86681340512899]
Few-shot dense retrieval aims to generalize to novel search scenarios by learning from a few samples.
Current methods often resort to random sampling from supervised datasets to create "few-data" setups.
We propose a customized FewDR dataset and a unified evaluation benchmark.
arXiv Detail & Related papers (2023-04-12T13:20:16Z)
- A Systematic Investigation of Commonsense Understanding in Large Language Models [23.430757316504316]
Large language models have shown impressive performance on many natural language processing (NLP) tasks in a zero-shot setting.
We ask whether these models exhibit commonsense understanding by evaluating models against four commonsense benchmarks.
arXiv Detail & Related papers (2021-10-31T22:20:36Z)
- BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models [41.45240621979654]
We introduce BEIR, a heterogeneous benchmark for information retrieval.
We study the effectiveness of nine state-of-the-art retrieval models in a zero-shot evaluation setup.
Dense retrieval models are computationally more efficient but often underperform other approaches (a minimal BEIR evaluation sketch follows this list).
arXiv Detail & Related papers (2021-04-17T23:29:55Z)
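For concreteness, below is a minimal BEIR-style zero-shot evaluation sketch, assuming the beir package's documented quickstart pattern and the SciFact dataset; the dense encoder checkpoint is an illustrative choice.

```python
# Zero-shot evaluation of an MS MARCO-trained dense retriever on SciFact,
# following the beir package's quickstart pattern (an assumption; consult
# the library's docs for the current API).
from beir import util
from beir.datasets.data_loader import GenericDataLoader
from beir.retrieval import models
from beir.retrieval.evaluation import EvaluateRetrieval
from beir.retrieval.search.dense import DenseRetrievalExactSearch as DRES

# Download one BEIR dataset and load its corpus, queries, and relevance labels.
url = "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/scifact.zip"
data_path = util.download_and_unzip(url, "datasets")
corpus, queries, qrels = GenericDataLoader(data_folder=data_path).load(split="test")

# The encoder never saw SciFact during training, so this is a zero-shot target.
model = DRES(models.SentenceBERT("msmarco-distilbert-base-tas-b"), batch_size=32)
retriever = EvaluateRetrieval(model, score_function="dot")
results = retriever.retrieve(corpus, queries)
ndcg, _map, recall, precision = retriever.evaluate(qrels, results, retriever.k_values)
print(ndcg)  # e.g., NDCG@10, for comparison against a BM25 baseline
```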
This list is automatically generated from the titles and abstracts of the papers on this site.
This site makes no guarantee as to the quality of its content (including all information) and is not responsible for any consequences of its use.