When do Generative Query and Document Expansions Fail? A Comprehensive
Study Across Methods, Retrievers, and Datasets
- URL: http://arxiv.org/abs/2309.08541v2
- Date: Mon, 26 Feb 2024 20:57:33 GMT
- Authors: Orion Weller, Kyle Lo, David Wadden, Dawn Lawrie, Benjamin Van Durme,
Arman Cohan, Luca Soldaini
- Abstract summary: We conduct the first comprehensive analysis of LM-based expansion.
We find that there exists a strong negative correlation between retriever performance and gains from expansion.
Our results suggest the following recipe: use expansions for weaker models or when the target dataset significantly differs from the training corpus in format.
- Score: 69.28733312110566
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Using large language models (LMs) for query or document expansion can improve
generalization in information retrieval. However, it is unknown whether these
techniques are universally beneficial or only effective in specific settings,
such as for particular retrieval models, dataset domains, or query types. To
answer this, we conduct the first comprehensive analysis of LM-based expansion.
We find that there exists a strong negative correlation between retriever
performance and gains from expansion: expansion improves scores for weaker
models, but generally harms stronger models. We show this trend holds across a
set of eleven expansion techniques, twelve datasets with diverse distribution
shifts, and twenty-four retrieval models. Through qualitative error analysis,
we hypothesize that although expansions provide extra information (potentially
improving recall), they add additional noise that makes it difficult to discern
between the top relevant documents (thus introducing false positives). Our
results suggest the following recipe: use expansions for weaker models or when
the target dataset significantly differs from the training corpus in format;
otherwise, avoid expansions to keep the relevance signal clear.
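
As a rough illustration of the two ideas in the abstract (the negative correlation between retriever strength and expansion gain, and the resulting recipe), here is a minimal Python sketch. Everything in it is an assumption made for illustration: the retriever names, the score values, the `weak_threshold` cutoff, and the helper names `expansion_gains` and `should_expand` are hypothetical and do not come from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def expansion_gains(baseline, expanded):
    """Per-retriever gain: score with expansion minus score without it."""
    return {name: expanded[name] - baseline[name] for name in baseline}


def should_expand(baseline_score, format_shift, weak_threshold=0.4):
    """Hypothetical decision rule paraphrasing the recipe in the abstract.

    Expand when the retriever is weak (below an assumed cutoff) or when the
    target dataset differs from the training corpus in format.
    """
    return format_shift or baseline_score < weak_threshold


# Placeholder scores for three imaginary retrievers (NOT results from the paper).
baseline = {"weak": 0.30, "medium": 0.45, "strong": 0.60}
expanded = {"weak": 0.38, "medium": 0.46, "strong": 0.55}

gains = expansion_gains(baseline, expanded)
names = sorted(baseline)
r = pearson([baseline[n] for n in names], [gains[n] for n in names])
print(f"correlation between baseline score and expansion gain: {r:.2f}")
```

With these made-up numbers the gain shrinks and turns negative as the baseline score rises, so the correlation is negative. In practice the baseline and expanded scores would come from evaluating each retriever with and without expansion on the same dataset, and the cutoff would need to be chosen empirically.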
Related papers
- LLM-Augmented Retrieval: Enhancing Retrieval Models Through Language Models and Doc-Level Embedding [2.0257616108612373]
This paper introduces a model-agnostic doc-level embedding framework through large language model augmentation.
We have been able to significantly improve the effectiveness of widely-used retriever models.
(2024-04-08T19:29:07Z)
- Distribution-Aware Data Expansion with Diffusion Models [55.979857976023695]
We propose DistDiff, a training-free data expansion framework based on the distribution-aware diffusion model.
DistDiff consistently enhances accuracy across a diverse range of datasets compared to models trained solely on original data.
(2024-03-11T14:07:53Z)
- Corrective Retrieval Augmented Generation [39.371798735872865]
Retrieval-augmented generation (RAG) relies heavily on the relevance of retrieved documents, raising concerns about how the model behaves if retrieval goes wrong.
We propose Corrective Retrieval Augmented Generation (CRAG) to improve the robustness of generation.
CRAG is plug-and-play and can be seamlessly coupled with various RAG-based approaches.
(2024-01-29T04:36:39Z)
- Zero-shot Retrieval: Augmenting Pre-trained Models with Search Engines [83.65380507372483]
Large pre-trained models can dramatically reduce the amount of task-specific data required to solve a problem, but they often fail to capture domain-specific nuances out of the box.
This paper shows how to leverage recent advances in NLP and multi-modal learning to augment a pre-trained model with search engine retrieval.
(2023-11-29T05:33:28Z)
- Query2doc: Query Expansion with Large Language Models [69.9707552694766]
The proposed method first generates pseudo-documents by few-shot prompting large language models (LLMs).
query2doc boosts the performance of BM25 by 3% to 15% on ad-hoc IR datasets.
Our method also benefits state-of-the-art dense retrievers in terms of both in-domain and out-of-domain results.
(2023-03-14T07:27:30Z)
- Characterizing Attribution and Fluency Tradeoffs for Retrieval-Augmented Large Language Models [6.425088990363101]
We examine the relationship between fluency and attribution in Large Language Models prompted with retrieved evidence.
We show that larger models tend to do much better in both fluency and attribution.
We propose a recipe that could allow smaller models to both close the gap with larger models and preserve the benefits of top-k retrieval.
(2023-02-11T02:43:34Z)
- Unsupervised Dense Retrieval Deserves Better Positive Pairs: Scalable Augmentation with Query Extraction and Generation [27.391814046104646]
We explore two categories of methods for creating pseudo query-document pairs, named query extraction (QExt) and transferred query generation (TQGen).
QExt extracts pseudo queries by document structures or selecting salient random spans, and TQGen utilizes generation models trained for other NLP tasks.
Experiments show that dense retrievers trained with individual augmentation methods can perform comparably well with multiple strong baselines.
(2022-12-17T10:43:25Z)
- Query Expansion Using Contextual Clue Sampling with Language Models [69.51976926838232]
We propose a combination of an effective filtering strategy and fusion of the retrieved documents based on the generation probability of each context.
Our lexical matching based approach achieves a similar top-5/top-20 retrieval accuracy and higher top-100 accuracy compared with the well-established dense retrieval model DPR.
For end-to-end QA, the reader model also benefits from our method and achieves the highest Exact-Match score against several competitive baselines.
(2022-10-13T15:18:04Z)
- Generative Data Augmentation for Commonsense Reasoning [75.26876609249197]
G-DAUGC is a novel generative data augmentation method that aims to achieve more accurate and robust learning in the low-resource setting.
G-DAUGC consistently outperforms existing data augmentation methods based on back-translation.
Our analysis demonstrates that G-DAUGC produces a diverse set of fluent training examples, and that its selection and training approaches are important for performance.
(2020-04-24T06:12:10Z)