When do Generative Query and Document Expansions Fail? A Comprehensive
Study Across Methods, Retrievers, and Datasets
- URL: http://arxiv.org/abs/2309.08541v2
- Date: Mon, 26 Feb 2024 20:57:33 GMT
- Title: When do Generative Query and Document Expansions Fail? A Comprehensive
Study Across Methods, Retrievers, and Datasets
- Authors: Orion Weller, Kyle Lo, David Wadden, Dawn Lawrie, Benjamin Van Durme,
Arman Cohan, Luca Soldaini
- Abstract summary: We conduct the first comprehensive analysis of LM-based expansion.
We find that there exists a strong negative correlation between retriever performance and gains from expansion.
Our results suggest the following recipe: use expansions for weaker models or when the target dataset significantly differs from the training corpus in format.
- Score: 69.28733312110566
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Using large language models (LMs) for query or document expansion can improve
generalization in information retrieval. However, it is unknown whether these
techniques are universally beneficial or only effective in specific settings,
such as for particular retrieval models, dataset domains, or query types. To
answer this, we conduct the first comprehensive analysis of LM-based expansion.
We find that there exists a strong negative correlation between retriever
performance and gains from expansion: expansion improves scores for weaker
models, but generally harms stronger models. We show this trend holds across a
set of eleven expansion techniques, twelve datasets with diverse distribution
shifts, and twenty-four retrieval models. Through qualitative error analysis,
we hypothesize that although expansions provide extra information (potentially
improving recall), they add additional noise that makes it difficult to discern
between the top relevant documents (thus introducing false positives). Our
results suggest the following recipe: use expansions for weaker models or when
the target dataset significantly differs from the training corpus in format;
otherwise, avoid expansions to keep the relevance signal clear.
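To make the setting concrete, below is a minimal sketch of the doc2query/query2doc style of LM-based query expansion the paper evaluates: an LM drafts a pseudo-document for the query, and its terms are appended before BM25 scoring. The `llm_generate` helper is a placeholder for whatever model is available, and the rank_bm25 package stands in for a production BM25 implementation; per the recipe above, this kind of expansion is worth trying with weaker retrievers and skipping with stronger ones.

```python
# Minimal sketch of doc2query/query2doc-style expansion over BM25.
# `llm_generate` is a placeholder for any available LM; rank_bm25
# stands in for a production BM25 implementation.
from rank_bm25 import BM25Okapi


def llm_generate(prompt: str) -> str:
    """Placeholder for an LM call (API or local model)."""
    raise NotImplementedError


def expand_query(query: str) -> str:
    # Ask the LM for a hypothetical answer passage, then append it so
    # BM25 can also match the passage's extra terms.
    pseudo_doc = llm_generate(f"Write a passage that answers: {query}")
    return f"{query} {pseudo_doc}"


def retrieve(corpus: list[str], query: str, k: int = 10) -> list[str]:
    tokenized = [doc.lower().split() for doc in corpus]
    bm25 = BM25Okapi(tokenized)
    scores = bm25.get_scores(expand_query(query).lower().split())
    ranked = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)
    return [corpus[i] for i in ranked[:k]]
```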
Related papers
- Data Pruning in Generative Diffusion Models [2.0111637969968]
Generative models aim to estimate the underlying distribution of the data, so presumably they should benefit from larger datasets.
We show that eliminating redundant or noisy data in large datasets is beneficial particularly when done strategically.
arXiv Detail & Related papers (2024-11-19T14:13:25Z)
- Less is More: Making Smaller Language Models Competent Subgraph Retrievers for Multi-hop KGQA [51.3033125256716]
We model the subgraph retrieval task as a conditional generation task handled by small language models.
Our base generative subgraph retrieval model, consisting of only 220M parameters, achieves competitive retrieval performance compared to state-of-the-art models.
Our largest 3B model, when paired with an LLM reader, sets new SOTA end-to-end performance on both the WebQSP and CWQ benchmarks.
arXiv Detail & Related papers (2024-10-08T15:22:36Z)
- $\textbf{Only-IF}$: Revealing the Decisive Effect of Instruction Diversity on Generalization [1.6958018695660049]
We show that generalization $\textbf{only emerges}$ when training data is diversified enough across semantic domains.
We extend our analysis to real-world scenarios, including fine-tuning of $\textit{\textbf{specialist}}$ and $\textit{\textbf{generalist}}$ models.
arXiv Detail & Related papers (2024-10-07T03:15:11Z)
- LLM-Augmented Retrieval: Enhancing Retrieval Models Through Language Models and Doc-Level Embedding [2.0257616108612373]
This paper introduces a model-agnostic doc-level embedding framework through large language model augmentation.
We have been able to significantly improve the effectiveness of widely-used retriever models.
arXiv Detail & Related papers (2024-04-08T19:29:07Z)
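The summary above names the framework but not its mechanics; as a rough, hypothetical illustration of a doc-level embedding enriched by LM augmentation, the sketch below pools a document's embedding with embeddings of synthetic queries an LM generates for it. Both `embed` and `llm_generate` are placeholders, and mean-pooling is an assumption rather than the paper's exact composition.

```python
# Rough, hypothetical illustration: enrich a document's embedding by
# mean-pooling it with embeddings of LM-generated synthetic queries.
# `embed` and `llm_generate` are placeholders, and the pooling choice
# is an assumption, not the paper's exact composition.
import numpy as np


def llm_generate(prompt: str) -> list[str]:
    """Placeholder: return synthetic queries the document could answer."""
    raise NotImplementedError


def embed(text: str) -> np.ndarray:
    """Placeholder for any off-the-shelf text encoder."""
    raise NotImplementedError


def doc_level_embedding(document: str, n_queries: int = 5) -> np.ndarray:
    prompt = f"Write {n_queries} search queries this passage answers:\n{document}"
    vectors = [embed(document)] + [embed(q) for q in llm_generate(prompt)]
    pooled = np.mean(vectors, axis=0)
    return pooled / np.linalg.norm(pooled)  # unit-normalize for cosine search
```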
- Distribution-Aware Data Expansion with Diffusion Models [55.979857976023695]
We propose DistDiff, a training-free data expansion framework based on the distribution-aware diffusion model.
DistDiff consistently enhances accuracy across a diverse range of datasets compared to models trained solely on original data.
arXiv Detail & Related papers (2024-03-11T14:07:53Z)
- Corrective Retrieval Augmented Generation [36.04062963574603]
Retrieval-augmented generation (RAG) relies heavily on the relevance of retrieved documents, raising concerns about how the model behaves if retrieval goes wrong.
We propose Corrective Retrieval Augmented Generation (CRAG) to improve the robustness of generation.
CRAG is plug-and-play and can be seamlessly coupled with various RAG-based approaches.
arXiv Detail & Related papers (2024-01-29T04:36:39Z)
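The CRAG summary gives only the high-level idea; one hedged way to picture the corrective pattern is a gate that checks retrieved documents with an evaluator and falls back to another source when nothing looks relevant. Every name here (`score_relevance`, `web_search`) and the threshold are illustrative placeholders, not the paper's actual components.

```python
# Minimal sketch of a corrective retrieval gate. `score_relevance`
# (a retrieval evaluator) and `web_search` (a fallback source) are
# hypothetical placeholders; the threshold is illustrative.
def score_relevance(query: str, doc: str) -> float:
    """Placeholder: evaluator returning a relevance confidence in [0, 1]."""
    raise NotImplementedError


def web_search(query: str) -> list[str]:
    """Placeholder: fallback retrieval from an external source."""
    raise NotImplementedError


def corrective_retrieve(query: str, docs: list[str],
                        threshold: float = 0.5) -> list[str]:
    kept = [d for d in docs if score_relevance(query, d) >= threshold]
    # If nothing retrieved looks relevant, correct course with a
    # fallback search instead of generating from misleading context.
    return kept or web_search(query)
```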
- Zero-shot Retrieval: Augmenting Pre-trained Models with Search Engines [83.65380507372483]
Large pre-trained models can dramatically reduce the amount of task-specific data required to solve a problem, but they often fail to capture domain-specific nuances out of the box.
This paper shows how to leverage recent advances in NLP and multi-modal learning to augment a pre-trained model with search engine retrieval.
arXiv Detail & Related papers (2023-11-29T05:33:28Z)
- Query2doc: Query Expansion with Large Language Models [69.9707552694766]
The proposed method first generates pseudo-documents by few-shot prompting large language models (LLMs), then expands the query with them.
query2doc boosts the performance of BM25 by 3% to 15% on ad-hoc IR datasets.
Our method also benefits state-of-the-art dense retrievers in terms of both in-domain and out-of-domain results.
arXiv Detail & Related papers (2023-03-14T07:27:30Z)
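One concrete detail worth sketching for sparse retrieval: because the LM-generated pseudo-document is much longer than the query, query2doc-style concatenation repeats the original query before appending the pseudo-document, so the query terms keep enough weight. The repetition factor below is illustrative, not a value taken from the paper.

```python
# Sketch of query2doc-style concatenation for sparse retrieval: repeat
# the short query so its term weights are not drowned out by the much
# longer pseudo-document. The repetition factor is illustrative.
def build_expanded_query(query: str, pseudo_doc: str, repeats: int = 5) -> str:
    return " ".join([query] * repeats + [pseudo_doc])


# e.g. build_expanded_query("what causes tides", generated_passage)
# yields "what causes tides ... what causes tides <passage terms>",
# which is then tokenized and scored by BM25 as usual.
```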
- Characterizing Attribution and Fluency Tradeoffs for Retrieval-Augmented Large Language Models [6.425088990363101]
We examine the relationship between fluency and attribution in Large Language Models prompted with retrieved evidence.
We show that larger models tend to do much better in both fluency and attribution.
We propose a recipe that could allow smaller models to both close the gap with larger models and preserve the benefits of top-k retrieval.
arXiv Detail & Related papers (2023-02-11T02:43:34Z)
- Query Expansion Using Contextual Clue Sampling with Language Models [69.51976926838232]
We propose combining an effective filtering strategy with fusion of the retrieved documents based on the generation probability of each context.
Our lexical-matching-based approach achieves similar top-5/top-20 retrieval accuracy and higher top-100 accuracy compared with the well-established dense retrieval model DPR.
For end-to-end QA, the reader model also benefits from our method and achieves the highest Exact-Match score against several competitive baselines.
arXiv Detail & Related papers (2022-10-13T15:18:04Z)
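A hedged sketch of the fusion step described above: retrieve once per sampled context and accumulate each document's score weighted by that context's generation probability. `retrieve_scored` is a hypothetical retriever returning (doc_id, score) pairs, and the filtering stage is assumed to have already produced the (context, probability) list.

```python
# Sketch of fusing retrieval runs from several LM-sampled contexts,
# weighting each run by the context's generation probability.
# `retrieve_scored` is a hypothetical retriever; `contexts` holds
# (context, probability) pairs from the sampling + filtering stage.
from collections import defaultdict


def retrieve_scored(query: str, k: int = 100) -> list[tuple[str, float]]:
    """Placeholder: return (doc_id, score) pairs for an expanded query."""
    raise NotImplementedError


def fuse(query: str, contexts: list[tuple[str, float]], k: int = 100):
    fused: dict[str, float] = defaultdict(float)
    for context, prob in contexts:
        for doc_id, score in retrieve_scored(f"{query} {context}", k):
            fused[doc_id] += prob * score  # probability-weighted sum
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)[:k]
```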