Query2doc: Query Expansion with Large Language Models
- URL: http://arxiv.org/abs/2303.07678v2
- Date: Wed, 11 Oct 2023 08:34:42 GMT
- Title: Query2doc: Query Expansion with Large Language Models
- Authors: Liang Wang, Nan Yang, Furu Wei
- Abstract summary: The proposed method first generates pseudo-documents by few-shot prompting large language models (LLMs).
query2doc boosts the performance of BM25 by 3% to 15% on ad-hoc IR datasets.
Our method also benefits state-of-the-art dense retrievers in terms of both in-domain and out-of-domain results.
- Score: 69.9707552694766
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper introduces a simple yet effective query expansion approach,
denoted as query2doc, to improve both sparse and dense retrieval systems. The
proposed method first generates pseudo-documents by few-shot prompting large
language models (LLMs), and then expands the query with generated
pseudo-documents. LLMs are trained on web-scale text corpora and are adept at
knowledge memorization. The pseudo-documents from LLMs often contain highly
relevant information that can aid in query disambiguation and guide the
retrievers. Experimental results demonstrate that query2doc boosts the
performance of BM25 by 3% to 15% on ad-hoc IR datasets, such as MS-MARCO and
TREC DL, without any model fine-tuning. Furthermore, our method also benefits
state-of-the-art dense retrievers in terms of both in-domain and out-of-domain
results.
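The core recipe is easy to prototype. Below is a minimal Python sketch, assuming a hypothetical call_llm helper standing in for whatever completion API is available; the few-shot examples and prompt wording are illustrative, and repeating the original query several times before appending the pseudo-document follows the paper's recipe for balancing term weights in sparse (BM25) retrieval.

```python
# Minimal sketch of query2doc-style expansion for a sparse retriever (BM25).
# `call_llm` is a hypothetical stand-in for any few-shot-capable LLM API.

FEW_SHOT_EXAMPLES = [
    ("what causes tides",
     "Tides are caused by the gravitational pull of the moon and the sun on the oceans..."),
    ("who wrote hamlet",
     "Hamlet is a tragedy written by William Shakespeare around the year 1600..."),
]

def build_prompt(query: str) -> str:
    """Assemble a few-shot prompt asking the LLM to write a passage that answers the query."""
    parts = []
    for q, passage in FEW_SHOT_EXAMPLES:
        parts.append(f"Write a passage that answers the given query:\nQuery: {q}\nPassage: {passage}\n")
    parts.append(f"Write a passage that answers the given query:\nQuery: {query}\nPassage:")
    return "\n".join(parts)

def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with your provider's completion API."""
    raise NotImplementedError

def expand_query_sparse(query: str, n_repeats: int = 5) -> str:
    """Expand a query for BM25: repeat the original query to up-weight its terms,
    then append the LLM-generated pseudo-document."""
    pseudo_doc = call_llm(build_prompt(query))
    return " ".join([query] * n_repeats + [pseudo_doc])

# Example usage (once call_llm is wired to a real model):
# expanded = expand_query_sparse("when was the transistor invented")
# hits = bm25_index.search(expanded, k=1000)
```

For dense retrievers, the paper instead forms the new query by concatenating the original query and the pseudo-document with a separator token; only the expansion string changes, not the retriever itself.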
Related papers
- Knowledge-Aware Query Expansion with Large Language Models for Textual and Relational Retrieval [49.42043077545341]
We propose a knowledge-aware query expansion framework, augmenting LLMs with structured document relations from a knowledge graph (KG).
We leverage document texts as rich KG node representations and use document-based relation filtering for our Knowledge-Aware Retrieval (KAR).
arXiv Detail & Related papers (2024-10-17T17:03:23Z)
- PromptReps: Prompting Large Language Models to Generate Dense and Sparse Representations for Zero-Shot Document Retrieval [76.50690734636477]
We propose PromptReps, which combines the advantages of both categories of LLM-based retrieval approaches: no need for training and the ability to retrieve from the whole corpus.
The retrieval system harnesses both dense text embedding and sparse bag-of-words representations.
arXiv Detail & Related papers (2024-04-29T04:51:30Z)
- Harnessing Multi-Role Capabilities of Large Language Models for Open-Domain Question Answering [40.2758450304531]
Open-domain question answering (ODQA) has emerged as a pivotal research spotlight in information systems.
We propose a framework that formulates the ODQA process into three basic steps: query expansion, document selection, and answer generation.
We introduce a novel prompt optimization algorithm to refine role-playing prompts and steer LLMs to produce higher-quality evidence and answers.
arXiv Detail & Related papers (2024-03-08T11:09:13Z)
- MILL: Mutual Verification with Large Language Models for Zero-Shot Query Expansion [39.24969189479343]
We propose a novel zero-shot query expansion framework utilizing large language models (LLMs) for mutual verification.
Our proposed method is fully zero-shot; extensive experiments on three public benchmark datasets demonstrate its effectiveness.
arXiv Detail & Related papers (2023-10-29T16:04:10Z)
- Large Language Models are Built-in Autoregressive Search Engines [19.928494069013485]
Large language models (LLMs) can follow human instructions to directly generate URLs for document retrieval.
LLMs can generate Web URLs where nearly 90% of the corresponding documents contain correct answers to open-domain questions.
arXiv Detail & Related papers (2023-05-16T17:04:48Z)
- Zero-Shot Listwise Document Reranking with a Large Language Model [58.64141622176841]
We propose Listwise Reranker with a Large Language Model (LRL), which achieves strong reranking effectiveness without using any task-specific training data.
Experiments on three TREC web search datasets demonstrate that LRL not only outperforms zero-shot pointwise methods when reranking first-stage retrieval results, but can also act as a final-stage reranker.
arXiv Detail & Related papers (2023-05-03T14:45:34Z)
- Large Language Models are Strong Zero-Shot Retriever [89.16756291653371]
We propose a simple method that applies a large language model (LLM) to large-scale retrieval in zero-shot scenarios.
Our method, the Large Language Model as Retriever (LameR), is built upon no neural models other than an LLM.
arXiv Detail & Related papers (2023-04-27T14:45:55Z)
- UnifieR: A Unified Retriever for Large-Scale Retrieval [84.61239936314597]
Large-scale retrieval aims to recall relevant documents from a huge collection given a query.
Recent retrieval methods based on pre-trained language models (PLM) can be coarsely categorized into either dense-vector or lexicon-based paradigms.
We propose a new learning framework, UnifieR, which unifies dense-vector and lexicon-based retrieval in one model with a dual-representing capability.
arXiv Detail & Related papers (2022-05-23T11:01:59Z)