MILL: Mutual Verification with Large Language Models for Zero-Shot Query Expansion
- URL: http://arxiv.org/abs/2310.19056v3
- Date: Thu, 28 Mar 2024 06:49:56 GMT
- Title: MILL: Mutual Verification with Large Language Models for Zero-Shot Query Expansion
- Authors: Pengyue Jia, Yiding Liu, Xiangyu Zhao, Xiaopeng Li, Changying Hao, Shuaiqiang Wang, Dawei Yin
- Abstract summary: We propose a novel zero-shot query expansion framework utilizing large language models (LLMs) for mutual verification. Our proposed method is fully zero-shot, and extensive experiments on three public benchmark datasets are conducted to demonstrate its effectiveness.
- Score: 39.24969189479343
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Query expansion, pivotal in search engines, enhances the representation of user information needs with additional terms. While existing methods expand queries using retrieved or generated contextual documents, each approach has notable limitations. Retrieval-based methods often fail to accurately capture search intent, particularly with brief or ambiguous queries. Generation-based methods, utilizing large language models (LLMs), generally lack corpus-specific knowledge and entail high fine-tuning costs. To address these gaps, we propose a novel zero-shot query expansion framework utilizing LLMs for mutual verification. Specifically, we first design a query-query-document generation method, leveraging LLMs' zero-shot reasoning ability to produce diverse sub-queries and corresponding documents. Then, a mutual verification process synergizes generated and retrieved documents for optimal expansion. Our proposed method is fully zero-shot, and extensive experiments on three public benchmark datasets are conducted to demonstrate its effectiveness over existing methods. Our code is available online at https://github.com/Applied-Machine-Learning-Lab/MILL to ease reproduction.
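The abstract describes mutual verification only at a high level. As a rough illustration of the idea, and not the authors' implementation (their code is in the repository linked above), the Python sketch below lets the two document pools vet each other. Each LLM-generated document is scored by its mean similarity to the retrieved documents, each retrieved document by its mean similarity to the generated ones, and the top `k` from each pool are appended to the query. The bag-of-words cosine similarity, the mean-similarity scoring rule, and the helper names (`bow`, `cosine`, `mutual_scores`, `top_k`, `expand_query`) are hypothetical simplifications.

```python
import math
from collections import Counter


def bow(text: str) -> Counter:
    """Lowercased whitespace bag-of-words (a simplifying assumption, not MILL's encoder)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def mutual_scores(pool: list[str], other: list[str]) -> list[float]:
    """Score each document in `pool` by its mean similarity to all documents in `other`."""
    other_vecs = [bow(d) for d in other]
    scores = []
    for doc in pool:
        sims = [cosine(bow(doc), o) for o in other_vecs]
        scores.append(sum(sims) / len(sims) if sims else 0.0)
    return scores


def top_k(docs: list[str], scores: list[float], k: int) -> list[str]:
    """Return the k highest-scoring documents (ties keep their original order)."""
    order = sorted(range(len(docs)), key=lambda i: -scores[i])
    return [docs[i] for i in order[:k]]


def expand_query(query: str, generated: list[str], retrieved: list[str], k: int = 1) -> str:
    """Hypothetical expansion: keep the top-k documents of each pool that the other pool supports most."""
    best_generated = top_k(generated, mutual_scores(generated, retrieved), k)
    best_retrieved = top_k(retrieved, mutual_scores(retrieved, generated), k)
    return " ".join([query] + best_generated + best_retrieved)


if __name__ == "__main__":
    generated = ["python sorts a list in place with list.sort", "snakes are reptiles"]
    retrieved = ["use list.sort or sorted in python", "the weather is sunny"]
    print(expand_query("python list sort", generated, retrieved))
    # prints: python list sort python sorts a list in place with list.sort use list.sort or sorted in python
```

In this toy run the off-topic document in each pool shares no terms with the other pool, receives a score of zero, and is dropped. A practical system would likely use a dense encoder instead of term overlap; the mean is used here only because it is the simplest agreement score.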
Related papers
- Learning More Effective Representations for Dense Retrieval through Deliberate Thinking Before Search [65.53881294642451]
Deliberate Thinking based Dense Retriever (DEBATER)
DEBATER enhances recent dense retrievers by enabling them to learn more effective document representations through a step-by-step thinking process.
Experimental results show that DEBATER significantly outperforms existing methods across several retrieval benchmarks.
arXiv Detail & Related papers (2025-02-18T15:56:34Z)
- DOGR: Leveraging Document-Oriented Contrastive Learning in Generative Retrieval [10.770281363775148]
We propose a novel and general generative retrieval framework, namely Leveraging Document-Oriented Contrastive Learning in Generative Retrieval (DOGR).
It adopts a two-stage learning strategy that captures the relationship between queries and documents comprehensively through direct interactions.
Negative sampling methods and corresponding contrastive learning objectives are implemented to enhance the learning of semantic representations.
arXiv Detail & Related papers (2025-02-11T03:25:42Z)
- Improving Tool Retrieval by Leveraging Large Language Models for Query Generation [16.7926347207647]
In-context learning can provide a short list of relevant tools in the prompt.
We propose leveraging Large Language Models (LLMs) to generate a retrieval query.
The generated query is embedded and used to find the most relevant tools via a nearest-neighbor search.
arXiv Detail & Related papers (2024-11-17T03:02:09Z)
- Knowledge-Aware Query Expansion with Large Language Models for Textual and Relational Retrieval [49.42043077545341]
We propose a knowledge-aware query expansion framework, augmenting LLMs with structured document relations from a knowledge graph (KG).
We leverage document texts as rich KG node representations and use document-based relation filtering for our Knowledge-Aware Retrieval (KAR).
arXiv Detail & Related papers (2024-10-17T17:03:23Z)
- Progressive Query Expansion for Retrieval Over Cost-constrained Data Sources [6.109188517569139]
ProQE is a progressive query expansion algorithm that iteratively expands the query as it retrieves more documents.
Our results show that ProQE outperforms state-of-the-art baselines by 37% and is the most cost-effective.
arXiv Detail & Related papers (2024-06-11T10:30:19Z)
- PromptReps: Prompting Large Language Models to Generate Dense and Sparse Representations for Zero-Shot Document Retrieval [76.50690734636477]
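We propose PromptReps, which combines the advantages of the two existing categories of LLM-based zero-shot retrieval (prompt-based re-ranking and contrastively trained dense retrieval): no need for training and the ability to retrieve from the whole corpus.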
The retrieval system harnesses both dense text embedding and sparse bag-of-words representations.
arXiv Detail & Related papers (2024-04-29T04:51:30Z)
- Harnessing Multi-Role Capabilities of Large Language Models for Open-Domain Question Answering [40.2758450304531]
Open-domain question answering (ODQA) has emerged as a key research focus in information systems.
We propose a framework that formulates the ODQA process into three basic steps: query expansion, document selection, and answer generation.
We introduce a novel prompt optimization algorithm to refine role-playing prompts and steer LLMs to produce higher-quality evidence and answers.
arXiv Detail & Related papers (2024-03-08T11:09:13Z)
- Allies: Prompting Large Language Model with Beam Search [107.38790111856761]
In this work, we propose a novel method called ALLIES.
Given an input query, ALLIES leverages LLMs to iteratively generate new queries related to the original query.
By iteratively refining and expanding the scope of the original query, ALLIES captures and utilizes hidden knowledge that may not be directly obtainable through retrieval.
arXiv Detail & Related papers (2023-05-24T06:16:44Z)
- Query2doc: Query Expansion with Large Language Models [69.9707552694766]
The proposed method first generates pseudo-documents by few-shot prompting large language models (LLMs).
query2doc boosts the performance of BM25 by 3% to 15% on ad-hoc IR datasets.
Our method also benefits state-of-the-art dense retrievers in terms of both in-domain and out-of-domain results.
arXiv Detail & Related papers (2023-03-14T07:27:30Z)
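Query2doc, as summarized above, feeds LLM-written pseudo-documents into BM25. The Python sketch below shows one plausible way that combination can work. The original query is repeated `n` times and concatenated with the pseudo-document, so the query's own terms are not outweighed by the longer generated text, and the result is scored with a plain Okapi BM25. The repetition count `n = 5`, the BM25 parameters `k1` and `b`, the whitespace tokenizer, the helper names `expand` and `bm25_scores`, and the hard-coded pseudo-document (standing in for an LLM call) are assumptions for illustration, not details taken from this listing.

```python
import math
from collections import Counter


def expand(query: str, pseudo_doc: str, n: int = 5) -> str:
    """Repeat the query n times, then append the (assumed) LLM-generated pseudo-document."""
    return " ".join([query] * n + [pseudo_doc])


def bm25_scores(query: str, corpus: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Okapi BM25 over whitespace tokens; each query-term occurrence contributes separately."""
    docs = [d.lower().split() for d in corpus]
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    df = Counter(t for d in docs for t in set(d))  # document frequency per term
    query_tf = Counter(query.lower().split())
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term, qf in query_tf.items():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += qf * idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


if __name__ == "__main__":
    # Stand-in for an LLM response; no model is called here.
    pseudo_doc = "paris is the capital and largest city of france"
    corpus = [
        "paris is the capital of france",
        "the bank holds capital reserves",
        "lyon is a city in france",
    ]
    scores = bm25_scores(expand("capital of france", pseudo_doc), corpus)
    print(max(range(len(corpus)), key=scores.__getitem__))  # prints 0
```

Because this scorer sums over query-term occurrences, repeating the query multiplies the weight of its terms relative to the pseudo-document; a scorer that ignored query term frequency would make the repetition a no-op.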