Related papers: MILL: Mutual Verification with Large Language Models for Zero-Shot Query Expansion

URL: http://arxiv.org/abs/2310.19056v3
Date: Thu, 28 Mar 2024 06:49:56 GMT
Title: MILL: Mutual Verification with Large Language Models for Zero-Shot Query Expansion
Authors: Pengyue Jia, Yiding Liu, Xiangyu Zhao, Xiaopeng Li, Changying Hao, Shuaiqiang Wang, Dawei Yin
Abstract summary: We propose a novel zero-shot query expansion framework utilizing large language models (LLMs) for mutual verification. Our proposed method is fully zero-shot, and extensive experiments on three public benchmark datasets are conducted to demonstrate its effectiveness.
Score: 39.24969189479343
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Query expansion, pivotal in search engines, enhances the representation of user information needs with additional terms. While existing methods expand queries using retrieved or generated contextual documents, each approach has notable limitations. Retrieval-based methods often fail to accurately capture search intent, particularly with brief or ambiguous queries. Generation-based methods, utilizing large language models (LLMs), generally lack corpus-specific knowledge and entail high fine-tuning costs. To address these gaps, we propose a novel zero-shot query expansion framework utilizing LLMs for mutual verification. Specifically, we first design a query-query-document generation method, leveraging LLMs' zero-shot reasoning ability to produce diverse sub-queries and corresponding documents. Then, a mutual verification process synergizes generated and retrieved documents for optimal expansion. Our proposed method is fully zero-shot, and extensive experiments on three public benchmark datasets are conducted to demonstrate its effectiveness over existing methods. Our code is available online at https://github.com/Applied-Machine-Learning-Lab/MILL to ease reproduction.
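
Illustrative sketch (not the authors' code): the abstract describes a mutual verification step in which generated and retrieved documents are checked against each other before being used for expansion. The Python below is a minimal sketch of that idea under loud assumptions: the function names (`bow`, `cosine`, `mutual_verify`, `expand_query`) are hypothetical, bag-of-words cosine similarity stands in for whatever scoring the paper actually uses, and the LLM generation and retrieval steps are replaced by plain lists of strings. See the linked repository for the real implementation.

```python
import math
from collections import Counter


def bow(text):
    """Lowercased bag-of-words term counts (assumed stand-in for a real encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(scores, docs, k):
    """Return the k documents with the highest scores."""
    ranked = sorted(zip(scores, docs), key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in ranked[:k]]


def mutual_verify(generated, retrieved, k=1):
    """Score each set of documents by its average similarity to the other set.

    Assumption: a generated document is trusted more when it agrees with the
    retrieved documents, and a retrieved document is trusted more when it
    agrees with the generated ones.
    """
    if not generated or not retrieved:
        # Nothing to verify against; fall back to the first k of each list.
        return generated[:k], retrieved[:k]
    gen_vecs = [bow(d) for d in generated]
    ret_vecs = [bow(d) for d in retrieved]
    gen_scores = [sum(cosine(g, r) for r in ret_vecs) / len(ret_vecs) for g in gen_vecs]
    ret_scores = [sum(cosine(r, g) for g in gen_vecs) / len(gen_vecs) for r in ret_vecs]
    return top_k(gen_scores, generated, k), top_k(ret_scores, retrieved, k)


def expand_query(query, generated, retrieved, k=1):
    """Append the mutually verified documents to the original query."""
    top_gen, top_ret = mutual_verify(generated, retrieved, k)
    return " ".join([query] + top_gen + top_ret)


# Toy data: in the paper's setting these would come from an LLM and a retriever.
generated = [
    "jaguar is a big cat living in rainforest",
    "jaguar car engine price",
]
retrieved = [
    "the jaguar cat hunts in the rainforest",
    "rainforest animals include the jaguar cat",
    "stock market news today",
]
expanded = expand_query("jaguar habitat", generated, retrieved, k=1)
```

In this toy run the off-topic retrieved document shares no terms with either generated document, so it scores zero and is filtered out, while the generated document about the animal outranks the one about the car because it agrees more with the retrieved set.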

Related papers

LLM4Ranking: An Easy-to-use Framework of Utilizing Large Language Models for Document Reranking [15.060195612587805]
We introduce a unified framework, LLM4Ranking, which enables users to adopt different ranking methods using open-source or closed-source API-based LLMs. Our framework provides a simple interface for document reranking with LLMs, as well as easy-to-use evaluation and fine-tuning scripts for this task.
arXiv Detail & Related papers (2025-04-10T04:08:38Z)
Learning More Effective Representations for Dense Retrieval through Deliberate Thinking Before Search [65.53881294642451]
Deliberate Thinking based Dense Retriever (DEBATER) enhances recent dense retrievers by enabling them to learn more effective document representations through a step-by-step thinking process. Experimental results show that DEBATER significantly outperforms existing methods across several retrieval benchmarks.
arXiv Detail & Related papers (2025-02-18T15:56:34Z)
DOGR: Leveraging Document-Oriented Contrastive Learning in Generative Retrieval [10.770281363775148]
We propose a novel and general generative retrieval framework, namely Leveraging Document-Oriented Contrastive Learning in Generative Retrieval (DOGR). It adopts a two-stage learning strategy that captures the relationship between queries and documents comprehensively through direct interactions. Negative sampling methods and corresponding contrastive learning objectives are implemented to enhance the learning of semantic representations.
arXiv Detail & Related papers (2025-02-11T03:25:42Z)
Improving Tool Retrieval by Leveraging Large Language Models for Query Generation [16.7926347207647]
In-context learning can provide a short list of relevant tools in the prompt. We propose leveraging Large Language Models (LLMs) to generate a retrieval query. The generated query is embedded and used to find the most relevant tools via a nearest-neighbor search.
arXiv Detail & Related papers (2024-11-17T03:02:09Z)
Data Fusion of Synthetic Query Variants With Generative Large Language Models [1.864807003137943]
This work explores the feasibility of using synthetic query variants generated by instruction-tuned Large Language Models in data fusion experiments. We introduce a lightweight, unsupervised, and cost-efficient approach that exploits principled prompting and data fusion techniques. Our analysis shows that data fusion based on synthetic query variants is significantly better than baselines with single queries and also outperforms pseudo-relevance feedback methods.
arXiv Detail & Related papers (2024-11-06T12:54:27Z)
Zero-Shot Dense Retrieval with Embeddings from Relevance Feedback [17.986392250269606]
We introduce Real Document Embeddings from Relevance Feedback (ReDE-RF). Inspired by relevance feedback, ReDE-RF proposes to re-frame hypothetical document generation as a relevance estimation task. Our experiments show that ReDE-RF consistently surpasses state-of-the-art zero-shot dense retrieval methods.
arXiv Detail & Related papers (2024-10-28T17:40:40Z)
Knowledge-Aware Query Expansion with Large Language Models for Textual and Relational Retrieval [49.42043077545341]
We propose a knowledge-aware query expansion framework, augmenting LLMs with structured document relations from a knowledge graph (KG). We leverage document texts as rich KG node representations and use document-based relation filtering for our Knowledge-Aware Retrieval (KAR).
arXiv Detail & Related papers (2024-10-17T17:03:23Z)
Progressive Query Expansion for Retrieval Over Cost-constrained Data Sources [6.109188517569139]
ProQE is a progressive query expansion algorithm that iteratively expands the query as it retrieves more documents. Our results show that ProQE outperforms state-of-the-art baselines by 37% and is the most cost-effective.
arXiv Detail & Related papers (2024-06-11T10:30:19Z)
PromptReps: Prompting Large Language Models to Generate Dense and Sparse Representations for Zero-Shot Document Retrieval [76.50690734636477]
We propose PromptReps, which combines the advantages of both categories: no need for training and the ability to retrieve from the whole corpus. The retrieval system harnesses both dense text embedding and sparse bag-of-words representations.
arXiv Detail & Related papers (2024-04-29T04:51:30Z)
Harnessing Multi-Role Capabilities of Large Language Models for Open-Domain Question Answering [40.2758450304531]
Open-domain question answering (ODQA) has emerged as a pivotal research spotlight in information systems. We propose a framework that formulates the ODQA process into three basic steps: query expansion, document selection, and answer generation. We introduce a novel prompt optimization algorithm to refine role-playing prompts and steer LLMs to produce higher-quality evidence and answers.
arXiv Detail & Related papers (2024-03-08T11:09:13Z)
Allies: Prompting Large Language Model with Beam Search [107.38790111856761]
In this work, we propose a novel method called ALLIES. Given an input query, ALLIES leverages LLMs to iteratively generate new queries related to the original query. By iteratively refining and expanding the scope of the original query, ALLIES captures and utilizes hidden knowledge that may not be directly obtainable through retrieval.
arXiv Detail & Related papers (2023-05-24T06:16:44Z)
Query2doc: Query Expansion with Large Language Models [69.9707552694766]
The proposed method first generates pseudo-documents by few-shot prompting large language models (LLMs). query2doc boosts the performance of BM25 by 3% to 15% on ad-hoc IR datasets. Our method also benefits state-of-the-art dense retrievers in terms of both in-domain and out-of-domain results.
arXiv Detail & Related papers (2023-03-14T07:27:30Z)
Text Summarization with Latent Queries [60.468323530248945]
We introduce LaQSum, the first unified text summarization system that learns Latent Queries from documents for abstractive summarization with any existing query forms. Under a deep generative framework, our system jointly optimizes a latent query model and a conditional language model, allowing users to plug-and-play queries of any type at test time. Our system robustly outperforms strong comparison systems across summarization benchmarks with different query types, document settings, and target domains.
arXiv Detail & Related papers (2021-05-31T21:14:58Z)

This list is automatically generated from the titles and abstracts of the papers on this site.