QA-Expand: Multi-Question Answer Generation for Enhanced Query Expansion in Information Retrieval
- URL: http://arxiv.org/abs/2502.08557v2
- Date: Sun, 16 Feb 2025 14:24:44 GMT
- Title: QA-Expand: Multi-Question Answer Generation for Enhanced Query Expansion in Information Retrieval
- Authors: Wonduk Seo, Seunghyun Lee,
- Abstract summary: We introduce QA-Expand, a novel and effective framework for query expansion.
It first generates multiple relevant questions from the initial query and subsequently produces corresponding pseudo-answers as surrogate documents.
Extensive experiments on benchmarks such as BEIR and TREC demonstrate that QA-Expand enhances retrieval performance by up to 13% over state-of-the-art methods.
- Score: 12.095687580827065
- License:
- Abstract: Query expansion is widely used in Information Retrieval (IR) to improve search outcomes by enriching queries with additional contextual information. Although recent Large Language Model (LLM) based methods generate pseudo-relevant content and expanded terms via multiple prompts, they often yield repetitive, narrow expansions that lack the diverse context needed to retrieve all relevant information. In this paper, we introduce QA-Expand, a novel and effective framework for query expansion. It first generates multiple relevant questions from the initial query and subsequently produces corresponding pseudo-answers as surrogate documents. A feedback model further rewrites and filters these answers to ensure only the most informative augmentations are incorporated. Extensive experiments on benchmarks such as BEIR and TREC demonstrate that QA-Expand enhances retrieval performance by up to 13% over state-of-the-art methods, offering a robust solution for modern retrieval challenges.
Related papers
- Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent [102.31558123570437]
Multimodal Retrieval Augmented Generation (mRAG) plays an important role in mitigating the "hallucination" issue inherent in multimodal large language models (MLLMs)
We propose the first self-adaptive planning agent for multimodal retrieval, OmniSearch.
arXiv Detail & Related papers (2024-11-05T09:27:21Z) - Retriever-and-Memory: Towards Adaptive Note-Enhanced Retrieval-Augmented Generation [72.70046559930555]
We propose a generic RAG approach called Adaptive Note-Enhanced RAG (Adaptive-Note) for complex QA tasks.
Specifically, Adaptive-Note introduces an overarching view of knowledge growth, iteratively gathering new information in the form of notes.
In addition, we employ an adaptive, note-based stop-exploration strategy to decide "what to retrieve and when to stop" to encourage sufficient knowledge exploration.
arXiv Detail & Related papers (2024-10-11T14:03:29Z) - GenCRF: Generative Clustering and Reformulation Framework for Enhanced Intent-Driven Information Retrieval [20.807374287510623]
We propose GenCRF: a Generative Clustering and Reformulation Framework to capture diverse intentions adaptively.
We show that GenCRF achieves state-of-the-art performance, surpassing previous query reformulation SOTAs by up to 12% on nDCG@10.
arXiv Detail & Related papers (2024-09-17T05:59:32Z) - Optimization of Retrieval-Augmented Generation Context with Outlier Detection [0.0]
We focus on methods to reduce the size and improve the quality of the prompt context required for question-answering systems.
Our goal is to select the most semantically relevant documents, treating the discarded ones as outliers.
It was found that the greatest improvements were achieved with increasing complexity of the questions and answers.
arXiv Detail & Related papers (2024-07-01T15:53:29Z) - Database-Augmented Query Representation for Information Retrieval [59.57065228857247]
We present a novel retrieval framework called Database-Augmented Query representation (DAQu)
DAQu augments the original query with various (query-related) metadata across multiple tables.
We validate DAQu in diverse retrieval scenarios that can incorporate metadata from the relational database.
arXiv Detail & Related papers (2024-06-23T05:02:21Z) - Corpus-Steered Query Expansion with Large Language Models [35.64662397095323]
We introduce Corpus-Steered Query Expansion (CSQE) to promote the incorporation of knowledge embedded within the corpus.
CSQE utilizes the relevance assessing capability of LLMs to systematically identify pivotal sentences in the initially-retrieved documents.
Extensive experiments reveal that CSQE exhibits strong performance without necessitating any training, especially with queries for which LLMs lack knowledge.
arXiv Detail & Related papers (2024-02-28T03:58:58Z) - When do Generative Query and Document Expansions Fail? A Comprehensive
Study Across Methods, Retrievers, and Datasets [69.28733312110566]
We conduct the first comprehensive analysis of LM-based expansion.
We find that there exists a strong negative correlation between retriever performance and gains from expansion.
Our results suggest the following recipe: use expansions for weaker models or when the target dataset significantly differs from training corpus in format.
arXiv Detail & Related papers (2023-09-15T17:05:43Z) - End-to-end Knowledge Retrieval with Multi-modal Queries [50.01264794081951]
ReMuQ requires a system to retrieve knowledge from a large corpus by integrating contents from both text and image queries.
We introduce a retriever model ReViz'' that can directly process input text and images to retrieve relevant knowledge in an end-to-end fashion.
We demonstrate superior performance in retrieval on two datasets under zero-shot settings.
arXiv Detail & Related papers (2023-06-01T08:04:12Z) - Generate rather than Retrieve: Large Language Models are Strong Context
Generators [74.87021992611672]
We present a novel perspective for solving knowledge-intensive tasks by replacing document retrievers with large language model generators.
We call our method generate-then-read (GenRead), which first prompts a large language model to generate contextutal documents based on a given question, and then reads the generated documents to produce the final answer.
arXiv Detail & Related papers (2022-09-21T01:30:59Z) - Using Query Expansion in Manifold Ranking for Query-Oriented
Multi-Document Summarization [3.146785346730256]
We present a query expansion method, which is combined in the manifold ranking to resolve this problem.
Our method not only utilizes the information of the query term itself and the knowledge base WordNet to expand it by synonyms, but also uses the information of the document set itself to expand the query in various ways.
In addition, we use the degree of word overlap and the proximity between words to calculate the similarity between sentences.
arXiv Detail & Related papers (2021-07-31T02:20:44Z) - Answering Counting Queries over DL-Lite Ontologies [0.0]
We introduce a general form of counting query, relate it to previous proposals, and study the complexity of answering such queries.
We consider some practically relevant restrictions, for which we establish improved complexity bounds.
arXiv Detail & Related papers (2020-09-02T11:10:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.