Disentangling Questions from Query Generation for Task-Adaptive Retrieval
- URL: http://arxiv.org/abs/2409.16570v1
- Date: Wed, 25 Sep 2024 02:53:27 GMT
- Title: Disentangling Questions from Query Generation for Task-Adaptive Retrieval
- Authors: Yoonsang Lee, Minsoo Kim, Seung-won Hwang,
- Abstract summary: We propose EGG, a query generator that better adapts to wide search intents expressed in the BeIR benchmark.
Our method outperforms baselines and existing models on four tasks with underexplored intents, while utilizing a query generator 47 times smaller than the previous state-of-the-art.
- Score: 22.86406485412172
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies the problem of information retrieval, to adapt to unseen tasks. Existing work generates synthetic queries from domain-specific documents to jointly train the retriever. However, the conventional query generator assumes the query as a question, thus failing to accommodate general search intents. A more lenient approach incorporates task-adaptive elements, such as few-shot learning with an 137B LLM. In this paper, we challenge a trend equating query and question, and instead conceptualize query generation task as a "compilation" of high-level intent into task-adaptive query. Specifically, we propose EGG, a query generator that better adapts to wide search intents expressed in the BeIR benchmark. Our method outperforms baselines and existing models on four tasks with underexplored intents, while utilizing a query generator 47 times smaller than the previous state-of-the-art. Our findings reveal that instructing the LM with explicit search intent is a key aspect of modeling an effective query generator.
Related papers
- Retriever-and-Memory: Towards Adaptive Note-Enhanced Retrieval-Augmented Generation [72.70046559930555]
We propose a generic RAG approach called Adaptive Note-Enhanced RAG (Adaptive-Note) for complex QA tasks.
Specifically, Adaptive-Note introduces an overarching view of knowledge growth, iteratively gathering new information in the form of notes.
In addition, we employ an adaptive, note-based stop-exploration strategy to decide "what to retrieve and when to stop" to encourage sufficient knowledge exploration.
arXiv Detail & Related papers (2024-10-11T14:03:29Z) - BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval [54.54576644403115]
Many complex real-world queries require in-depth reasoning to identify relevant documents.
We introduce BRIGHT, the first text retrieval benchmark that requires intensive reasoning to retrieve relevant documents.
Our dataset consists of 1,384 real-world queries spanning diverse domains, such as economics, psychology, mathematics, and coding.
arXiv Detail & Related papers (2024-07-16T17:58:27Z) - S-EQA: Tackling Situational Queries in Embodied Question Answering [48.43453390717167]
We present and tackle the problem of Embodied Question Answering with Situational Queries (S-EQA) in a household environment.
We first introduce a novel Prompt-Generate-Evaluate scheme that wraps around an LLM's output to create a dataset of unique situational queries and corresponding consensus object information.
We report an improved accuracy of 15.31% while using queries framed from the generated object consensus for Visual Question Answering (VQA) over directly answering situational ones.
arXiv Detail & Related papers (2024-05-08T00:45:20Z) - Context Tuning for Retrieval Augmented Generation [1.201626478128059]
We propose Context Tuning for RAG, which employs a smart context retrieval system to fetch relevant information.
Our empirical results demonstrate that context tuning significantly enhances semantic search.
We also show that our proposed lightweight model using Reciprocal Rank Fusion (RRF) withMART outperforms GPT-4 based retrieval.
arXiv Detail & Related papers (2023-12-09T23:33:16Z) - An Interactive Query Generation Assistant using LLM-based Prompt
Modification and User Feedback [9.461978375200102]
The proposed interface is a novel search interface which supports automatic and interactive query generation over a mono-linguial or multi-lingual document collection.
The interface enables the users to refine the queries generated by different LLMs, to provide feedback on the retrieved documents or passages, and is able to incorporate the users' feedback as prompts to generate more effective queries.
arXiv Detail & Related papers (2023-11-19T04:42:24Z) - MILL: Mutual Verification with Large Language Models for Zero-Shot Query Expansion [39.24969189479343]
We propose a novel zero-shot query expansion framework utilizing large language models (LLMs) for mutual verification.
Our proposed method is fully zero-shot, and extensive experiments on three public benchmark datasets are conducted to demonstrate its effectiveness.
arXiv Detail & Related papers (2023-10-29T16:04:10Z) - CAPSTONE: Curriculum Sampling for Dense Retrieval with Document
Expansion [68.19934563919192]
We propose a curriculum sampling strategy that utilizes pseudo queries during training and progressively enhances the relevance between the generated query and the real query.
Experimental results on both in-domain and out-of-domain datasets demonstrate that our approach outperforms previous dense retrieval models.
arXiv Detail & Related papers (2022-12-18T15:57:46Z) - Graph Enhanced BERT for Query Understanding [55.90334539898102]
query understanding plays a key role in exploring users' search intents and facilitating users to locate their most desired information.
In recent years, pre-trained language models (PLMs) have advanced various natural language processing tasks.
We propose a novel graph-enhanced pre-training framework, GE-BERT, which can leverage both query content and the query graph.
arXiv Detail & Related papers (2022-04-03T16:50:30Z) - Query Understanding via Intent Description Generation [75.64800976586771]
We propose a novel Query-to-Intent-Description (Q2ID) task for query understanding.
Unlike existing ranking tasks which leverage the query and its description to compute the relevance of documents, Q2ID is a reverse task which aims to generate a natural language intent description.
We demonstrate the effectiveness of our model by comparing with several state-of-the-art generation models on the Q2ID task.
arXiv Detail & Related papers (2020-08-25T08:56:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.