ReFeed: Retrieval Feedback-Guided Dataset Construction for Style-Aware Query Rewriting
- URL: http://arxiv.org/abs/2603.01417v1
- Date: Mon, 02 Mar 2026 03:43:53 GMT
- Title: ReFeed: Retrieval Feedback-Guided Dataset Construction for Style-Aware Query Rewriting
- Authors: Jiyoon Myung, Jungki Son, Kyungro Lee, Jihyeon Park, Joohyung Han,
- Abstract summary: Retrieval systems often fail when user queries differ stylistically or semantically from the language used in domain documents.<n>This work highlights a new direction in data-centric information retrieval, emphasizing how feedback loops and document-style alignment can enhance the reasoning and adaptability of RAG systems.
- Score: 0.4077787659104315
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Retrieval systems often fail when user queries differ stylistically or semantically from the language used in domain documents. Query rewriting has been proposed to bridge this gap, improving retrieval by reformulating user queries into semantically equivalent forms. However, most existing approaches overlook the stylistic characteristics of target documents-their domain-specific phrasing, tone, and structure-which are crucial for matching real-world data distributions. We introduce a retrieval feedback-driven dataset generation framework that automatically identifies failed retrieval cases, leverages large language models to rewrite queries in the style of relevant documents, and verifies improvement through re-retrieval. The resulting corpus of (original, rewritten) query pairs enables the training of rewriter models that are explicitly aware of document style and retrieval feedback. This work highlights a new direction in data-centric information retrieval, emphasizing how feedback loops and document-style alignment can enhance the reasoning and adaptability of RAG systems in real-world, domain-specific contexts.
Related papers
- Retrieval or Representation? Reassessing Benchmark Gaps in Multilingual and Visually Rich RAG [1.4425299138308667]
BM25 rank documents by term overlap with corpus-level weighting.<n>End-to-end multimodal retrievers trained on large query-document datasets claim substantial improvements over these approaches.<n>We demonstrate that better document representation is the primary driver of benchmark improvements.
arXiv Detail & Related papers (2026-03-04T16:21:20Z) - Generalized Pseudo-Relevance Feedback [29.669164314207947]
We introduce an assumption-relaxed framework: textitGeneralized Pseudo Relevance Feedback (GPRF)<n>GPRF performs model-free, natural language rewriting based on retrieved documents.<n>Experiments across multiple benchmarks and retrievers demonstrate GPRF consistently outperforms strong baselines.
arXiv Detail & Related papers (2025-10-29T13:08:35Z) - Chain of Retrieval: Multi-Aspect Iterative Search Expansion and Post-Order Search Aggregation for Full Paper Retrieval [68.71038700559195]
Chain of Retrieval(COR) is a novel iterative framework for full-paper retrieval.<n>We present SCIBENCH, a benchmark providing both complete and segmented contexts of full papers for queries and candidates.
arXiv Detail & Related papers (2025-07-14T08:41:53Z) - Beyond Chunking: Discourse-Aware Hierarchical Retrieval for Long Document Question Answering [51.7493726399073]
We present a discourse-aware hierarchical framework to enhance long document question answering.<n>The framework involves three key innovations: specialized discourse parsing for lengthy documents, LLM-based enhancement of discourse relation nodes, and structure-guided hierarchical retrieval.
arXiv Detail & Related papers (2025-05-26T14:45:12Z) - MURR: Model Updating with Regularized Replay for Searching a Document Stream [32.0637790321157]
Internet produces a continuous stream of new documents and user-generated queries.<n>Neural retrieval models that were trained once on a fixed set of query-document pairs will quickly start misrepresenting newly-created content.<n>We propose MURR, a model updating strategy with regularized replay, to ensure the model can still faithfully search existing documents without reprocessing.
arXiv Detail & Related papers (2025-04-14T14:13:03Z) - Cognitive-Aligned Document Selection for Retrieval-augmented Generation [2.9060210098040855]
We propose GGatrieval to dynamically update queries and filter high-quality, reliable retrieval documents.<n>We parse the user query into its syntactic components and perform fine-grained grounded alignment with the retrieved documents.<n>Our approach introduces a novel criterion for filtering retrieved documents, closely emulating human strategies for acquiring targeted information.
arXiv Detail & Related papers (2025-02-17T13:00:15Z) - Counterfactual Query Rewriting to Use Historical Relevance Feedback [25.893083499927776]
We propose approaches to rewrite user queries and compare them against a system that directly uses the previous qrels for the ranking.<n>We expand queries with terms extracted from the previously relevant documents or derive so-called keyqueries that rank the previously relevant documents to the top of the current corpus.<n>Our evaluation in the CLEF LongEval scenario shows that rewriting queries with historical relevance feedback improves the retrieval effectiveness and even outperforms computationally expensive transformer-based approaches.
arXiv Detail & Related papers (2025-02-06T09:05:41Z) - Adaptive Query Rewriting: Aligning Rewriters through Marginal Probability of Conversational Answers [66.55612528039894]
AdaQR is a framework for training query rewriting models with limited rewrite annotations from seed datasets and completely no passage label.
A novel approach is proposed to assess retriever's preference for these candidates by the probability of answers conditioned on the conversational query.
arXiv Detail & Related papers (2024-06-16T16:09:05Z) - Generative Query Reformulation Using Ensemble Prompting, Document Fusion, and Relevance Feedback [8.661419320202787]
GenQREnsemble and GenQRFusion leverage paraphrases of a zero-shot instruction to generate multiple sets of keywords to improve retrieval performance.
We demonstrate that an ensemble of query reformulations can improve retrieval effectiveness by up to 18% on nDCG@10 in pre-retrieval settings and 9% on post-retrieval settings.
arXiv Detail & Related papers (2024-05-27T21:03:26Z) - Evaluating Generative Ad Hoc Information Retrieval [58.800799175084286]
generative retrieval systems often directly return a grounded generated text as a response to a query.
Quantifying the utility of the textual responses is essential for appropriately evaluating such generative ad hoc retrieval.
arXiv Detail & Related papers (2023-11-08T14:05:00Z) - Decomposing Complex Queries for Tip-of-the-tongue Retrieval [72.07449449115167]
Complex queries describe content elements (e.g., book characters or events), information beyond the document text.
This retrieval setting, called tip of the tongue (TOT), is especially challenging for models reliant on lexical and semantic overlap between query and document text.
We introduce a simple yet effective framework for handling such complex queries by decomposing the query into individual clues, routing those as sub-queries to specialized retrievers, and ensembling the results.
arXiv Detail & Related papers (2023-05-24T11:43:40Z) - CAPSTONE: Curriculum Sampling for Dense Retrieval with Document
Expansion [68.19934563919192]
We propose a curriculum sampling strategy that utilizes pseudo queries during training and progressively enhances the relevance between the generated query and the real query.
Experimental results on both in-domain and out-of-domain datasets demonstrate that our approach outperforms previous dense retrieval models.
arXiv Detail & Related papers (2022-12-18T15:57:46Z) - UnifieR: A Unified Retriever for Large-Scale Retrieval [84.61239936314597]
Large-scale retrieval is to recall relevant documents from a huge collection given a query.
Recent retrieval methods based on pre-trained language models (PLM) can be coarsely categorized into either dense-vector or lexicon-based paradigms.
We propose a new learning framework, UnifieR which unifies dense-vector and lexicon-based retrieval in one model with a dual-representing capability.
arXiv Detail & Related papers (2022-05-23T11:01:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.