Related papers: Revisiting Query Variants: The Advantage of Retrieval Over Generation of Query Variants for Effective QPP

Revisiting Query Variants: The Advantage of Retrieval Over Generation of Query Variants for Effective QPP

URL: http://arxiv.org/abs/2510.02512v1
Date: Thu, 02 Oct 2025 19:36:58 GMT
Title: Revisiting Query Variants: The Advantage of Retrieval Over Generation of Query Variants for Effective QPP
Authors: Fangzheng Tian, Debasis Ganguly, Craig Macdonald,
Abstract summary: We propose a method that retrieves QVs from a training set for a given target query of QPP.<n>To achieve a high recall in retrieving queries with the most similar information needs, we extend the directly retrieved QVs by a second retrieval.<n>Our experiments, conducted on TREC DL'19 and DL'20, show that the QPP methods with QVs retrieved by our method outperform the best-performing existing generated-QV-based QPP approaches by as much as around 20%.
Score: 24.439170886636788
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Leveraging query variants (QVs), i.e., queries with potentially similar information needs to the target query, has been shown to improve the effectiveness of query performance prediction (QPP) approaches. Existing QV-based QPP methods generate QVs facilitated by either query expansion or non-contextual embeddings, which may introduce topical drifts and hallucinations. In this paper, we propose a method that retrieves QVs from a training set (e.g., MS MARCO) for a given target query of QPP. To achieve a high recall in retrieving queries with the most similar information needs as the target query from a training set, we extend the directly retrieved QVs (1-hop QVs) by a second retrieval using their denoted relevant documents (which yields 2-hop QVs). Our experiments, conducted on TREC DL'19 and DL'20, show that the QPP methods with QVs retrieved by our method outperform the best-performing existing generated-QV-based QPP approaches by as much as around 20\%, on neural ranking models like MonoT5.

Related papers

VQPP: Video Query Performance Prediction Benchmark [22.214338497366082]
We propose the first benchmark for video query performance prediction (VQPP)<n>VQPP contains a total of 56K text queries and 51K videos, and comes with official training, validation and test splits.<n>We explore multiple pre-retrieval and post-retrieval performance predictors, creating a representative benchmark for future exploration of QPP in the video domain.
arXiv Detail & Related papers (2026-02-19T20:32:25Z)
Predicting Retrieval Utility and Answer Quality in Retrieval-Augmented Generation [24.439170886636788]
Key challenge for improving RAG is to predict both the utility of retrieved documents and the quality of the final answers in terms of correctness and relevance.<n>We define two prediction tasks within RAG: retrieval performance prediction and generation performance prediction.<n>We argue that reader-centric features, such as the LLM's perplexity of the retrieved context conditioned on the input query, can further enhance prediction accuracy.
arXiv Detail & Related papers (2026-01-20T23:59:54Z)
ReAG: Reasoning-Augmented Generation for Knowledge-based Visual Question Answering [54.72902502486611]
ReAG is a Reasoning-Augmented Multimodal RAG approach that combines coarse- and fine-grained retrieval with a critic model that filters irrelevant passages.<n>ReAG significantly outperforms prior methods, improving answer accuracy and providing interpretable reasoning grounded in retrieved evidence.
arXiv Detail & Related papers (2025-11-27T19:01:02Z)
FrugalRAG: Learning to retrieve and reason for multi-hop QA [10.193015391271535]
Large-scale fine-tuning is not needed to improve RAG metrics.<n>Supervised and RL-based fine-tuning can help RAG from the perspective of frugality.
arXiv Detail & Related papers (2025-07-10T11:02:13Z)
Query Performance Prediction using Relevance Judgments Generated by Large Language Models [53.97064615557883]
We propose a new Query performance prediction (QPP) framework using automatically generated relevance judgments (QPP-GenRE)<n>QPP-GenRE decomposes QPP into independent subtasks of predicting relevance of each item in a ranked list to a given query.<n>We predict an item's relevance by using open-source large language models (LLMs) to ensure scientific relevance.
arXiv Detail & Related papers (2024-04-01T09:33:05Z)
Query Performance Prediction: From Ad-hoc to Conversational Search [55.37199498369387]
Query performance prediction (QPP) is a core task in information retrieval. Research has shown the effectiveness and usefulness of QPP for ad-hoc search. Despite its potential, QPP for conversational search has been little studied.
arXiv Detail & Related papers (2023-05-18T12:37:01Z)
Toward Unsupervised Realistic Visual Question Answering [70.67698100148414]
We study the problem of realistic VQA (RVQA), where a model has to reject unanswerable questions (UQs) and answer answerable ones (AQs) We first point out 2 drawbacks in current RVQA research, where (1) datasets contain too many unchallenging UQs and (2) a large number of annotated UQs are required for training. We propose a new testing dataset, RGQA, which combines AQs from an existing VQA dataset with around 29K human-annotated UQs. This combines pseudo UQs obtained by randomly pairing images and questions, with an
arXiv Detail & Related papers (2023-03-09T06:58:29Z)
Generating Diverse and Consistent QA pairs from Contexts with Information-Maximizing Hierarchical Conditional VAEs [62.71505254770827]
We propose a conditional variational autoencoder (HCVAE) for generating QA pairs given unstructured texts as contexts. Our model obtains impressive performance gains over all baselines on both tasks, using only a fraction of data for training.
arXiv Detail & Related papers (2020-05-28T08:26:06Z)
Harvesting and Refining Question-Answer Pairs for Unsupervised QA [95.9105154311491]
We introduce two approaches to improve unsupervised Question Answering (QA) First, we harvest lexically and syntactically divergent questions from Wikipedia to automatically construct a corpus of question-answer pairs (named as RefQA) Second, we take advantage of the QA model to extract more appropriate answers, which iteratively refines data over RefQA.
arXiv Detail & Related papers (2020-05-06T15:56:06Z)
Template-Based Question Generation from Retrieved Sentences for Improved Unsupervised Question Answering [98.48363619128108]
We propose an unsupervised approach to training QA models with generated pseudo-training data. We show that generating questions for QA training by applying a simple template on a related, retrieved sentence rather than the original context sentence improves downstream QA performance.
arXiv Detail & Related papers (2020-04-24T17:57:45Z)

This list is automatically generated from the titles and abstracts of the papers in this site.