Uncovering the Limitations of Query Performance Prediction: Failures, Insights, and Implications for Selective Query Processing
- URL: http://arxiv.org/abs/2504.01101v1
- Date: Tue, 01 Apr 2025 18:18:21 GMT
- Title: Uncovering the Limitations of Query Performance Prediction: Failures, Insights, and Implications for Selective Query Processing
- Authors: Adrian-Gabriel Chifu, Sébastien Déjean, Josiane Mothe, Moncef Garouani, Diego Ortiz, Md Zia Ullah
- Abstract summary: This paper provides a comprehensive evaluation of state-of-the-art QPPs (e.g. NQC, UQC). We use diverse sparse rankers (BM25, DFree without and with query expansion), hybrid or dense rankers (SPLADE and ColBert), and diverse test collections (ROBUST, GOV2, WT10G, and MS MARCO). Results show significant variability in predictors' accuracy, with collections as the main factor and rankers next.
- Score: 3.463527836552468
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Query Performance Prediction (QPP) estimates a retrieval system's effectiveness for a given query, offering valuable insights for search effectiveness and query processing. Despite extensive research, QPPs face critical challenges in generalizing across diverse retrieval paradigms and collections. This paper provides a comprehensive evaluation of state-of-the-art QPPs (e.g. NQC, UQC), LETOR-based features, and newly explored dense-based predictors. Using diverse sparse rankers (BM25, DFree without and with query expansion), hybrid or dense rankers (SPLADE and ColBert), and diverse test collections (ROBUST, GOV2, WT10G, and MS MARCO), we investigate the relationships between predicted and actual performance, with a focus on generalization and robustness. Results show significant variability in predictors' accuracy, with collections as the main factor and rankers next. Some sparse predictors perform reasonably well on some collections (TREC ROBUST and GOV2) but do not generalize to others (WT10G and MS MARCO). While some predictors show promise in specific scenarios, their overall limitations constrain their utility for applications. We show that QPP-driven selective query processing offers only marginal gains, emphasizing the need for improved predictors that generalize across collections, align with dense retrieval architectures, and are useful for downstream applications.
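To make the idea of relating predicted to actual performance concrete, here is a minimal sketch, not the authors' code. It assumes the commonly cited definition of NQC (standard deviation of the top-k retrieval scores, normalized by the corpus-level score) and uses Pearson correlation as the accuracy measure. The function names, the toy scores, and the per-query average-precision values are all hypothetical illustrations, not data from the paper.

```python
import math
from statistics import mean, pstdev


def nqc(top_scores, corpus_score):
    """NQC sketch: population std dev of the top-k scores over |corpus score|."""
    return pstdev(top_scores) / abs(corpus_score)


def pearson(xs, ys):
    """Pearson correlation between two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical toy data: per query, the top-4 retrieval scores and a corpus score.
runs = {
    "q1": ([12.0, 9.0, 6.0, 3.0], 2.0),
    "q2": ([8.0, 7.5, 7.0, 6.5], 2.0),
    "q3": ([10.0, 8.0, 6.0, 4.0], 2.0),
}
# Hypothetical actual effectiveness (e.g. average precision) for the same queries.
actual_ap = {"q1": 0.60, "q2": 0.10, "q3": 0.40}

queries = sorted(runs)
predicted = [nqc(*runs[q]) for q in queries]
actual = [actual_ap[q] for q in queries]

# Predictor accuracy is the correlation between predicted and actual values.
r = pearson(predicted, actual)
print(f"Pearson r between NQC and AP: {r:.3f}")
```

On real collections the correlation would be computed over all queries of a collection and for each ranker, which is the kind of setting in which the abstract reports large variability.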
Related papers
- Revisiting BPR: A Replicability Study of a Common Recommender System Baseline [78.00363373925758]
We study the features of the BPR model, indicating their impact on its performance, and investigate open-source BPR implementations.
Our analysis reveals inconsistencies between these implementations and the original BPR paper, leading to a significant decrease in performance of up to 50% for specific implementations.
We show that the BPR model can achieve performance levels close to state-of-the-art methods on the top-n recommendation tasks and even outperform them on specific datasets.
arXiv Detail & Related papers (2024-09-21T18:39:53Z)
- Query Performance Prediction using Relevance Judgments Generated by Large Language Models [53.97064615557883]
We propose a QPP framework using automatically generated relevance judgments (QPP-GenRE).
QPP-GenRE decomposes QPP into independent subtasks of predicting relevance of each item in a ranked list to a given query.
This allows us to predict any IR evaluation measure using the generated relevance judgments as pseudo-labels.
arXiv Detail & Related papers (2024-04-01T09:33:05Z)
- iQPP: A Benchmark for Image Query Performance Prediction [24.573869540845124]
We propose the first benchmark for image query performance prediction (iQPP).
We estimate the ground-truth difficulty of each query as the average precision or the precision@k, using two state-of-the-art image retrieval models.
Next, we propose and evaluate novel pre-retrieval and post-retrieval query performance predictors, comparing them with existing or adapted (from text to image) predictors.
Our comprehensive experiments indicate that iQPP is a challenging benchmark, revealing an important research gap that needs to be addressed in future work.
arXiv Detail & Related papers (2023-02-20T17:56:57Z)
- An end-to-end predict-then-optimize clustering method for intelligent assignment problems in express systems [11.230576737829777]
We propose an intelligent end-to-end predict-then-optimize clustering method to simultaneously predict the future pick-up requests of AOIs and assign AOIs to couriers by clustering.
Results show that this kind of one-stage predict-then-optimize method is beneficial to improve the performance of optimization results.
arXiv Detail & Related papers (2022-02-18T08:52:43Z)
- RnG-KBQA: Generation Augmented Iterative Ranking for Knowledge Base Question Answering [57.94658176442027]
We present RnG-KBQA, a Rank-and-Generate approach for KBQA.
We achieve new state-of-the-art results on GrailQA and WebQSP datasets.
arXiv Detail & Related papers (2021-09-17T17:58:28Z)
- Complex Event Forecasting with Prediction Suffix Trees: Extended Technical Report [70.7321040534471]
Complex Event Recognition (CER) systems have become popular in the past two decades due to their ability to "instantly" detect patterns on real-time streams of events.
There is a lack of methods for forecasting when a pattern might occur before such an occurrence is actually detected by a CER engine.
We present a formal framework that attempts to address the issue of Complex Event Forecasting.
arXiv Detail & Related papers (2021-09-01T09:52:31Z)
- Automated Concatenation of Embeddings for Structured Prediction [75.44925576268052]
We propose Automated Concatenation of Embeddings (ACE) to automate the process of finding better concatenations of embeddings for structured prediction tasks.
We follow strategies in reinforcement learning to optimize the parameters of the controller and compute the reward based on the accuracy of a task model.
arXiv Detail & Related papers (2020-10-10T14:03:20Z)
- Ambiguity in Sequential Data: Predicting Uncertain Futures with Recurrent Models [110.82452096672182]
We propose an extension of the Multiple Hypothesis Prediction (MHP) model to handle ambiguous predictions with sequential data.
We also introduce a novel metric for ambiguous problems, which is better suited to account for uncertainties.
arXiv Detail & Related papers (2020-03-10T09:15:42Z)
- Expected Improvement versus Predicted Value in Surrogate-Based Optimization [0.1529342790344802]
Surrogate-based optimization relies on so-called infill criteria to decide which point to evaluate next.
We argue that the popularity of expected improvement largely relies on its theoretical properties rather than empirically validated performance.
arXiv Detail & Related papers (2020-01-09T13:09:44Z)