LLM-Assisted Pseudo-Relevance Feedback
- URL: http://arxiv.org/abs/2601.11238v1
- Date: Fri, 16 Jan 2026 12:31:43 GMT
- Title: LLM-Assisted Pseudo-Relevance Feedback
- Authors: David Otero, Javier Parapar,
- Abstract summary: Pseudo-relevance feedback methods, such as RM3, estimate an expanded query model from the top-ranked documents.<n>We propose a hybrid alternative that preserves the robustness and interpretability of classical PRF while leveraging semantic judgement.
- Score: 5.10348690267577
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Query expansion is a long-standing technique to mitigate vocabulary mismatch in ad hoc Information Retrieval. Pseudo-relevance feedback methods, such as RM3, estimate an expanded query model from the top-ranked documents, but remain vulnerable to topic drift when early results include noisy or tangential content. Recent approaches instead prompt Large Language Models to generate synthetic expansions or query variants. While effective, these methods risk hallucinations and misalignment with collection-specific terminology. We propose a hybrid alternative that preserves the robustness and interpretability of classical PRF while leveraging LLM semantic judgement. Our method inserts an LLM-based filtering stage prior to RM3 estimation: the LLM judges the documents in the initial top-$k$ ranking, and RM3 is computed only over those accepted as relevant. This simple intervention improves over blind PRF and a strong baseline across several datasets and metrics.
Related papers
- DiffuRank: Effective Document Reranking with Diffusion Language Models [71.16830004674513]
We propose DiffuRank, a reranking framework built upon diffusion language models (dLLMs)<n>dLLMs support more flexible decoding and generation processes that are not constrained to a left-to-right order.<n>We show dLLMs achieve performance comparable to, and in some cases exceeding, that of autoregressive LLMs with similar model sizes.
arXiv Detail & Related papers (2026-02-13T02:18:14Z) - Revisiting Feedback Models for HyDE [49.53124785319461]
HyDE is a method that enriches query representations with LLM-generated hypothetical answer documents.<n>Our experiments show that HyDE's effectiveness can be substantially improved when leveraging feedback algorithms such as Rocchio to extract and weight expansion terms.
arXiv Detail & Related papers (2025-11-24T17:50:18Z) - HADSF: Aspect Aware Semantic Control for Explainable Recommendation [4.75127493865044]
Recent advances in large language models (LLMs) promise more effective information extraction for recommender systems.<n>We propose a two-stage approach that induces a compact, corpus-level aspect vocabulary via adaptive selection and then performs vocabulary-guided, explicitly constrained extraction of structured aspect-opinion triples.<n> Experiments on approximately 3 million reviews spanning 1.5B-70B parameters show that, when integrated into standard rating predictors, HADSF yields consistent reductions in prediction error.
arXiv Detail & Related papers (2025-10-30T20:49:33Z) - Rethinking On-policy Optimization for Query Augmentation [49.87723664806526]
We present the first systematic comparison of prompting-based and RL-based query augmentation across diverse benchmarks.<n>We introduce a novel hybrid method, On-policy Pseudo-document Query Expansion (OPQE), which learns to generate a pseudo-document that maximizes retrieval performance.
arXiv Detail & Related papers (2025-10-20T04:16:28Z) - How Do LLM-Generated Texts Impact Term-Based Retrieval Models? [76.92519309816008]
This paper investigates the influence of large language models (LLMs) on term-based retrieval models.<n>Our linguistic analysis reveals that LLM-generated texts exhibit smoother high-frequency and steeper low-frequency Zipf slopes.<n>Our study further explores whether term-based retrieval models demonstrate source bias, concluding that these models prioritize documents whose term distributions closely correspond to those of the queries.
arXiv Detail & Related papers (2025-08-25T06:43:27Z) - Aligned Query Expansion: Efficient Query Expansion for Information Retrieval through LLM Alignment [4.21943400140261]
Aligned Query Expansion (AQE) is a novel approach to enhance query expansion for passage retrieval in open-domain question answering.<n>We show that AQE outperforms baseline models for query expansion in both in-domain and out-of-domain settings.
arXiv Detail & Related papers (2025-07-15T07:11:29Z) - LLM-Lasso: A Robust Framework for Domain-Informed Feature Selection and Regularization [59.75242204923353]
We introduce LLM-Lasso, a framework that leverages large language models (LLMs) to guide feature selection in Lasso regression.<n>LLMs generate penalty factors for each feature, which are converted into weights for the Lasso penalty using a simple, tunable model.<n>Features identified as more relevant by the LLM receive lower penalties, increasing their likelihood of being retained in the final model.
arXiv Detail & Related papers (2025-02-15T02:55:22Z) - LLM with Relation Classifier for Document-Level Relation Extraction [25.587850398830252]
Large language models (LLMs) have created a new paradigm for natural language processing.<n>This paper investigates the causes of this performance gap, identifying the dispersion of attention by LLMs due to entity pairs without relations as a key factor.
arXiv Detail & Related papers (2024-08-25T16:43:19Z) - HyPoradise: An Open Baseline for Generative Speech Recognition with
Large Language Models [81.56455625624041]
We introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction.
The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses.
LLMs with reasonable prompt and its generative capability can even correct those tokens that are missing in N-best list.
arXiv Detail & Related papers (2023-09-27T14:44:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.