Revisiting Feedback Models for HyDE
- URL: http://arxiv.org/abs/2511.19349v1
- Date: Mon, 24 Nov 2025 17:50:18 GMT
- Title: Revisiting Feedback Models for HyDE
- Authors: Nour Jedidi, Jimmy Lin
- Abstract summary: HyDE is a method that enriches query representations with LLM-generated hypothetical answer documents. Our experiments show that HyDE's effectiveness can be substantially improved when leveraging feedback algorithms such as Rocchio to extract and weight expansion terms.
- Score: 49.53124785319461
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent approaches that leverage large language models (LLMs) for pseudo-relevance feedback (PRF) have generally not utilized well-established feedback models like Rocchio and RM3 when expanding queries for sparse retrievers like BM25. Instead, they often opt for a simple string concatenation of the query and LLM-generated expansion content. But is this optimal? To answer this question, we revisit and systematically evaluate traditional feedback models in the context of HyDE, a popular method that enriches query representations with LLM-generated hypothetical answer documents. Our experiments show that HyDE's effectiveness can be substantially improved when leveraging feedback algorithms such as Rocchio to extract and weight expansion terms, providing a simple way to further enhance the accuracy of LLM-based PRF methods.
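The contrast the abstract draws, between string concatenation and a feedback model that extracts and weights expansion terms, can be illustrated with a minimal Rocchio-style sketch. This is not the authors' implementation: the tokenizer, the `alpha`/`beta` defaults, the L1 normalization, and the function names are all assumptions chosen for illustration, and the hypothetical documents are hard-coded rather than generated by an LLM.

```python
from collections import Counter

PUNCT = ".,;:!?()\"'"


def tokenize(text):
    # Hypothetical whitespace tokenizer (assumption); a real system would
    # reuse the analyzer of its BM25 index.
    tokens = (tok.strip(PUNCT).lower() for tok in text.split())
    return [tok for tok in tokens if tok]


def term_vector(text):
    # L1-normalized term-frequency vector for one piece of text.
    counts = Counter(tokenize(text))
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}


def rocchio_expand(query, hypothetical_docs, alpha=1.0, beta=0.75, num_terms=10):
    # Positive-feedback Rocchio: alpha * query vector plus beta * centroid
    # of the feedback (here: LLM-generated hypothetical) documents.
    q_vec = term_vector(query)
    weights = Counter()
    for term, w in q_vec.items():
        weights[term] += alpha * w
    for doc in hypothetical_docs:
        for term, w in term_vector(doc).items():
            weights[term] += beta * w / len(hypothetical_docs)
    # Keep every original query term plus the top-weighted new terms;
    # ties are broken alphabetically so the output is deterministic.
    new_terms = sorted(
        (t for t in weights if t not in q_vec),
        key=lambda t: (-weights[t], t),
    )[:num_terms]
    return {t: weights[t] for t in list(q_vec) + new_terms}


if __name__ == "__main__":
    docs = [
        "Tides are caused by the gravitational pull of the moon.",
        "The moon and sun pull on the ocean, causing tides.",
    ]
    for term, weight in rocchio_expand("what causes tides", docs, num_terms=5).items():
        print(f"{term}\t{weight:.4f}")
```

Unlike plain concatenation, each term carries a weight, so terms that recur across hypothetical documents count for more and the original query terms are never dropped. The sketch applies no stopword filtering, so frequent function words can surface as expansion terms; how the weighted terms are passed to a BM25 retriever depends on the search toolkit and is left out here.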
Related papers
- Small Reward Models via Backward Inference [100.59075794599768]
FLIP (FLipped Inference for Prompt Reconstruction) is a reference-free and rubric-free reward modeling approach. It reformulates reward modeling through backward inference: inferring the instruction that would most plausibly produce a given response.
arXiv Detail & Related papers (2026-02-14T01:55:39Z)
- DiffuRank: Effective Document Reranking with Diffusion Language Models [71.16830004674513]
We propose DiffuRank, a reranking framework built upon diffusion language models (dLLMs). dLLMs support more flexible decoding and generation processes that are not constrained to a left-to-right order. We show dLLMs achieve performance comparable to, and in some cases exceeding, that of autoregressive LLMs with similar model sizes.
arXiv Detail & Related papers (2026-02-13T02:18:14Z)
- LLM-Assisted Pseudo-Relevance Feedback [5.10348690267577]
Pseudo-relevance feedback methods, such as RM3, estimate an expanded query model from the top-ranked documents. We propose a hybrid alternative that preserves the robustness and interpretability of classical PRF while leveraging semantic judgement.
arXiv Detail & Related papers (2026-01-16T12:31:43Z)
- Generalized Pseudo-Relevance Feedback [29.669164314207947]
We introduce an assumption-relaxed framework: Generalized Pseudo Relevance Feedback (GPRF). GPRF performs model-free, natural language rewriting based on retrieved documents. Experiments across multiple benchmarks and retrievers demonstrate GPRF consistently outperforms strong baselines.
arXiv Detail & Related papers (2025-10-29T13:08:35Z)
- Rethinking On-policy Optimization for Query Augmentation [49.87723664806526]
We present the first systematic comparison of prompting-based and RL-based query augmentation across diverse benchmarks. We introduce a novel hybrid method, On-policy Pseudo-document Query Expansion (OPQE), which learns to generate a pseudo-document that maximizes retrieval performance.
arXiv Detail & Related papers (2025-10-20T04:16:28Z)
- How Do LLM-Generated Texts Impact Term-Based Retrieval Models? [76.92519309816008]
This paper investigates the influence of large language models (LLMs) on term-based retrieval models. Our linguistic analysis reveals that LLM-generated texts exhibit smoother high-frequency and steeper low-frequency Zipf slopes. Our study further explores whether term-based retrieval models demonstrate source bias, concluding that these models prioritize documents whose term distributions closely correspond to those of the queries.
arXiv Detail & Related papers (2025-08-25T06:43:27Z)
- R$^2$ec: Towards Large Recommender Models with Reasoning [59.32598867813266]
We propose R$^2$ec, a unified large recommender model with intrinsic reasoning capability. R$^2$ec introduces a dual-head architecture that supports both reasoning chain generation and efficient item prediction in a single model. To overcome the lack of annotated reasoning data, we design RecPO, a reinforcement learning framework.
arXiv Detail & Related papers (2025-05-22T17:55:43Z)
- LLM-VPRF: Large Language Model Based Vector Pseudo Relevance Feedback [31.017301950179295]
Vector Pseudo Relevance Feedback (VPRF) has shown promising results in improving BERT-based dense retrieval systems. This paper investigates the generalizability of VPRF to Large Language Model (LLM) based dense retrievers.
arXiv Detail & Related papers (2025-04-02T08:02:01Z)
- Pseudo Relevance Feedback is Enough to Close the Gap Between Small and Large Dense Retrieval Models [29.934928091542375]
Scaling dense retrievers to larger large language model (LLM) backbones has been a dominant strategy for improving their retrieval effectiveness. We introduce PromptPRF, a feature-based pseudo-relevance feedback (PRF) framework that enables small LLM-based dense retrievers to achieve effectiveness comparable to much larger models.
arXiv Detail & Related papers (2025-03-19T04:30:20Z)
- GenCRF: Generative Clustering and Reformulation Framework for Enhanced Intent-Driven Information Retrieval [20.807374287510623]
We propose GenCRF: a Generative Clustering and Reformulation Framework to capture diverse intentions adaptively.
We show that GenCRF achieves state-of-the-art performance, surpassing previous query reformulation SOTAs by up to 12% on nDCG@10.
arXiv Detail & Related papers (2024-09-17T05:59:32Z)
- RaFe: Ranking Feedback Improves Query Rewriting for RAG [83.24385658573198]
We propose a framework for training query rewriting models free of annotations.
By leveraging a publicly available reranker, RaFe provides feedback aligned well with the rewriting objectives.
arXiv Detail & Related papers (2024-05-23T11:00:19Z)
- Regression-aware Inference with LLMs [52.764328080398805]
We show that an inference strategy can be sub-optimal for common regression and scoring evaluation metrics.
We propose alternate inference strategies that estimate the Bayes-optimal solution for regression and scoring metrics in closed-form from sampled responses.
arXiv Detail & Related papers (2024-03-07T03:24:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.