Related papers: Counterfactual Query Rewriting to Use Historical Relevance Feedback

URL: http://arxiv.org/abs/2502.03891v1
Date: Thu, 06 Feb 2025 09:05:41 GMT
Title: Counterfactual Query Rewriting to Use Historical Relevance Feedback
Authors: Jüri Keller, Maik Fröbe, Gijs Hendriksen, Daria Alexander, Martin Potthast, Matthias Hagen, Philipp Schaer
Abstract summary: We propose approaches to rewrite user queries and compare them against a system that directly uses the previous qrels for the ranking. We expand queries with terms extracted from the previously relevant documents or derive so-called keyqueries that rank the previously relevant documents to the top of the current corpus. Our evaluation in the CLEF LongEval scenario shows that rewriting queries with historical relevance feedback improves the retrieval effectiveness and even outperforms computationally expensive transformer-based approaches.
Score: 25.893083499927776
License: http://creativecommons.org/licenses/by/4.0/
Abstract: When a retrieval system receives a query it has encountered before, previous relevance feedback, such as clicks or explicit judgments, can help to improve retrieval results. However, the content of a previously relevant document may have changed, or the document might not be available anymore. Despite this evolved corpus, we counterfactually use these previously relevant documents as relevance signals. In this paper, we propose approaches to rewrite user queries and compare them against a system that directly uses the previous qrels for the ranking. We expand queries with terms extracted from the previously relevant documents or derive so-called keyqueries that rank the previously relevant documents to the top of the current corpus. Our evaluation in the CLEF LongEval scenario shows that rewriting queries with historical relevance feedback improves the retrieval effectiveness and even outperforms computationally expensive transformer-based approaches.
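
The abstract does not say how the expansion terms are selected, so the following is only a minimal sketch of the general idea of expanding a query with terms from previously relevant documents. The function name `expand_query`, the tf-idf-style weighting, the add-one smoothing, and the toy data are all assumptions made for illustration and are not taken from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def expand_query(query, relevant_docs, corpus, k=3):
    """Append up to k terms from previously relevant documents to the query.

    Hypothetical scoring (an assumption, not the paper's method): term
    frequency in the historical relevant documents times a smoothed inverse
    document frequency computed on the *current* corpus.
    """
    n_docs = len(corpus)

    # Document frequency of each term in the current corpus.
    df = Counter()
    for doc in corpus:
        df.update(set(tokenize(doc)))

    # Term frequency across the previously relevant documents, which may
    # have changed or disappeared from the current corpus.
    tf = Counter()
    for doc in relevant_docs:
        tf.update(tokenize(doc))

    query_terms = set(tokenize(query))
    scores = {
        term: count * math.log((n_docs + 1) / (df[term] + 1))
        for term, count in tf.items()
        if term not in query_terms  # do not repeat original query terms
    }

    # Highest score first; ties broken alphabetically for determinism.
    best = sorted(scores, key=lambda t: (-scores[t], t))[:k]
    return f"{query} {' '.join(best)}" if best else query


# Toy data, invented for this example.
corpus = [
    "cheap flights to paris",
    "paris hotel deals",
    "weather in berlin",
    "train from berlin",
]
relevant_docs = [
    "paris airport shuttle timetable",
    "shuttle from paris airport",
]
print(expand_query("paris airport", relevant_docs, corpus, k=2))
```

Because the expansion terms come from the stored text of the historical documents, the rewritten query can still be issued against the current corpus even when those documents are no longer indexed.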

Related papers

Breaking the Lens of the Telescope: Online Relevance Estimation over Large Retrieval Sets [15.549852480638066]
We propose a novel paradigm for re-ranking called online relevance estimation. Online relevance estimation continuously updates relevance estimates for a query throughout the ranking process. We validate our approach on TREC benchmarks under two scenarios: hybrid retrieval and adaptive retrieval.
arXiv Detail & Related papers (2025-04-12T22:05:50Z)
Cognitive-Aligned Document Selection for Retrieval-augmented Generation [2.9060210098040855]
We propose GGatrieval to dynamically update queries and filter high-quality, reliable retrieval documents. We parse the user query into its syntactic components and perform fine-grained grounded alignment with the retrieved documents. Our approach introduces a novel criterion for filtering retrieved documents, closely emulating human strategies for acquiring targeted information.
arXiv Detail & Related papers (2025-02-17T13:00:15Z)
Reproducible Hybrid Time-Travel Retrieval in Evolving Corpora [1.9202615342033464]
We present a hybrid retrieval system combining Lucene for fast retrieval with a column-store-based retrieval system maintaining a versioned and time-stamped index.
arXiv Detail & Related papers (2024-11-06T16:57:55Z)
Generating Natural Language Queries for More Effective Systematic Review Screening Prioritisation [53.77226503675752]
The current state of the art uses the final title of the review as a query to rank the documents using BERT-based neural rankers. In this paper, we explore alternative sources of queries for prioritising screening, such as the Boolean query used to retrieve the documents to be screened and queries generated by instruction-based large-scale language models such as ChatGPT and Alpaca. Our best approach is not only viable based on the information available at the time of screening, but also has similar effectiveness to the final title.
arXiv Detail & Related papers (2023-09-11T05:12:14Z)
DAPR: A Benchmark on Document-Aware Passage Retrieval [57.45793782107218]
We propose and name this task Document-Aware Passage Retrieval (DAPR). While analyzing the errors of the state-of-the-art (SoTA) passage retrievers, we find the major errors (53.5%) are due to missing document context. Our created benchmark enables future research on developing and comparing retrieval systems for the new task.
arXiv Detail & Related papers (2023-05-23T10:39:57Z)
CAPSTONE: Curriculum Sampling for Dense Retrieval with Document Expansion [68.19934563919192]
We propose a curriculum sampling strategy that utilizes pseudo queries during training and progressively enhances the relevance between the generated query and the real query. Experimental results on both in-domain and out-of-domain datasets demonstrate that our approach outperforms previous dense retrieval models.
arXiv Detail & Related papers (2022-12-18T15:57:46Z)
Incorporating Relevance Feedback for Information-Seeking Retrieval using Few-Shot Document Re-Ranking [56.80065604034095]
We introduce a kNN approach that re-ranks documents based on their similarity with the query and the documents the user considers relevant. To evaluate our different integration strategies, we transform four existing information retrieval datasets into the relevance feedback scenario.
arXiv Detail & Related papers (2022-10-19T16:19:37Z)
LoL: A Comparative Regularization Loss over Query Reformulation Losses for Pseudo-Relevance Feedback [70.44530794897861]
Pseudo-relevance feedback (PRF) has proven to be an effective query reformulation technique to improve retrieval accuracy. Existing PRF methods independently treat revised queries originating from the same query but using different numbers of feedback documents. We propose the Loss-over-Loss (LoL) framework to compare the reformulation losses between different revisions of the same query during training.
arXiv Detail & Related papers (2022-04-25T10:42:50Z)
GERE: Generative Evidence Retrieval for Fact Verification [57.78768817972026]
We propose GERE, the first system that retrieves evidence in a generative fashion. The experimental results on the FEVER dataset show that GERE achieves significant improvements over the state-of-the-art baselines.
arXiv Detail & Related papers (2022-04-12T03:49:35Z)
CODER: An efficient framework for improving retrieval through COntextualized Document Embedding Reranking [11.635294568328625]
We present a framework for improving the performance of a wide class of retrieval models at minimal computational cost. It utilizes precomputed document representations extracted by a base dense retrieval method. It incurs a negligible computational overhead on top of any first-stage method at run time, allowing it to be easily combined with any state-of-the-art dense retrieval method.
arXiv Detail & Related papers (2021-12-16T10:25:26Z)
Improving Query Representations for Dense Retrieval with Pseudo Relevance Feedback [29.719150565643965]
This paper proposes ANCE-PRF, a new query encoder that uses pseudo relevance feedback (PRF) to improve query representations for dense retrieval. ANCE-PRF uses a BERT encoder that consumes the query and the top retrieved documents from a dense retrieval model, ANCE, and it learns to produce better query embeddings directly from relevance labels. Analysis shows that the PRF encoder effectively captures the relevant and complementary information from PRF documents, while ignoring the noise with its learned attention mechanism.
arXiv Detail & Related papers (2021-08-30T18:10:26Z)
Knowledge-Aided Open-Domain Question Answering [58.712857964048446]
We propose a knowledge-aided open-domain QA (KAQA) method that aims to improve relevant document retrieval and answer reranking. During document retrieval, a candidate document is scored by considering its relationship to the question and other documents. During answer reranking, a candidate answer is reranked using not only its own context but also the clues from other documents.
arXiv Detail & Related papers (2020-06-09T13:28:57Z)

This list is automatically generated from the titles and abstracts of the papers on this site.