Improving Document Retrieval Coherence for Semantically Equivalent Queries
- URL: http://arxiv.org/abs/2508.07975v1
- Date: Mon, 11 Aug 2025 13:34:59 GMT
- Title: Improving Document Retrieval Coherence for Semantically Equivalent Queries
- Authors: Stefano Campese, Alessandro Moschitti, Ivano Lauriola,
- Abstract summary: We propose a variation of the Multi-Negative Ranking loss for training DR that improves the coherence of models in retrieving the same documents.<n>The loss penalizes discrepancies between the top-k ranked documents retrieved for diverse but semantic equivalent queries.
- Score: 63.97649988164166
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Dense Retrieval (DR) models have proven to be effective for Document Retrieval and Information Grounding tasks. Usually, these models are trained and optimized for improving the relevance of top-ranked documents for a given query. Previous work has shown that popular DR models are sensitive to the query and document lexicon: small variations of it may lead to a significant difference in the set of retrieved documents. In this paper, we propose a variation of the Multi-Negative Ranking loss for training DR that improves the coherence of models in retrieving the same documents with respect to semantically similar queries. The loss penalizes discrepancies between the top-k ranked documents retrieved for diverse but semantic equivalent queries. We conducted extensive experiments on various datasets, MS-MARCO, Natural Questions, BEIR, and TREC DL 19/20. The results show that (i) models optimizes by our loss are subject to lower sensitivity, and, (ii) interestingly, higher accuracy.
Related papers
- Improved Evidence Extraction for Document Inconsistency Detection with LLMs [10.610567456326235]
We introduce new comprehensive evidence-extraction metrics and a redact-and-retry framework with constrained filtering.<n>We back our claims with promising experimental results.
arXiv Detail & Related papers (2026-01-06T00:58:20Z) - Collapse of Dense Retrievers: Short, Early, and Literal Biases Outranking Factual Evidence [56.09494651178128]
Retrieval models are commonly used in Information Retrieval (IR) applications, such as Retrieval-Augmented Generation (RAG)<n>We quantify the impact of biases, such as a preference for shorter documents, on retrievers like Dragon+ and Contriever.<n>We uncover major vulnerabilities, showing retrievers favor shorter documents, early positions, repeated entities, and literal matches, all while ignoring the answer's presence!
arXiv Detail & Related papers (2025-03-06T23:23:13Z) - DR-RAG: Applying Dynamic Document Relevance to Retrieval-Augmented Generation for Question-Answering [4.364937306005719]
RAG has recently demonstrated the performance of Large Language Models (LLMs) in the knowledge-intensive tasks such as Question-Answering (QA)
We have found that even though there is low relevance between some critical documents and query, it is possible to retrieve the remaining documents by combining parts of the documents with the query.
A two-stage retrieval framework called Dynamic-Relevant Retrieval-Augmented Generation (DR-RAG) is proposed to improve document retrieval recall and the accuracy of answers.
arXiv Detail & Related papers (2024-06-11T15:15:33Z) - List-aware Reranking-Truncation Joint Model for Search and
Retrieval-augmented Generation [80.12531449946655]
We propose a Reranking-Truncation joint model (GenRT) that can perform the two tasks concurrently.
GenRT integrates reranking and truncation via generative paradigm based on encoder-decoder architecture.
Our method achieves SOTA performance on both reranking and truncation tasks for web search and retrieval-augmented LLMs.
arXiv Detail & Related papers (2024-02-05T06:52:53Z) - Soft Prompt Tuning for Augmenting Dense Retrieval with Large Language Models [29.735976068474105]
We propose soft prompt tuning for augmenting Dense retrieval (DR) models.
For each task, we leverage soft prompt-tuning to optimize a task-specific soft prompt on limited ground truth data.
We design a filter to select high-quality example document-query pairs in the prompt to further improve the quality of weak tagged queries.
arXiv Detail & Related papers (2023-07-17T07:55:47Z) - Incorporating Relevance Feedback for Information-Seeking Retrieval using
Few-Shot Document Re-Ranking [56.80065604034095]
We introduce a kNN approach that re-ranks documents based on their similarity with the query and the documents the user considers relevant.
To evaluate our different integration strategies, we transform four existing information retrieval datasets into the relevance feedback scenario.
arXiv Detail & Related papers (2022-10-19T16:19:37Z) - LoL: A Comparative Regularization Loss over Query Reformulation Losses
for Pseudo-Relevance Feedback [70.44530794897861]
Pseudo-relevance feedback (PRF) has proven to be an effective query reformulation technique to improve retrieval accuracy.
Existing PRF methods independently treat revised queries originating from the same query but using different numbers of feedback documents.
We propose the Loss-over-Loss (LoL) framework to compare the reformulation losses between different revisions of the same query during training.
arXiv Detail & Related papers (2022-04-25T10:42:50Z) - Augmenting Document Representations for Dense Retrieval with
Interpolation and Perturbation [49.940525611640346]
Document Augmentation for dense Retrieval (DAR) framework augments the representations of documents with their Dense Augmentation and perturbations.
We validate the performance of DAR on retrieval tasks with two benchmark datasets, showing that the proposed DAR significantly outperforms relevant baselines on the dense retrieval of both the labeled and unlabeled documents.
arXiv Detail & Related papers (2022-03-15T09:07:38Z) - Improving Query Representations for Dense Retrieval with Pseudo
Relevance Feedback [29.719150565643965]
This paper proposes ANCE-PRF, a new query encoder that uses pseudo relevance feedback (PRF) to improve query representations for dense retrieval.
ANCE-PRF uses a BERT encoder that consumes the query and the top retrieved documents from a dense retrieval model, ANCE, and it learns to produce better query embeddings directly from relevance labels.
Analysis shows that the PRF encoder effectively captures the relevant and complementary information from PRF documents, while ignoring the noise with its learned attention mechanism.
arXiv Detail & Related papers (2021-08-30T18:10:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.