LoL: A Comparative Regularization Loss over Query Reformulation Losses for Pseudo-Relevance Feedback
- URL: http://arxiv.org/abs/2204.11545v1
- Date: Mon, 25 Apr 2022 10:42:50 GMT
- Title: LoL: A Comparative Regularization Loss over Query Reformulation Losses for Pseudo-Relevance Feedback
- Authors: Yunchang Zhu, Liang Pang, Yanyan Lan, Huawei Shen, Xueqi Cheng
- Abstract summary: Pseudo-relevance feedback (PRF) has proven to be an effective query reformulation technique to improve retrieval accuracy.
Existing PRF methods independently treat revised queries originating from the same query but using different numbers of feedback documents.
We propose the Loss-over-Loss (LoL) framework to compare the reformulation losses between different revisions of the same query during training.
- Score: 70.44530794897861
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pseudo-relevance feedback (PRF) has proven to be an effective query
reformulation technique to improve retrieval accuracy. It aims to alleviate the
mismatch of linguistic expressions between a query and its potentially relevant
documents. Existing PRF methods independently treat revised queries originating
from the same query but using different numbers of feedback documents,
resulting in severe query drift. Without comparing the effects of two different
revisions of the same query, a PRF model may incorrectly focus on the
additional irrelevant information introduced by the extra feedback, and thus
reformulate a query that is less effective than the revision using less
feedback. Ideally, if a PRF model can distinguish between irrelevant and
relevant information in the feedback, the more feedback documents there are,
the better the revised query will be. To bridge this gap, we propose the
Loss-over-Loss (LoL) framework to compare the reformulation losses between
different revisions of the same query during training. Concretely, we revise an
original query multiple times in parallel using different amounts of feedback
and compute their reformulation losses. Then, we introduce an additional
regularization loss on these reformulation losses to penalize revisions that
use more feedback but incur larger losses. With such comparative regularization,
the PRF model is expected to learn to suppress the additional irrelevant
information by comparing the effects of different revised queries. Further, we
present a differentiable query reformulation method to implement this
framework. This method revises queries in the vector space and directly
optimizes the retrieval performance of query vectors, making it applicable to
both sparse and dense retrieval models. Empirical evaluation demonstrates the
effectiveness and robustness of our method on two representative sparse and
dense retrieval models.
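The comparative regularization can be pictured as a pairwise penalty over the reformulation losses of the parallel revisions. The sketch below is a minimal PyTorch illustration under the assumption that revisions are ordered by increasing feedback count and that violations are penalized with a hinge (ReLU) term; the function name, example loss values, and the 0.5 weight are illustrative, not the paper's actual implementation.

```python
import torch


def lol_regularization(reformulation_losses: torch.Tensor) -> torch.Tensor:
    """Comparative regularization over reformulation losses (illustrative sketch).

    reformulation_losses[k] is the reformulation loss of the k-th revision of
    the same query, ordered by increasing number of feedback documents. A
    revision that uses more feedback yet incurs a larger loss than a revision
    using less feedback contributes a positive penalty; otherwise the pair
    costs nothing.
    """
    n = reformulation_losses.shape[0]
    penalty = reformulation_losses.new_zeros(())
    for i in range(n):
        for j in range(i + 1, n):
            # Revision j uses more feedback than revision i, so ideally
            # loss_j <= loss_i; penalize the violation loss_j - loss_i > 0.
            penalty = penalty + torch.relu(
                reformulation_losses[j] - reformulation_losses[i]
            )
    return penalty


# Hypothetical usage: reformulation losses of four parallel revisions of one
# query (e.g., using 0, 1, 3, and 5 feedback documents); values are made up.
losses = torch.tensor([0.9, 0.7, 0.8, 0.6], requires_grad=True)
total_loss = losses.sum() + 0.5 * lol_regularization(losses)  # 0.5: assumed weight
total_loss.backward()
```

The hinge form mirrors the stated goal that using more feedback should never yield a larger reformulation loss; the paper's exact formulation may differ.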
Related papers
- GenCRF: Generative Clustering and Reformulation Framework for Enhanced Intent-Driven Information Retrieval [20.807374287510623]
We propose GenCRF: a Generative Clustering and Reformulation Framework to capture diverse intentions adaptively.
We show that GenCRF achieves state-of-the-art performance, surpassing previous query reformulation SOTAs by up to 12% on nDCG@10.
arXiv Detail & Related papers (2024-09-17T05:59:32Z) - MaFeRw: Query Rewriting with Multi-Aspect Feedbacks for Retrieval-Augmented Large Language Models [34.39053202801489]
In a real-world RAG system, the current query often involves spoken ellipses and ambiguous references from dialogue contexts.
We propose a novel query rewriting method MaFeRw, which improves RAG performance by integrating multi-aspect feedback from both the retrieval process and generated results.
Experimental results on two conversational RAG datasets demonstrate that MaFeRw achieves superior generation metrics and more stable training compared to baselines.
arXiv Detail & Related papers (2024-08-30T07:57:30Z) - Optimization of Retrieval-Augmented Generation Context with Outlier Detection [0.0]
We focus on methods to reduce the size and improve the quality of the prompt context required for question-answering systems.
Our goal is to select the most semantically relevant documents, treating the discarded ones as outliers.
The greatest improvements were observed as the complexity of the questions and answers increased.
arXiv Detail & Related papers (2024-07-01T15:53:29Z) - SparseCL: Sparse Contrastive Learning for Contradiction Retrieval [87.02936971689817]
Contradiction retrieval refers to identifying and extracting documents that explicitly disagree with or refute the content of a query.
Existing methods such as similarity search and cross-encoder models exhibit significant limitations.
We introduce SparseCL that leverages specially trained sentence embeddings designed to preserve subtle, contradictory nuances between sentences.
arXiv Detail & Related papers (2024-06-15T21:57:03Z) - Generative Query Reformulation Using Ensemble Prompting, Document Fusion, and Relevance Feedback [8.661419320202787]
GenQREnsemble and GenQRFusion leverage paraphrases of a zero-shot instruction to generate multiple sets of keywords to improve retrieval performance.
We demonstrate that an ensemble of query reformulations can improve retrieval effectiveness by up to 18% on nDCG@10 in pre-retrieval settings and by up to 9% in post-retrieval settings.
arXiv Detail & Related papers (2024-05-27T21:03:26Z) - RaFe: Ranking Feedback Improves Query Rewriting for RAG [83.24385658573198]
We propose a framework for training query rewriting models free of annotations.
By leveraging a publicly available reranker, our framework provides feedback well aligned with the rewriting objectives.
arXiv Detail & Related papers (2024-05-23T11:00:19Z) - ReFIT: Relevance Feedback from a Reranker during Inference [109.33278799999582]
Retrieve-and-rerank is a prevalent framework in neural information retrieval.
We propose to leverage the reranker to improve recall by making it provide relevance feedback to the retriever at inference time.
arXiv Detail & Related papers (2023-05-19T15:30:33Z) - Factual Error Correction for Abstractive Summaries Using Entity Retrieval [57.01193722520597]
We propose RFEC, an efficient factual error correction system based on an entity-retrieval post-editing process.
RFEC retrieves the evidence sentences from the original document by comparing the sentences with the target summary.
Next, RFEC detects the entity-level errors in the summaries by considering the evidence sentences and substitutes the wrong entities with the accurate entities from the evidence sentences.
arXiv Detail & Related papers (2022-04-18T11:35:02Z) - Joint Passage Ranking for Diverse Multi-Answer Retrieval [56.43443577137929]
We study multi-answer retrieval, an under-explored problem that requires retrieving passages to cover multiple distinct answers for a question.
This task requires joint modeling of retrieved passages, as models should not repeatedly retrieve passages containing the same answer at the cost of missing a different valid answer.
In this paper, we introduce JPR, a joint passage retrieval model focusing on reranking. To model the joint probability of the retrieved passages, JPR makes use of an autoregressive reranker that selects a sequence of passages, equipped with novel training and decoding algorithms.
arXiv Detail & Related papers (2021-04-17T04:48:36Z)