LoRE: Logit-Ranked Retriever Ensemble for Enhancing Open-Domain Question Answering
- URL: http://arxiv.org/abs/2410.10042v1
- Date: Sun, 13 Oct 2024 23:06:08 GMT
- Title: LoRE: Logit-Ranked Retriever Ensemble for Enhancing Open-Domain Question Answering
- Authors: Saikrishna Sanniboina, Shiv Trivedi, Sreenidhi Vijayaraghavan
- Abstract summary: We propose LoRE, a novel approach that improves answer accuracy and relevance by mitigating positional bias.
LoRE employs an ensemble of diverse retrievers, such as BM25 and sentence transformers with FAISS indexing.
A key innovation is a logit-based answer ranking algorithm that combines the logit scores from a large language model with the retrieval ranks of the passages.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Retrieval-based question answering systems often suffer from positional bias, leading to suboptimal answer generation. We propose LoRE (Logit-Ranked Retriever Ensemble), a novel approach that improves answer accuracy and relevance by mitigating positional bias. LoRE employs an ensemble of diverse retrievers, such as BM25 and sentence transformers with FAISS indexing. A key innovation is a logit-based answer ranking algorithm that combines the logit scores from a large language model (LLM) with the retrieval ranks of the passages. Experimental results on NarrativeQA and SQuAD demonstrate that LoRE significantly outperforms existing retrieval-based methods in terms of exact match and F1 scores. On SQuAD, LoRE achieves 14.5%, 22.83%, and 14.95% improvements over the baselines for ROUGE-L, EM, and F1, respectively. Qualitatively, LoRE generates more relevant and accurate answers, especially for complex queries.
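The abstract describes the architecture only at a high level. Below is a minimal sketch of how such a logit-ranked ensemble might be assembled, assuming the `rank_bm25`, `sentence-transformers`, and `faiss` packages; the reciprocal-rank fusion, the mixing weight `alpha`, and the `llm_logit` callable and candidate fields are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a logit-ranked retriever ensemble (illustrative only).
import faiss
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

def ensemble_retrieve(query, passages, k=5):
    # Retriever 1: BM25 over whitespace-tokenized passages.
    bm25 = BM25Okapi([p.split() for p in passages])
    bm25_order = np.argsort(-bm25.get_scores(query.split()))[:k]

    # Retriever 2: dense retrieval with a sentence transformer + FAISS.
    encoder = SentenceTransformer("all-MiniLM-L6-v2")
    emb = encoder.encode(passages, normalize_embeddings=True).astype(np.float32)
    index = faiss.IndexFlatIP(emb.shape[1])
    index.add(emb)
    q = encoder.encode([query], normalize_embeddings=True).astype(np.float32)
    _, dense_order = index.search(q, k)

    # Merge the two ranked lists with reciprocal-rank fusion (one plausible
    # way to blunt any single retriever's positional bias).
    fused = {}
    for order in (bm25_order, dense_order[0]):
        for rank, idx in enumerate(order):
            fused[idx] = fused.get(idx, 0.0) + 1.0 / (rank + 1)
    return sorted(fused, key=fused.get, reverse=True)

def logit_rank(candidates, llm_logit, alpha=0.5):
    # Combine each candidate answer's LLM logit score with the retrieval rank
    # of its source passage. `llm_logit` and the candidate fields `.text` and
    # `.passage_rank` are hypothetical; `alpha` is an assumed mixing weight.
    return max(candidates,
               key=lambda c: alpha * llm_logit(c.text)
                             + (1 - alpha) / (c.passage_rank + 1))
```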
Related papers
- Intrinsic Evaluation of RAG Systems for Deep-Logic Questions [2.869056892890114]
We introduce the Overall Performance Index (OPI), an intrinsic metric to evaluate retrieval-augmented generation (RAG) mechanisms for applications involving deep-logic queries.
OPI is computed as the harmonic mean of two key metrics: the Logical-Relation Correctness Ratio and the average of BERT embedding similarity scores between ground-truth and generated answers.
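Read literally, OPI is just the harmonic mean of two scores; a minimal sketch, with both inputs assumed precomputed and normalized:

```python
def overall_performance_index(lrcr: float, bert_sim: float) -> float:
    """Harmonic mean of the Logical-Relation Correctness Ratio and the
    average BERT embedding similarity (both assumed to lie in [0, 1])."""
    if lrcr + bert_sim == 0:
        return 0.0
    return 2 * lrcr * bert_sim / (lrcr + bert_sim)
```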
arXiv Detail & Related papers (2024-10-03T19:25:05Z)
- W-RAG: Weakly Supervised Dense Retrieval in RAG for Open-domain Question Answering [28.79851078451609]
Large Language Models (LLMs) often struggle to generate factual answers relying solely on their internal (parametric) knowledge.
To address this limitation, Retrieval-Augmented Generation (RAG) systems enhance LLMs by retrieving relevant information from external sources.
We propose W-RAG, which uses the ranking capabilities of LLMs to create weakly labeled data for training dense retrievers.
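From the abstract, the weak supervision appears to come from ranking passages by how well they let the LLM reproduce the gold answer. A minimal sketch of that idea, where `answer_logprob` is a hypothetical helper and `top_n` an assumed cutoff:

```python
def weak_labels(question, answer, passages, answer_logprob, top_n=1):
    # `answer_logprob(q, a, p)` is a hypothetical callable returning the
    # LLM's log-probability of generating answer `a` given question `q`
    # and passage `p` in the prompt.
    scored = sorted(passages,
                    key=lambda p: answer_logprob(question, answer, p),
                    reverse=True)
    positives = scored[:top_n]   # weak positives for retriever training
    negatives = scored[top_n:]   # the rest serve as negatives
    return positives, negatives
```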
arXiv Detail & Related papers (2024-08-15T22:34:44Z)
- RAG-QA Arena: Evaluating Domain Robustness for Long-form Retrieval Augmented Question Answering [61.19126689470398]
Long-form RobustQA (LFRQA) is a new dataset covering 26K queries and large corpora across seven different domains.
We show via experiments that RAG-QA Arena and human judgments on answer quality are highly correlated.
Only 41.3% of the most competitive LLM's answers are preferred to LFRQA's answers, demonstrating RAG-QA Arena as a challenging evaluation platform for future research.
arXiv Detail & Related papers (2024-07-19T03:02:51Z)
- SuRe: Summarizing Retrievals using Answer Candidates for Open-domain QA of LLMs [85.54906813106683]
We propose a simple yet effective framework to enhance open-domain question answering (ODQA) with large language models (LLMs).
SuRe (Summarizing Retrievals) helps LLMs predict more accurate answers to a given question, answers that are well supported by summaries of the retrieved passages.
Experimental results on diverse ODQA benchmarks demonstrate the superiority of SuRe, with improvements of up to 4.6% in exact match (EM) and 4.0% in F1 score over standard prompting approaches.
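The abstract suggests a prompt-only pipeline: draft candidate answers, summarize the retrievals as support for each candidate, and keep the best-supported candidate. A minimal sketch, where `llm` is any text-completion callable and all prompt wording is invented here:

```python
def sure(question, passages, llm, n_candidates=2):
    context = "\n".join(passages)
    # 1) Draft candidate answers from the retrieved context.
    cands = [llm(f"Context:\n{context}\nQ: {question}\nA:")
             for _ in range(n_candidates)]
    # 2) Summarize the retrievals as evidence for each candidate.
    summaries = [llm(f"Summarize the evidence in:\n{context}\n"
                     f"supporting the answer '{c}' to '{question}'.")
                 for c in cands]
    # 3) Ask the LLM which candidate its summary supports best
    #    (assumes the model replies with a bare number).
    scores = [llm(f"Does this summary support '{c}'? Rate 0-10:\n{s}")
              for c, s in zip(cands, summaries)]
    best = max(range(len(cands)), key=lambda i: float(scores[i]))
    return cands[best]
```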
arXiv Detail & Related papers (2024-04-17T01:15:54Z)
- GenQREnsemble: Zero-Shot LLM Ensemble Prompting for Generative Query Reformulation [5.793298194062544]
We propose an ensemble-based prompting technique, GenQREnsemble, to generate multiple sets of query-reformulation keywords.
On evaluations over four IR benchmarks, we find that GenQREnsemble generates better reformulations, with relative nDCG@10 improvements of up to 18% and MAP improvements of up to 24% over the previous zero-shot state of the art.
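A sketch of the ensemble-prompting idea as the abstract describes it: paraphrase one reformulation instruction several ways, collect the keywords each variant elicits, and append them all to the original query. Prompt wording and the join strategy are assumptions:

```python
def genqr_ensemble(query, llm, n=10):
    instruction = "Suggest search keywords that improve this query:"
    # Paraphrase the instruction n ways to elicit diverse keyword sets.
    variants = [llm(f"Paraphrase this instruction: {instruction}")
                for _ in range(n)]
    keyword_sets = [llm(f"{v}\nQuery: {query}\nKeywords:") for v in variants]
    # Concatenate all keywords onto the original query for retrieval.
    return query + " " + " ".join(keyword_sets)
```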
arXiv Detail & Related papers (2024-04-04T18:35:25Z)
- ReFIT: Relevance Feedback from a Reranker during Inference [109.33278799999582]
Retrieve-and-rerank is a prevalent framework in neural information retrieval.
We propose to leverage the reranker to improve recall by making it provide relevance feedback to the retriever at inference time.
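One way to realize this at inference time (a sketch under assumptions, not necessarily the paper's exact procedure) is to nudge the query embedding so the retriever's score distribution over the top-k passages matches the reranker's, then retrieve again with the updated vector:

```python
import torch
import torch.nn.functional as F

def refit_query(q_emb, passage_embs, reranker_scores, steps=10, lr=0.1):
    # q_emb: (d,) query vector; passage_embs: (k, d) top-k passage vectors;
    # reranker_scores: (k,) cross-encoder scores for the same passages.
    q = q_emb.clone().requires_grad_(True)
    target = F.softmax(reranker_scores, dim=0)   # reranker distribution
    opt = torch.optim.Adam([q], lr=lr)
    for _ in range(steps):
        retriever_logits = passage_embs @ q      # dot-product similarities
        loss = F.kl_div(F.log_softmax(retriever_logits, dim=0),
                        target, reduction="sum")
        opt.zero_grad()
        loss.backward()
        opt.step()
    return q.detach()                            # use for a second retrieval
```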
arXiv Detail & Related papers (2023-05-19T15:30:33Z)
- Large Language Models are Strong Zero-Shot Retriever [89.16756291653371]
We propose a simple method that applies a large language model (LLM) to large-scale retrieval in zero-shot scenarios.
Our method, the large language model as Retriever (LameR), is built upon no neural models other than an LLM.
arXiv Detail & Related papers (2023-04-27T14:45:55Z)
- Few-shot Reranking for Multi-hop QA via Language Model Prompting [56.454088569241534]
We study few-shot reranking for multi-hop QA with open-domain questions.
We propose PromptRank, which relies on large language models prompting for multi-hop path reranking.
PromptRank yields strong retrieval performance on HotpotQA with only 128 training examples.
arXiv Detail & Related papers (2022-05-25T10:45:55Z)
- Adversarial Retriever-Ranker for dense text retrieval [51.87158529880056]
We present Adversarial Retriever-Ranker (AR2), which consists of a dual-encoder retriever plus a cross-encoder ranker.
AR2 consistently and significantly outperforms existing dense retriever methods.
This includes improvements on Natural Questions R@5 to 77.9% (+2.1%), TriviaQA R@5 to 78.2% (+1.4%), and MS-MARCO MRR@10 to 39.5% (+1.3%).
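A compressed sketch of one adversarial round as the abstract frames it (retriever as generator, cross-encoder ranker as discriminator); the `retriever.score` / `ranker.score` interfaces, loss shapes, and negative sampling are assumptions:

```python
import torch
import torch.nn.functional as F

def ar2_round(retriever, ranker, query, positive, pool, opt_r, opt_d, k=8):
    # Retriever (generator) proposes hard negatives from a pool that is
    # assumed not to contain the positive passage.
    scores = retriever.score(query, pool)                 # (len(pool),)
    hard_negs = [pool[i] for i in scores.topk(k).indices]

    # Discriminator step: the ranker learns to prefer the positive
    # (index 0) over the retrieved hard negatives.
    d_logits = ranker.score(query, [positive] + hard_negs)   # (k+1,)
    d_loss = F.cross_entropy(d_logits.unsqueeze(0),
                             torch.zeros(1, dtype=torch.long))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step: the retriever moves its score distribution toward
    # the ranker's, learning to surface passages that fool the ranker.
    r_logits = retriever.score(query, [positive] + hard_negs)
    with torch.no_grad():
        d_probs = F.softmax(ranker.score(query, [positive] + hard_negs), 0)
    g_loss = F.kl_div(F.log_softmax(r_logits, dim=0), d_probs,
                      reduction="sum")
    opt_r.zero_grad()
    g_loss.backward()
    opt_r.step()
```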
arXiv Detail & Related papers (2021-10-07T16:41:15Z)