Inter-Passage Verification for Multi-evidence Multi-answer QA
- URL: http://arxiv.org/abs/2506.00425v1
- Date: Sat, 31 May 2025 07:03:52 GMT
- Title: Inter-Passage Verification for Multi-evidence Multi-answer QA
- Authors: Bingsen Chen, Shengjie Wang, Xi Ye, Chen Zhao
- Abstract summary: We propose a new multi-answer QA framework -- Retrieval-augmented Independent Reading with Inter-passage Verification. Our framework retrieves a large set of passages and processes each passage individually to generate an initial high-recall but noisy answer set. Our framework significantly outperforms existing baselines across various model sizes, achieving an average F1 score improvement of 11.17%.
- Score: 22.233409308846067
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-answer question answering (QA), where questions can have many valid answers, presents a significant challenge for existing retrieval-augmented generation-based QA systems, as these systems struggle to retrieve and then synthesize a large number of evidence passages. To tackle these challenges, we propose a new multi-answer QA framework -- Retrieval-augmented Independent Reading with Inter-passage Verification (RI$^2$VER). Our framework retrieves a large set of passages and processes each passage individually to generate an initial high-recall but noisy answer set. Then we propose a new inter-passage verification pipeline that validates every candidate answer through (1) Verification Question Generation, (2) Gathering Additional Evidence, and (3) Verification with inter-passage synthesis. Evaluations on the QAMPARI and RoMQA datasets demonstrate that our framework significantly outperforms existing baselines across various model sizes, achieving an average F1 score improvement of 11.17%. Further analysis validates that our inter-passage verification pipeline enables our framework to be particularly beneficial for questions requiring multi-evidence synthesis.
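The pipeline described in the abstract is concrete enough to sketch end to end. Below is a minimal Python rendering of that control flow; the `retrieve` and `llm` callables and all prompt strings are our own placeholders, not the authors' released implementation.

```python
# Minimal sketch of the RI^2VER control flow described in the abstract.
# `retrieve` and `llm` are assumed stubs, not the authors' actual code.
from typing import Callable, List, Set

def ri2ver(question: str,
           retrieve: Callable[[str, int], List[str]],
           llm: Callable[[str], str],
           k: int = 100) -> Set[str]:
    passages = retrieve(question, k)

    # Stage 1: independent reading -- each passage is read in isolation,
    # yielding a high-recall but noisy candidate answer set.
    candidates: Set[str] = set()
    for p in passages:
        out = llm(f"Passage: {p}\nQuestion: {question}\n"
                  "List every answer this passage supports, comma-separated:")
        candidates.update(a.strip() for a in out.split(",") if a.strip())

    # Stage 2: inter-passage verification of every candidate answer.
    verified: Set[str] = set()
    for answer in candidates:
        # (1) Verification Question Generation
        vq = llm(f"Write a yes/no question checking whether '{answer}' "
                 f"is a valid answer to: {question}")
        # (2) Gathering Additional Evidence
        evidence = retrieve(vq, 5)
        # (3) Verification with inter-passage synthesis
        verdict = llm("Evidence:\n" + "\n".join(evidence) +
                      f"\nQuestion: {vq}\nAnswer yes or no:")
        if verdict.strip().lower().startswith("yes"):
            verified.add(answer)
    return verified
```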
Related papers
- Beyond Retrieval: Joint Supervision and Multimodal Document Ranking for Textbook Question Answering [3.6799953119508735]
We propose a novel approach to multimodal textbook question answering by introducing a mechanism for enhancing semantic representations. Our model, Joint Embedding Training With Ranking Supervision for Textbook Question Answering (JETRTQA), is a multimodal learning framework built on a retriever-generator architecture. We evaluate our method on the CK12-QA dataset and demonstrate that it significantly improves the discrimination between informative and irrelevant documents.
arXiv Detail & Related papers (2025-05-17T13:23:54Z)
- MHTS: Multi-Hop Tree Structure Framework for Generating Difficulty-Controllable QA Datasets for RAG Evaluation [5.525151548786079]
Existing RAG benchmarks often overlook query difficulty, leading to inflated performance on simpler questions and unreliable evaluations. We propose MHTS (Multi-Hop Tree Structure), a novel dataset synthesis framework that controls multi-hop reasoning complexity by leveraging a multi-hop tree structure to generate logically connected, multi-chunk queries (sketched after this entry).
arXiv Detail & Related papers (2025-03-29T06:26:01Z)
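Reading the MHTS summary above, difficulty appears to be controlled by how many logically connected chunks a synthesized query must span. A rough sketch of that idea follows; the tree representation, `sample_path`, and the prompt are our assumptions, not the paper's code.

```python
# Sketch: control multi-hop difficulty by sampling a root-to-leaf slice of
# a chunk tree and asking an LLM to write one query spanning all of it.
# All names here are illustrative, not the MHTS authors' API.
import random
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class ChunkNode:
    text: str
    children: List["ChunkNode"] = field(default_factory=list)

def sample_path(root: ChunkNode, hops: int) -> List[ChunkNode]:
    """Walk down the tree `hops` levels to collect logically linked chunks."""
    path, node = [root], root
    while len(path) < hops and node.children:
        node = random.choice(node.children)
        path.append(node)
    return path

def synthesize_query(root: ChunkNode, hops: int,
                     llm: Callable[[str], str]) -> str:
    chunks = sample_path(root, hops)   # more hops => harder query
    joined = "\n---\n".join(c.text for c in chunks)
    return llm("Write one question that can only be answered by combining "
               f"ALL {len(chunks)} chunks below:\n{joined}")
```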
- Federated Retrieval Augmented Generation for Multi-Product Question Answering [15.250046972086164]
We introduce MKP-QA, a novel multi-product knowledge-augmented QA framework with probabilistic federated search across domains and relevant knowledge (sketched after this entry). Our experiments show that MKP-QA significantly boosts multi-product RAG-QA performance in terms of both retrieval accuracy and response quality.
arXiv Detail & Related papers (2025-01-25T00:22:27Z)
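One plausible reading of "probabilistic federated search across domains" is to weight each product domain's results by P(domain | query) and merge them. A sketch under that assumption; `domain_prob` and the per-domain `search` callables are hypothetical stubs, not the MKP-QA implementation.

```python
# Sketch of probabilistic federated retrieval across product domains.
# `domain_prob` and per-domain `search` functions are assumed stubs.
from typing import Callable, Dict, List, Tuple

def federated_search(query: str,
                     indexes: Dict[str, Callable[[str], List[Tuple[str, float]]]],
                     domain_prob: Callable[[str, str], float],
                     k: int = 10) -> List[str]:
    scored: List[Tuple[float, str]] = []
    for domain, search in indexes.items():
        p = domain_prob(query, domain)        # P(domain | query)
        for passage, score in search(query):  # per-domain relevance scores
            scored.append((p * score, passage))
    scored.sort(reverse=True)
    return [passage for _, passage in scored[:k]]
```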
- Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent [92.5712549836791]
Multimodal Retrieval Augmented Generation (mRAG) plays an important role in mitigating the "hallucination" issue inherent in multimodal large language models (MLLMs). We propose the first self-adaptive planning agent for multimodal retrieval, OmniSearch.
arXiv Detail & Related papers (2024-11-05T09:27:21Z)
- Measuring Retrieval Complexity in Question Answering Systems [64.74106622822424]
Retrieval complexity (RC) is a novel metric conditioned on the completeness of retrieved documents.
We propose an unsupervised pipeline to measure RC given an arbitrary retrieval system.
Such a measure can have a major impact on the design of retrieval-based systems (a rough proxy is sketched after this entry).
arXiv Detail & Related papers (2024-06-05T19:30:52Z)
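The summary defines RC only as being conditioned on the completeness of retrieved documents, so any concrete formula here is a guess. The toy proxy below scores a question as more complex when the top-k retrieved passages cover fewer of its content terms; it is illustrative, not the paper's pipeline.

```python
# Crude, unsupervised proxy for retrieval complexity (RC): the fewer
# question terms the top-k retrieved passages cover, the more complex
# the question is for this retriever. Illustrative only.
from typing import Callable, List

def retrieval_complexity(question: str,
                         retrieve: Callable[[str, int], List[str]],
                         k: int = 10) -> float:
    terms = {w.lower() for w in question.split() if len(w) > 3}
    if not terms:
        return 0.0
    retrieved = " ".join(retrieve(question, k)).lower()
    covered = sum(1 for t in terms if t in retrieved)
    return 1.0 - covered / len(terms)   # 0 = full coverage, 1 = none
```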
- Chain-of-Discussion: A Multi-Model Framework for Complex Evidence-Based Question Answering [55.295699268654545]
We propose a novel Chain-of-Discussion framework to leverage the synergy among open-source Large Language Models (sketched after this entry). Our experiments show that discussions among multiple LLMs play a vital role in enhancing the quality of answers.
arXiv Detail & Related papers (2024-02-26T05:31:34Z)
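A minimal reading of the framework: several open-source LLMs draft answers, critique each other over a few rounds, and a final pass synthesizes the discussion. The sketch below follows that reading; the prompts and the synthesis step are our assumptions.

```python
# Sketch: several LLMs draft answers, read each other's drafts, and revise;
# a final model synthesizes the discussion. Prompts are illustrative.
from typing import Callable, List

def chain_of_discussion(question: str, evidence: str,
                        models: List[Callable[[str], str]],
                        rounds: int = 2) -> str:
    drafts = [m(f"Evidence: {evidence}\nQuestion: {question}\nAnswer:")
              for m in models]
    for _ in range(rounds):
        shared = "\n".join(f"Model {i}: {d}" for i, d in enumerate(drafts))
        drafts = [m(f"Question: {question}\nPeer answers:\n{shared}\n"
                    "Critique the peers and give your revised answer:")
                  for m in models]
    # Let the first model synthesize the final answer from the discussion.
    return models[0](f"Question: {question}\nDiscussion:\n"
                     + "\n".join(drafts) + "\nFinal consolidated answer:")
```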
- HiQA: A Hierarchical Contextual Augmentation RAG for Multi-Documents QA [13.000411428297813]
We present HiQA, an advanced multi-document question-answering (MDQA) framework that integrates cascading metadata into content and a multi-route retrieval mechanism (sketched after this entry).
We also release a benchmark called MasQA to evaluate and research in MDQA.
arXiv Detail & Related papers (2024-02-01T02:24:15Z)
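"Cascading metadata into content" plausibly means prefixing each chunk with its document and section breadcrumb before indexing, and "multi-route retrieval" suggests fusing scores from more than one retrieval route. A sketch under those assumptions; both route implementations are hypothetical stubs.

```python
# Sketch: cascade document/section metadata into each chunk so the
# retriever sees context, then fuse two retrieval routes (keyword +
# embedding). The route implementations are assumed stubs.
from typing import Callable, Dict, List

def cascade_metadata(doc_title: str, section: str, chunk: str) -> str:
    return f"[{doc_title} > {section}] {chunk}"

def multi_route_retrieve(query: str,
                         keyword_route: Callable[[str], Dict[str, float]],
                         vector_route: Callable[[str], Dict[str, float]],
                         k: int = 5, alpha: float = 0.5) -> List[str]:
    kw, vec = keyword_route(query), vector_route(query)
    ids = set(kw) | set(vec)
    fused = {i: alpha * kw.get(i, 0.0) + (1 - alpha) * vec.get(i, 0.0)
             for i in ids}
    return sorted(fused, key=fused.get, reverse=True)[:k]
```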
- Reranking Overgenerated Responses for End-to-End Task-Oriented Dialogue Systems [71.33737787564966]
End-to-end (E2E) task-oriented dialogue (ToD) systems are prone to falling into the so-called 'likelihood trap', where the most likely response under the model is not necessarily the best one.
We propose a reranking method which aims to select high-quality items from the lists of responses initially overgenerated by the system.
Our methods improve a state-of-the-art E2E ToD system by 2.4 BLEU, 3.2 ROUGE, and 2.8 METEOR points, achieving new peak results (the overgenerate-and-rerank idea is sketched after this entry).
arXiv Detail & Related papers (2022-11-07T15:59:49Z)
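The mechanism is stated directly in the summary: overgenerate responses, then select by a quality score rather than by likelihood. A minimal sketch, with `generate_n` and `quality` as assumed stubs:

```python
# Sketch: escape the 'likelihood trap' by sampling many responses and
# picking the one a separate quality scorer prefers, rather than the
# most likely one. `generate_n` and `quality` are assumed stubs.
from typing import Callable, List

def rerank_response(dialogue_context: str,
                    generate_n: Callable[[str, int], List[str]],
                    quality: Callable[[str, str], float],
                    n: int = 20) -> str:
    candidates = generate_n(dialogue_context, n)   # overgeneration
    return max(candidates, key=lambda r: quality(dialogue_context, r))
```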
- Modeling Multi-hop Question Answering as Single Sequence Prediction [88.72621430714985]
We propose a simple generative approach (PathFid) that extends the task beyond just answer generation (the target format is sketched after this entry).
PathFid explicitly models the reasoning process to resolve the answer for multi-hop questions.
Our experiments demonstrate that PathFid leads to strong performance gains on two multi-hop QA datasets.
arXiv Detail & Related papers (2022-05-18T21:57:59Z)
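The summary says the model emits a single sequence that carries the reasoning path along with the answer. A sketch of such a path-then-answer target format; the delimiter tokens are invented for illustration, not PathFid's actual vocabulary.

```python
# Sketch: linearize a multi-hop reasoning path and its answer into one
# target sequence, as in path-then-answer generation. Delimiter tokens
# are illustrative.
from typing import List, Tuple

def linearize(path: List[Tuple[str, str]], answer: str) -> str:
    """path: ordered (passage_title, supporting_fact) hops."""
    hops = " ".join(f"<hop> <title> {t} <fact> {f}" for t, f in path)
    return f"{hops} <answer> {answer}"

def parse(sequence: str) -> str:
    """Recover just the answer from a generated path-answer sequence."""
    return sequence.split("<answer>", 1)[-1].strip()
```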
- Decomposing Complex Questions Makes Multi-Hop QA Easier and More Interpretable [25.676852169835833]
Multi-hop QA requires a machine to answer complex questions by finding multiple clues and reasoning over them.
We propose Relation Extractor-Reader and Comparator (RERC), a three-stage framework based on complex question decomposition.
On the 2WikiMultiHopQA dataset, our RERC model achieved state-of-the-art performance, topping the leaderboard with a joint F1 score of 53.58 (the three-stage pipeline is sketched after this entry).
arXiv Detail & Related papers (2021-10-26T08:10:35Z)
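The three stage names fix the data flow even though each component is a trained model in the paper. A skeleton wiring them together, with every stage left as an assumed callable:

```python
# Skeleton of the three-stage RERC decomposition pipeline. Each stage is
# an assumed stub; only the data flow follows the summary.
from typing import Callable, List

def rerc(question: str,
         relation_extractor: Callable[[str], List[str]],
         reader: Callable[[str], str],
         comparator: Callable[[str, List[str]], str]) -> str:
    sub_questions = relation_extractor(question)   # decompose into relations
    clues = [reader(sq) for sq in sub_questions]   # answer each sub-question
    return comparator(question, clues)             # compare/combine clues
```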
- NoiseQA: Challenge Set Evaluation for User-Centric Question Answering [68.67783808426292]
We show that components in the pipeline that precede an answering engine can introduce varied and considerable sources of error (a toy noise injector is sketched after this entry).
We conclude that there is substantial room for progress before QA systems can be effectively deployed.
arXiv Detail & Related papers (2021-02-16T18:35:29Z)
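One concrete way to probe such upstream error sources is to perturb question text the way a noisy input channel would and watch the answers change. The toy typo injector below is illustrative only; NoiseQA's actual challenge sets are carefully constructed.

```python
# Toy keyboard-typo injector for stress-testing a QA pipeline's robustness
# to noisy user input. Illustrative; not part of NoiseQA itself.
import random

ADJACENT = {"a": "qs", "e": "wr", "i": "uo", "o": "ip", "s": "ad", "t": "ry"}

def add_typos(question: str, rate: float = 0.05,
              rng: random.Random = random.Random(0)) -> str:
    out = []
    for ch in question:
        if ch.lower() in ADJACENT and rng.random() < rate:
            out.append(rng.choice(ADJACENT[ch.lower()]))
        else:
            out.append(ch)
    return "".join(out)
```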