Measuring Retrieval Complexity in Question Answering Systems
- URL: http://arxiv.org/abs/2406.03592v1
- Date: Wed, 5 Jun 2024 19:30:52 GMT
- Title: Measuring Retrieval Complexity in Question Answering Systems
- Authors: Matteo Gabburo, Nicolaas Paul Jedema, Siddhant Garg, Leonardo F. R. Ribeiro, Alessandro Moschitti
- Abstract summary: Retrieval complexity (RC) is a novel metric, conditioned on the completeness of retrieved documents, that measures how difficult a question is to answer.
We propose an unsupervised pipeline to measure RC given an arbitrary retrieval system.
Our system can have a major impact on retrieval-based systems by helping to identify more challenging questions on existing datasets.
- Score: 64.74106622822424
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we investigate which questions are challenging for retrieval-based Question Answering (QA). We (i) propose retrieval complexity (RC), a novel metric conditioned on the completeness of retrieved documents, which measures the difficulty of answering questions, and (ii) propose an unsupervised pipeline to measure RC given an arbitrary retrieval system. Our proposed pipeline measures RC more accurately than alternative estimators, including LLMs, on six challenging QA benchmarks. Further investigation reveals that RC scores strongly correlate with both QA performance and expert judgment across five of the six studied benchmarks, indicating that RC is an effective measure of question difficulty. Subsequent categorization of high-RC questions shows that they span a broad set of question shapes, including multi-hop, compositional, and temporal QA, indicating that RC scores can categorize a new subset of complex questions. Our system can also have a major impact on retrieval-based systems by helping to identify more challenging questions on existing datasets.
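The abstract describes the pipeline only at a high level, so the snippet below is a minimal toy sketch of the underlying idea rather than the authors' method: treat a question as more complex when the top-k retrieved passages cover less of the information needed to answer it. The retriever interface (`retrieve`), the lexical coverage proxy, and the use of a reference answer are all illustrative assumptions.

```python
# Toy sketch of a retrieval-complexity (RC) style score. This is NOT the
# paper's pipeline: the retriever interface (`retrieve`), the lexical
# coverage proxy, and the reference answer are illustrative assumptions.
from typing import Callable, List


def lexical_coverage(answer: str, passages: List[str]) -> float:
    """Fraction of answer tokens that appear anywhere in the retrieved
    passages -- a crude stand-in for 'completeness of retrieved documents'."""
    answer_tokens = set(answer.lower().split())
    if not answer_tokens:
        return 0.0
    pooled_tokens = set(" ".join(passages).lower().split())
    return len(answer_tokens & pooled_tokens) / len(answer_tokens)


def retrieval_complexity(
    question: str,
    reference_answer: str,
    retrieve: Callable[[str, int], List[str]],  # any retriever: (query, k) -> passages
    k: int = 10,
) -> float:
    """Higher when the top-k retrieved passages cover less of the answer."""
    passages = retrieve(question, k)
    return 1.0 - lexical_coverage(reference_answer, passages)


# Usage with a stub retriever over a tiny in-memory corpus.
corpus = [
    "Paris is the capital of France.",
    "The Eiffel Tower is in Paris.",
]

def stub_retrieve(query: str, k: int) -> List[str]:
    query_tokens = set(query.lower().split())
    ranked = sorted(corpus, key=lambda p: -len(query_tokens & set(p.lower().split())))
    return ranked[:k]

print(retrieval_complexity("What is the capital of France?", "Paris", stub_retrieve))
# 0.0 -> the answer is fully covered by the retrieved passages, so RC is low
```

Because the score only depends on the retriever through its output passages, swapping `stub_retrieve` for a real dense or sparse retriever is the only change this sketch would need, which mirrors the paper's claim that RC can be measured for an arbitrary retrieval system.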
Related papers
- An Adaptive Framework for Generating Systematic Explanatory Answer in Online Q&A Platforms [62.878616839799776]
We propose SynthRAG, an innovative framework designed to enhance Question Answering (QA) performance.
SynthRAG improves on conventional models by employing adaptive outlines for dynamic content structuring.
An online deployment on the Zhihu platform revealed that SynthRAG's answers achieved notable user engagement.
arXiv Detail & Related papers (2024-10-23T09:14:57Z)
- RAG-QA Arena: Evaluating Domain Robustness for Long-form Retrieval Augmented Question Answering [61.19126689470398]
Long-form RobustQA (LFRQA) is a new dataset covering 26K queries and large corpora across seven different domains.
We show via experiments that RAG-QA Arena and human judgments on answer quality are highly correlated.
Only 41.3% of the most competitive LLM's answers are preferred to LFRQA's answers, demonstrating RAG-QA Arena as a challenging evaluation platform for future research.
arXiv Detail & Related papers (2024-07-19T03:02:51Z)
- DEXTER: A Benchmark for open-domain Complex Question Answering using LLMs [3.24692739098077]
Open-domain complex Question Answering (QA) is a difficult task with challenges in evidence retrieval and reasoning.
We evaluate state-of-the-art pre-trained dense and sparse retrieval models in an open-domain setting.
We observe that late-interaction models and, surprisingly, lexical models such as BM25 perform well compared to other pre-trained dense retrieval models.
arXiv Detail & Related papers (2024-06-24T22:09:50Z)
- Unified Active Retrieval for Retrieval Augmented Generation [69.63003043712696]
In Retrieval-Augmented Generation (RAG), retrieval is not always helpful and applying it to every instruction is sub-optimal.
Existing active retrieval methods face two challenges: (1) they usually rely on a single criterion, which struggles to handle various types of instructions; (2) they depend on specialized and highly differentiated procedures, so combining them makes the RAG system more complicated.
arXiv Detail & Related papers (2024-06-18T12:09:02Z)
- Towards Better Question Generation in QA-based Event Extraction [3.699715556687871]
Event Extraction (EE) aims to extract event-related information from unstructured texts.
The quality of the questions dramatically affects the extraction accuracy.
We propose a reinforcement learning method, RLQG, for QA-based EE.
arXiv Detail & Related papers (2024-05-17T03:52:01Z)
- In-Context Ability Transfer for Question Decomposition in Complex QA [6.745884231594893]
We propose icat (In-Context Ability Transfer) to solve complex question-answering tasks.
We transfer to LLMs the ability to decompose complex questions into simpler ones or to generate step-by-step rationales.
We conduct large-scale experiments on a variety of complex QA tasks involving numerical reasoning, compositional complex QA, and heterogeneous complex QA.
arXiv Detail & Related papers (2023-10-26T11:11:07Z)
- Decomposing Complex Questions Makes Multi-Hop QA Easier and More Interpretable [25.676852169835833]
Multi-hop QA requires a machine to answer complex questions by finding multiple clues and reasoning over them.
We propose Relation Extractor-Reader and Comparator (RERC), a three-stage framework based on complex question decomposition (a toy decomposition sketch follows this list).
On the 2WikiMultiHopQA dataset, our RERC model achieves state-of-the-art performance, topping the leaderboard with a joint F1 score of 53.58.
arXiv Detail & Related papers (2021-10-26T08:10:35Z)
- Complex Knowledge Base Question Answering: A Survey [41.680033017518376]
Knowledge base question answering (KBQA) aims to answer a question over a knowledge base (KB).
In recent years, researchers have proposed a large number of novel methods that look into the challenges of answering complex questions.
We present two mainstream categories of methods for complex KBQA, namely semantic parsing-based (SP-based) methods and information retrieval-based (IR-based) methods.
arXiv Detail & Related papers (2021-08-15T08:14:54Z)
- NoiseQA: Challenge Set Evaluation for User-Centric Question Answering [68.67783808426292]
We show that components in the pipeline that precede an answering engine can introduce varied and considerable sources of error.
We conclude that there is substantial room for progress before QA systems can be effectively deployed.
arXiv Detail & Related papers (2021-02-16T18:35:29Z)
- Query Focused Multi-Document Summarization with Distant Supervision [88.39032981994535]
Existing work relies heavily on retrieval-style methods for estimating the relevance between queries and text segments.
We propose a coarse-to-fine modeling framework which introduces separate modules for estimating whether segments are relevant to the query.
We demonstrate that our framework outperforms strong comparison systems on standard QFS benchmarks.
arXiv Detail & Related papers (2020-04-06T22:35:19Z)
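Several entries above (RERC, In-Context Ability Transfer) revolve around decomposing a complex question into simpler single-hop questions. Neither abstract spells out the procedure, so the sketch below is a purely heuristic illustration of what a decomposer's input/output contract looks like: the regex pattern, the `#1` placeholder convention, and the sub-question templates are assumptions, not either paper's method.

```python
# Heuristic illustration of complex-question decomposition for multi-hop QA.
# Purely illustrative: the regex pattern and sub-question templates are
# assumptions, not the method of any paper listed above.
import re
from typing import List


def decompose(question: str) -> List[str]:
    """Split a bridge-style question of the form
    'Who/What/Where is the X of the Y of Z?' into two single-hop questions,
    where '#1' in the second question stands for the answer to the first."""
    m = re.match(r"(?i)(who|what|where) is the (.+?) of the (.+?) of (.+?)\?", question)
    if m:
        wh, outer, inner, entity = m.groups()
        first = f"What is the {inner} of {entity}?"
        second = f"{wh.capitalize()} is the {outer} of #1?"
        return [first, second]
    return [question]  # fall back to treating it as single-hop


print(decompose("Who is the director of the sequel of Alien?"))
# ['What is the sequel of Alien?', 'Who is the director of #1?']
```

Actual systems learn this mapping rather than hard-coding patterns; the fixed template here only makes concrete the idea that each sub-question should be answerable by a single retrieval step, with later steps conditioned on earlier answers.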