Visconde: Multi-document QA with GPT-3 and Neural Reranking
- URL: http://arxiv.org/abs/2212.09656v1
- Date: Mon, 19 Dec 2022 17:39:07 GMT
- Title: Visconde: Multi-document QA with GPT-3 and Neural Reranking
- Authors: Jayr Pereira, Robson Fidalgo, Roberto Lotufo, Rodrigo Nogueira
- Abstract summary: This paper proposes a question-answering system that can answer questions whose supporting evidence is spread over multiple documents.
The system, called Visconde, uses a three-step pipeline to perform the task: decompose, retrieve, and aggregate.
- Score: 4.9069311006119865
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: This paper proposes a question-answering system that can answer questions
whose supporting evidence is spread over multiple (potentially long) documents.
The system, called Visconde, uses a three-step pipeline to perform the task:
decompose, retrieve, and aggregate. The first step decomposes the question into
simpler questions using a few-shot large language model (LLM). Then, a
state-of-the-art search engine is used to retrieve candidate passages from a
large collection for each decomposed question. In the final step, we use the
LLM in a few-shot setting to aggregate the contents of the passages into the
final answer. The system is evaluated on three datasets: IIRC, Qasper, and
StrategyQA. Results suggest that current retrievers are the main bottleneck and
that readers are already performing at the human level as long as relevant
passages are provided. The system is also shown to be more effective when the
model is induced to give explanations before answering a question. Code is
available at https://github.com/neuralmind-ai/visconde.
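The decompose–retrieve–aggregate pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `llm` and `search_engine` callables, the prompt wording, and the helper names are all hypothetical stand-ins for the few-shot LLM and the neural retriever the paper uses.

```python
# Hypothetical sketch of Visconde's three-step pipeline:
# decompose, retrieve, aggregate. All function names and prompts
# are illustrative assumptions, not the paper's actual code.

def decompose(question, llm):
    """Few-shot prompt an LLM to split a question into simpler sub-questions."""
    prompt = f"Decompose into simpler questions:\nQ: {question}\nSub-questions:"
    return [q.strip() for q in llm(prompt).split("\n") if q.strip()]

def retrieve(sub_questions, search_engine, k=3):
    """Retrieve top-k candidate passages for each decomposed sub-question."""
    passages = []
    for sq in sub_questions:
        passages.extend(search_engine(sq)[:k])
    return passages

def aggregate(question, passages, llm):
    """Few-shot prompt the LLM to reason over the passages and answer.
    The paper reports better results when the model explains before answering,
    which the prompt below tries to induce."""
    context = "\n".join(passages)
    prompt = (f"Context:\n{context}\n\nQ: {question}\n"
              "Explain your reasoning, then give the final answer:")
    return llm(prompt)

def answer(question, llm, search_engine):
    subs = decompose(question, llm)
    passages = retrieve(subs, search_engine)
    return aggregate(question, passages, llm)
```

In practice `llm` would wrap a few-shot GPT-3 call and `search_engine` a neural retriever with reranking; the aggregation prompt's "explain first" instruction reflects the paper's finding that induced explanations improve answers.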
Related papers
- Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems [124.82815637571413]
We design a procedure to synthesize Haystacks of documents, ensuring that specific insights repeat across documents.
The "Summary of a Haystack" (SummHay) task then requires a system to process the Haystack and generate, given a query, a summary that identifies the relevant insights and precisely cites the source documents.
arXiv Detail & Related papers (2024-07-01T15:23:42Z) - DAPR: A Benchmark on Document-Aware Passage Retrieval [57.45793782107218]
We propose and name this task Document-Aware Passage Retrieval (DAPR).
While analyzing the errors of the State-of-The-Art (SoTA) passage retrievers, we find the major errors (53.5%) are due to missing document context.
Our created benchmark enables future research on developing and comparing retrieval systems for the new task.
arXiv Detail & Related papers (2023-05-23T10:39:57Z) - UniKGQA: Unified Retrieval and Reasoning for Solving Multi-hop Question
Answering Over Knowledge Graph [89.98762327725112]
Multi-hop Question Answering over Knowledge Graph (KGQA) aims to find the answer entities that are multiple hops away from the topic entities mentioned in a natural language question.
We propose UniKGQA, a novel approach for multi-hop KGQA task, by unifying retrieval and reasoning in both model architecture and parameter learning.
arXiv Detail & Related papers (2022-12-02T04:08:09Z) - Zero-Shot Open-Book Question Answering [0.0]
This article proposes a solution for answering natural language questions from technical documents with no domain-specific labeled data (zero-shot).
We introduce a new test dataset for open-book QA based on real customer questions on AWS technical documentation.
We achieve 49% F1 and 39% exact match (EM) end-to-end with no domain-specific training.
arXiv Detail & Related papers (2021-11-22T20:38:41Z) - When Retriever-Reader Meets Scenario-Based Multiple-Choice Questions [15.528174963480614]
We propose a joint retriever-reader model called QAVES where the retriever is implicitly supervised only using relevance labels via a novel word weighting mechanism.
QAVES significantly outperforms a variety of strong baselines on multiple-choice questions across three SQA datasets.
arXiv Detail & Related papers (2021-08-31T14:32:04Z) - End-to-End Multihop Retrieval for Compositional Question Answering over
Long Documents [93.55268936974971]
We propose a multi-hop retrieval method, DocHopper, to answer compositional questions over long documents.
At each step, DocHopper retrieves a paragraph or sentence embedding from the document, mixes the retrieved result with the query, and updates the query for the next step.
We demonstrate that utilizing document structure in this way can largely improve question-answering and retrieval performance on long documents.
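The per-step retrieve-and-mix loop that DocHopper describes can be sketched with dense vectors. This is a simplified illustration under stated assumptions: the "mixing" here is a plain vector sum followed by normalization, a stand-in for the learned query-update mechanism in the paper, and the no-revisit mask is an added simplification.

```python
# Simplified sketch of multi-hop retrieval in the style of DocHopper:
# at each hop, score passages against the query, take the best one,
# mix its embedding into the query, and repeat. The sum-and-normalize
# update is an illustrative assumption, not the paper's learned update.
import numpy as np

def multihop_retrieve(query_vec, passage_vecs, hops=2):
    query = query_vec.astype(float).copy()
    path = []  # indices of passages retrieved at each hop
    for _ in range(hops):
        scores = passage_vecs @ query       # dot-product similarity
        scores[path] = -np.inf              # don't revisit earlier hops
        best = int(np.argmax(scores))
        path.append(best)
        # Mix the retrieved embedding into the query for the next hop.
        query = query + passage_vecs[best]
        query = query / np.linalg.norm(query)
    return path
```

Each hop conditions the next retrieval on what was already found, which is what lets the method answer compositional questions whose evidence is spread across a long document.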
arXiv Detail & Related papers (2021-06-01T03:13:35Z) - ComQA: Compositional Question Answering via Hierarchical Graph Neural
Networks [47.12013005600986]
We present a large-scale compositional question answering dataset containing more than 120k human-labeled questions.
To tackle the ComQA problem, we propose a hierarchical graph neural network that represents the document from low-level words up to high-level sentences.
Our proposed model achieves a significant improvement over previous machine reading comprehension methods and pre-training methods.
arXiv Detail & Related papers (2021-01-16T08:23:27Z) - Open Question Answering over Tables and Text [55.8412170633547]
In open question answering (QA), the answer to a question is produced by retrieving and then analyzing documents that might contain answers to the question.
Most open QA systems have considered only retrieving information from unstructured text.
We present a new large-scale dataset Open Table-and-Text Question Answering (OTT-QA) to evaluate performance on this task.
arXiv Detail & Related papers (2020-10-20T16:48:14Z) - Revisiting the Open-Domain Question Answering Pipeline [0.23204178451683266]
This paper describes Mindstone, an open-domain QA system that consists of a new multi-stage pipeline.
We show how the new pipeline enables the use of low-resolution labels, and can be easily tuned to meet various timing requirements.
arXiv Detail & Related papers (2020-09-02T09:34:14Z) - NeuralQA: A Usable Library for Question Answering (Contextual Query
Expansion + BERT) on Large Datasets [0.6091702876917281]
NeuralQA is a library for Question Answering (QA) on large datasets.
It integrates with existing infrastructure (e.g., ElasticSearch instances and reader models trained with the HuggingFace Transformers API) and offers helpful defaults for QA subtasks.
Code and documentation for NeuralQA are available as open source on GitHub.
arXiv Detail & Related papers (2020-07-30T03:38:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.