Revisiting the Open-Domain Question Answering Pipeline
- URL: http://arxiv.org/abs/2009.00914v1
- Date: Wed, 2 Sep 2020 09:34:14 GMT
- Title: Revisiting the Open-Domain Question Answering Pipeline
- Authors: Sina J. Semnani, Manish Pandey
- Abstract summary: This paper describes Mindstone, an open-domain QA system that consists of a new multi-stage pipeline.
We show how the new pipeline enables the use of low-resolution labels, and can be easily tuned to meet various timing requirements.
- Score: 0.23204178451683266
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Open-domain question answering (QA) is the task of identifying answers to
natural questions from a large corpus of documents. The typical open-domain QA
system starts with information retrieval to select a subset of documents from
the corpus, which are then processed by a machine reader to select the answer
spans. This paper describes Mindstone, an open-domain QA system that consists
of a new multi-stage pipeline that employs a traditional BM25-based information
retriever, RM3-based neural relevance feedback, neural ranker, and a machine
reading comprehension stage. This paper establishes a new baseline for
end-to-end performance on question answering for Wikipedia/SQuAD dataset
(EM=58.1, F1=65.8), with substantial gains over the previous state of the art
(Yang et al., 2019b). We also show how the new pipeline enables the use of
low-resolution labels, and can be easily tuned to meet various timing
requirements.
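The retrieve-then-read pattern described in the abstract can be sketched in a few lines. This is a toy illustration with a hand-rolled BM25 scorer and an invented corpus and query; it is not the paper's Mindstone pipeline, which additionally uses RM3-based relevance feedback, a neural ranker, and a neural machine-reading stage.

```python
# Toy retrieve-then-read sketch: BM25 selects candidate documents;
# a reader stage (omitted here) would then extract answer spans.
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query with BM25."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency of each query term.
    df = {t: sum(1 for d in docs if t in d) for t in set(query_terms)}
    scores = []
    for doc in docs:
        tf = Counter(doc)
        s = 0.0
        for t in query_terms:
            if tf[t] == 0:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(doc) / avgdl)
            s += idf * tf[t] * (k1 + 1) / norm
        scores.append(s)
    return scores

corpus = [
    "the capital of france is paris".split(),
    "bm25 is a ranking function used by search engines".split(),
    "squad is a reading comprehension dataset".split(),
]
query = "what is the capital of france".split()
scores = bm25_scores(query, corpus)
best = max(range(len(corpus)), key=scores.__getitem__)
print(best)  # the France document ranks first
```

In a full pipeline the top-scoring documents would be passed to a machine reading comprehension model that extracts the answer span.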
Related papers
- Open-domain Question Answering via Chain of Reasoning over Heterogeneous Knowledge [82.5582220249183]
We propose a novel open-domain question answering (ODQA) framework for answering single/multi-hop questions across heterogeneous knowledge sources.
Unlike previous methods that solely rely on the retriever for gathering all evidence in isolation, our intermediary performs a chain of reasoning over the retrieved set.
Our system achieves competitive performance on two ODQA datasets, OTT-QA and NQ, against tables and passages from Wikipedia.
arXiv Detail & Related papers (2022-10-22T03:21:32Z)
- Generate rather than Retrieve: Large Language Models are Strong Context Generators [74.87021992611672]
We present a novel perspective for solving knowledge-intensive tasks by replacing document retrievers with large language model generators.
We call our method generate-then-read (GenRead), which first prompts a large language model to generate contextual documents based on a given question, and then reads the generated documents to produce the final answer.
arXiv Detail & Related papers (2022-09-21T01:30:59Z)
- Zero-Shot Open-Book Question Answering [0.0]
This article proposes a solution for answering natural language questions from technical documents with no domain-specific labeled data (zero-shot).
We introduce a new test dataset for open-book QA based on real customer questions on AWS technical documentation.
We achieve 49% F1 and 39% exact match (EM) end-to-end with no domain-specific training.
arXiv Detail & Related papers (2021-11-22T20:38:41Z)
- ComQA: Compositional Question Answering via Hierarchical Graph Neural Networks [47.12013005600986]
We present a large-scale compositional question answering dataset containing more than 120k human-labeled questions.
To tackle the ComQA problem, we propose a hierarchical graph neural network, which represents the document from the low-level word to the high-level sentence.
Our proposed model achieves a significant improvement over previous machine reading comprehension methods and pre-training methods.
arXiv Detail & Related papers (2021-01-16T08:23:27Z)
- Retrieving and Reading: A Comprehensive Survey on Open-domain Question Answering [62.88322725956294]
We review the latest research trends in OpenQA, with particular attention to systems that incorporate neural MRC techniques.
We introduce the modern OpenQA architecture named "Retriever-Reader" and analyze the various systems that follow this architecture.
We then discuss key challenges to developing OpenQA systems and offer an analysis of benchmarks that are commonly used.
arXiv Detail & Related papers (2021-01-04T04:47:46Z)
- Answering Open-Domain Questions of Varying Reasoning Steps from Text [39.48011017748654]
We develop a unified system to answer open-domain questions directly from text.
We employ a single multi-task transformer model to perform all the necessary subtasks.
We show that our model demonstrates competitive performance on both existing benchmarks and this new benchmark.
arXiv Detail & Related papers (2020-10-23T16:51:09Z)
- Open Question Answering over Tables and Text [55.8412170633547]
In open question answering (QA), the answer to a question is produced by retrieving and then analyzing documents that might contain answers to the question.
Most open QA systems have considered only retrieving information from unstructured text.
We present a new large-scale dataset Open Table-and-Text Question Answering (OTT-QA) to evaluate performance on this task.
arXiv Detail & Related papers (2020-10-20T16:48:14Z)
- Answering Any-hop Open-domain Questions with Iterative Document Reranking [62.76025579681472]
We propose a unified QA framework to answer any-hop open-domain questions.
Our method consistently achieves performance comparable to or better than the state-of-the-art on both single-hop and multi-hop open-domain QA datasets.
arXiv Detail & Related papers (2020-09-16T04:31:38Z)
- Knowledge-Aided Open-Domain Question Answering [58.712857964048446]
We propose a knowledge-aided open-domain QA (KAQA) method which targets at improving relevant document retrieval and answer reranking.
During document retrieval, a candidate document is scored by considering its relationship to the question and other documents.
During answer reranking, a candidate answer is reranked using not only its own context but also the clues from other documents.
arXiv Detail & Related papers (2020-06-09T13:28:57Z)
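The KAQA-style scoring idea above can be illustrated with a minimal sketch: each retrieved document is scored by mixing its relevance to the question with the support it receives from the other retrieved documents. Jaccard word overlap and the 0.7/0.3 weighting are illustrative stand-ins, not the actual model from the paper.

```python
# Toy reranker: a document's score combines question relevance with
# agreement (average similarity) across the other retrieved documents.
def jaccard(a, b):
    """Jaccard similarity between two token lists."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def rerank(question, docs, alpha=0.7):
    """Return document indices ordered by mixed question/support score."""
    scored = []
    for i, d in enumerate(docs):
        q_rel = jaccard(question, d)
        others = [jaccard(d, o) for j, o in enumerate(docs) if j != i]
        support = sum(others) / len(others) if others else 0.0
        scored.append((alpha * q_rel + (1 - alpha) * support, i))
    return [i for _, i in sorted(scored, reverse=True)]

question = "who wrote hamlet".split()
docs = [
    "hamlet was written by william shakespeare".split(),
    "shakespeare wrote hamlet and macbeth".split(),
    "paris is the capital of france".split(),
]
order = rerank(question, docs)
print(order)  # [1, 0, 2]: the off-topic document drops to last
```

The same mixing idea applies at the answer-reranking stage, where a candidate answer is scored using clues from other documents rather than its own context alone.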
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.