Retrieving and Reading: A Comprehensive Survey on Open-domain Question
Answering
- URL: http://arxiv.org/abs/2101.00774v1
- Date: Mon, 4 Jan 2021 04:47:46 GMT
- Title: Retrieving and Reading: A Comprehensive Survey on Open-domain Question
Answering
- Authors: Fengbin Zhu, Wenqiang Lei, Chao Wang, Jianming Zheng, Soujanya Poria,
Tat-Seng Chua
- Abstract summary: We review the latest research trends in OpenQA, with particular attention to systems that incorporate neural MRC techniques.
We introduce modern OpenQA architecture named "Retriever-Reader" and analyze the various systems that follow this architecture.
We then discuss key challenges to developing OpenQA systems and offer an analysis of benchmarks that are commonly used.
- Score: 62.88322725956294
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Open-domain Question Answering (OpenQA) is an important task in Natural
Language Processing (NLP), which aims to answer a question in the form of
natural language based on large-scale unstructured documents. Recently, there
has been a surge in the amount of research literature on OpenQA, particularly
on techniques that integrate with neural Machine Reading Comprehension (MRC).
While these research works have advanced performance to new heights on
benchmark datasets, they have been rarely covered in existing surveys on QA
systems. In this work, we review the latest research trends in OpenQA, with
particular attention to systems that incorporate neural MRC techniques.
Specifically, we begin with revisiting the origin and development of OpenQA
systems. We then introduce modern OpenQA architecture named
"Retriever-Reader" and analyze the various systems that follow this
architecture as well as the specific techniques adopted in each of the
components. We then discuss key challenges to developing OpenQA systems and
offer an analysis of benchmarks that are commonly used. We hope our work would
enable researchers to be informed of the recent advancement and also the open
challenges in OpenQA research, so as to stimulate further progress in this
field.
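As a rough illustration of the two-stage "Retriever-Reader" idea named in the abstract, the following minimal sketch first ranks passages against the question and then extracts an answer from the top-ranked ones. It is not taken from the survey: the function names (`retrieve`, `read`, `answer`), the TF-IDF-style scoring, and the sentence-overlap reader are all assumptions made for illustration. In particular, the reader here is a simple heuristic standing in for the neural MRC models the survey discusses.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(question, passages, k=2):
    """Retriever (illustrative): rank passages by TF-IDF-weighted term overlap."""
    docs = [Counter(tokenize(p)) for p in passages]
    n = len(docs)
    # Document frequency: number of passages containing each term.
    df = Counter(term for d in docs for term in d)
    q_terms = set(tokenize(question))

    def score(doc):
        return sum(doc[t] * math.log(1 + n / df[t]) for t in q_terms if t in doc)

    ranked = sorted(range(n), key=lambda i: score(docs[i]), reverse=True)
    return [passages[i] for i in ranked[:k]]


def read(question, retrieved):
    """Reader (heuristic stand-in for a neural MRC model):
    return the sentence sharing the most distinct terms with the question."""
    q_terms = set(tokenize(question))
    sentences = [
        s for p in retrieved for s in re.split(r"(?<=[.!?])\s+", p) if s
    ]
    return max(sentences, key=lambda s: len(q_terms & set(tokenize(s))))


def answer(question, passages, k=2):
    """Full pipeline: retrieve the top-k passages, then read an answer from them."""
    return read(question, retrieve(question, passages, k))


# Hypothetical toy corpus, used only to exercise the sketch.
PASSAGES = [
    "Paris is the capital of France. It lies on the Seine.",
    "Berlin is the capital of Germany.",
    "The Seine is a river in northern France.",
]

if __name__ == "__main__":
    print(answer("What is the capital of France?", PASSAGES))
```

Keeping the retriever and reader as separate functions mirrors the division of labor in the architecture: either stage could be swapped for a stronger component without touching the other.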
Related papers
- Towards Better Generalization in Open-Domain Question Answering by Mitigating Context Memorization [67.92796510359595]
Open-domain Question Answering (OpenQA) aims at answering factual questions with an external large-scale knowledge corpus.
It is still unclear how well an OpenQA model can transfer to completely new knowledge domains.
We introduce Corpus-Invariant Tuning (CIT), a simple but effective training strategy, to mitigate the knowledge over-memorization.
(2024-04-02T05:44:50Z)
- Around the GLOBE: Numerical Aggregation Question-Answering on
Heterogeneous Genealogical Knowledge Graphs with Deep Neural Networks [0.934612743192798]
We present a new end-to-end methodology for numerical aggregation QA for genealogical trees.
The proposed architecture, GLOBE, outperforms the state-of-the-art models and pipelines by achieving 87% accuracy for this task.
This study may have practical implications for genealogical information centers and museums.
(2023-07-30T12:09:00Z)
- PrimeQA: The Prime Repository for State-of-the-Art Multilingual Question
Answering Research and Development [24.022050096797606]
PRIMEQA is a one-stop QA repository that aims to democratize QA research and facilitate easy replication of state-of-the-art (SOTA) QA methods.
It supports core QA functionalities like retrieval and reading comprehension as well as auxiliary capabilities such as question generation.
It has been designed as an end-to-end toolkit for various use cases: building front-end applications, replicating SOTA methods on public benchmarks, and expanding pre-existing methods.
(2023-01-23T20:43:26Z)
- XLMRQA: Open-Domain Question Answering on Vietnamese Wikipedia-based
Textual Knowledge Source [2.348805691644086]
This paper presents XLMRQA, the first Vietnamese QA system using a supervised transformer-based reader on the Wikipedia-based textual knowledge source.
From the results obtained on the three systems, we analyze the influence of question types on the performance of the QA systems.
(2022-04-14T14:54:33Z)
- Multifaceted Improvements for Conversational Open-Domain Question
Answering [54.913313912927045]
We propose a framework with Multifaceted Improvements for Conversational open-domain Question Answering (MICQA).
First, the proposed KL-divergence-based regularization can lead to better question understanding for retrieval and answer reading.
Second, the added post-ranker module can push more relevant passages to the top positions, where they are selected for the reader under a two-aspect constraint.
Third, the well-designed curriculum learning strategy effectively narrows the gap between the golden-passage settings of training and inference, and encourages the reader to find the true answer without the assistance of the golden passage.
(2022-04-01T07:54:27Z)
- OpenQA: Hybrid QA System Relying on Structured Knowledge Base as well as
Non-structured Data [15.585969737147892]
We propose an intelligent question-answering system based on structured KB and unstructured data, called OpenQA.
We integrate KBQA structured question answering based on semantic parsing and deep representation learning, and two-stage unstructured question answering based on retrieval and neural machine reading comprehension into OpenQA.
(2021-12-31T09:15:39Z)
- Open Domain Question Answering over Virtual Documents: A Unified
Approach for Data and Text [62.489652395307914]
We use the data-to-text method as a means for encoding structured knowledge for knowledge-intensive applications, i.e., open-domain question answering (QA).
Specifically, we propose a verbalizer-retriever-reader framework for open-domain QA over data and text where verbalized tables from Wikipedia and triples from Wikidata are used as augmented knowledge sources.
We show that our Unified Data and Text QA, UDT-QA, can effectively benefit from the expanded knowledge index, leading to large gains over text-only baselines.
(2021-10-16T00:11:21Z)
- Narrative Question Answering with Cutting-Edge Open-Domain QA
Techniques: A Comprehensive Study [45.9120218818558]
We benchmark the research on the NarrativeQA dataset through experiments with cutting-edge ODQA techniques.
This quantifies the challenges Book QA poses, as well as advances the published state-of-the-art with a $\sim$7% absolute improvement on Rouge-L.
Our findings indicate that the event-centric questions dominate this task, which exemplifies the inability of existing QA models to handle event-oriented scenarios.
(2021-06-07T17:46:09Z)
- Conversational Question Answering: A Survey [18.447856993867788]
This survey is an effort to present a comprehensive review of the state-of-the-art research trends of Conversational Question Answering (CQA).
Our findings show that there has been a trend shift from single-turn to multi-turn QA which empowers the field of Conversational AI from different perspectives.
(2021-06-02T01:06:34Z)
- A Survey on Complex Question Answering over Knowledge Base: Recent
Advances and Challenges [71.4531144086568]
Question Answering (QA) over Knowledge Base (KB) aims to automatically answer natural language questions.
Researchers have shifted their attention from simple questions to complex questions, which require more KB triples and constraint inference.
(2020-07-26T07:13:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.