Open-Domain Question Answering Goes Conversational via Question Rewriting
- URL: http://arxiv.org/abs/2010.04898v3
- Date: Wed, 14 Apr 2021 19:09:19 GMT
- Title: Open-Domain Question Answering Goes Conversational via Question Rewriting
- Authors: Raviteja Anantha, Svitlana Vakulenko, Zhucheng Tu, Shayne Longpre,
Stephen Pulman, Srinivas Chappidi
- Abstract summary: We introduce a new dataset for Question Rewriting in Conversational Context (QReCC), which contains 14K conversations with 80K question-answer pairs.
The task in QReCC is to find answers to conversational questions within a collection of 10M web pages.
Our results set the first baseline for the QReCC dataset with an F1 of 19.10, compared to the human upper bound of 75.45, indicating the difficulty of the setup and large room for improvement.
- Score: 15.174807142080192
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a new dataset for Question Rewriting in Conversational Context
(QReCC), which contains 14K conversations with 80K question-answer pairs. The
task in QReCC is to find answers to conversational questions within a
collection of 10M web pages (split into 54M passages). Answers to questions in
the same conversation may be distributed across several web pages. QReCC
provides annotations that allow us to train and evaluate individual subtasks of
question rewriting, passage retrieval and reading comprehension required for
the end-to-end conversational question answering (QA) task. We report the
effectiveness of a strong baseline approach that combines the state-of-the-art
model for question rewriting, and competitive models for open-domain QA. Our
results set the first baseline for the QReCC dataset with an F1 of 19.10, compared
to the human upper bound of 75.45, indicating the difficulty of the setup and
large room for improvement.
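The F1 figures above are presumably the token-overlap F1 that is standard in QA evaluation; the exact normalization QReCC applies is not stated here, so the following is a minimal SQuAD-style sketch, not the paper's official scorer:

```python
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    """Lowercase, strip punctuation and articles, collapse whitespace (SQuAD-style)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted and a reference answer."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; otherwise score 0.
        return float(pred_tokens == ref_tokens)
    common = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

The reported scores would then be this F1 averaged over all question-answer pairs in the dataset.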
Related papers
- MeeQA: Natural Questions in Meeting Transcripts [3.383670923637875]
We present MeeQA, a dataset for natural-language question answering over meeting transcripts.
The dataset contains 48K question-answer pairs, extracted from 422 meeting transcripts.
arXiv Detail & Related papers (2023-05-15T10:02:47Z)
- Conversational QA Dataset Generation with Answer Revision [2.5838973036257458]
We introduce a novel framework that extracts question-worthy phrases from a passage and then generates corresponding questions considering previous conversations.
Our framework revises the extracted answers after generating questions so that the answers exactly match their paired questions.
arXiv Detail & Related papers (2022-09-23T04:05:38Z)
- Multifaceted Improvements for Conversational Open-Domain Question Answering [54.913313912927045]
We propose a framework with Multifaceted Improvements for Conversational open-domain Question Answering (MICQA).
First, a KL-divergence-based regularization leads to better question understanding for retrieval and answer reading.
Second, a post-ranker module pushes more relevant passages to the top of the ranking so that they are selected for the reader, under a two-aspect constraint.
Third, a curriculum learning strategy narrows the gap between the gold-passage settings of training and inference, encouraging the reader to find the true answer without gold-passage assistance.
arXiv Detail & Related papers (2022-04-01T07:54:27Z)
- QAConv: Question Answering on Informative Conversations [85.2923607672282]
We focus on informative conversations including business emails, panel discussions, and work channels.
In total, we collect 34,204 QA pairs, including span-based, free-form, and unanswerable questions.
arXiv Detail & Related papers (2021-05-14T15:53:05Z)
- A Graph-guided Multi-round Retrieval Method for Conversational Open-domain Question Answering [52.041815783025186]
We propose a novel graph-guided retrieval method to model the relations among answers across conversation turns.
We also propose to incorporate the multi-round relevance feedback technique to explore the impact of the retrieval context on current question understanding.
arXiv Detail & Related papers (2021-04-17T04:39:41Z)
- ParaQA: A Question Answering Dataset with Paraphrase Responses for Single-Turn Conversation [5.087932295628364]
ParaQA is a dataset with multiple paraphrased responses for single-turn conversation over knowledge graphs (KGs).
The dataset was created using a semi-automated framework for generating diverse paraphrases of the answers using techniques such as back-translation.
arXiv Detail & Related papers (2021-03-13T18:53:07Z)
- Open Question Answering over Tables and Text [55.8412170633547]
In open question answering (QA), the answer to a question is produced by retrieving and then analyzing documents that might contain answers to the question.
Most open QA systems have considered only retrieving information from unstructured text.
We present a new large-scale dataset Open Table-and-Text Question Answering (OTT-QA) to evaluate performance on this task.
arXiv Detail & Related papers (2020-10-20T16:48:14Z)
- Towards Data Distillation for End-to-end Spoken Conversational Question Answering [65.124088336738]
We propose a new Spoken Conversational Question Answering (SCQA) task.
SCQA aims to enable QA systems to model complex dialogue flows given speech utterances and text corpora.
Our main objective is to build a QA system to deal with conversational questions both in spoken and text forms.
arXiv Detail & Related papers (2020-10-18T05:53:39Z)
- Answering Any-hop Open-domain Questions with Iterative Document Reranking [62.76025579681472]
We propose a unified QA framework to answer any-hop open-domain questions.
Our method consistently achieves performance comparable to or better than the state-of-the-art on both single-hop and multi-hop open-domain QA datasets.
arXiv Detail & Related papers (2020-09-16T04:31:38Z)
- Question Rewriting for Conversational Question Answering [15.355557454305776]
We introduce a conversational QA architecture that sets the new state of the art on the TREC CAsT 2019 passage retrieval dataset.
We show that the same QR model improves QA performance on the QuAC dataset with respect to answer span extraction.
Our evaluation results indicate that the QR model achieves near human-level performance on both datasets.
arXiv Detail & Related papers (2020-04-30T09:27:43Z)
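The rewrite-then-retrieve pipeline shared by the question-rewriting papers above can be sketched as follows. Everything here is a toy illustration: `rewrite_question` is a rule-based stand-in for a learned QR model (the actual systems use neural rewriters), and `retrieve` is a bag-of-words overlap ranker, not a production retriever.

```python
import re

def rewrite_question(history: list[str], question: str) -> str:
    """Toy stand-in for a learned question-rewriting (QR) model:
    resolve the pronoun 'it' to the last capitalized entity span
    mentioned in the conversation history."""
    entities = []
    for turn in history:
        run = []
        # Skip the sentence-initial token so its capital doesn't count.
        for tok in [t.strip("?.,!") for t in turn.split()][1:]:
            if tok and tok[0].isupper():
                run.append(tok)
            elif run:
                entities.append(" ".join(run))
                run = []
        if run:
            entities.append(" ".join(run))
    out = []
    for tok in question.split():
        core = tok.strip("?.,!")
        if core.lower() == "it" and entities:
            out.append(tok.replace(core, entities[-1]))
        else:
            out.append(tok)
    return " ".join(out)

def retrieve(passages: list[str], query: str, k: int = 1) -> list[str]:
    """Rank passages by bag-of-words overlap with the rewritten query."""
    q = set(re.findall(r"\w+", query.lower()))
    return sorted(passages,
                  key=lambda p: len(q & set(re.findall(r"\w+", p.lower()))),
                  reverse=True)[:k]

# The self-contained rewrite makes a stateless retriever usable for
# conversational questions:
history = ["When was the Eiffel Tower built?"]
rewritten = rewrite_question(history, "How tall is it?")
# rewritten == "How tall is Eiffel Tower?"
```

A reading-comprehension model would then extract the answer from the top-ranked passages, completing the rewrite / retrieve / read decomposition that QReCC annotates end to end.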
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.