QAConv: Question Answering on Informative Conversations
- URL: http://arxiv.org/abs/2105.06912v1
- Date: Fri, 14 May 2021 15:53:05 GMT
- Title: QAConv: Question Answering on Informative Conversations
- Authors: Chien-Sheng Wu, Andrea Madotto, Wenhao Liu, Pascale Fung, Caiming Xiong
- Abstract summary: We focus on informative conversations including business emails, panel discussions, and work channels.
In total, we collect 34,204 QA pairs, including span-based, free-form, and unanswerable questions.
- Score: 85.2923607672282
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces QAConv, a new question answering (QA) dataset that uses
conversations as a knowledge source. We focus on informative conversations
including business emails, panel discussions, and work channels. Unlike
open-domain and task-oriented dialogues, these conversations are usually long,
complex, asynchronous, and involve strong domain knowledge. In total, we
collect 34,204 QA pairs, including span-based, free-form, and unanswerable
questions, from 10,259 selected conversations with both human-written and
machine-generated questions. We segment long conversations into chunks, and use
a question generator and dialogue summarizer as auxiliary tools to collect
multi-hop questions. The dataset has two testing scenarios, chunk mode and full
mode, depending on whether the grounded chunk is provided or retrieved from a
large conversational pool. Experimental results show that state-of-the-art QA
systems trained on existing QA datasets have limited zero-shot ability and tend
to predict our questions as unanswerable. Fine-tuning such systems on our
corpus can achieve significant improvements of up to 23.6% and 13.6% in chunk
mode and full mode, respectively.
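The chunk segmentation and the two testing scenarios described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the chunk size (`max_turns`), the token-overlap retriever, the sample conversation, and all function names are assumptions made here for clarity. The abstract does not specify how chunks are sized or which retriever is used in full mode.

```python
from collections import Counter


def segment_into_chunks(turns, max_turns=3):
    """Split a list of utterances into consecutive chunks of at most max_turns.

    max_turns is an illustrative value, not a setting taken from the paper.
    """
    return [turns[i:i + max_turns] for i in range(0, len(turns), max_turns)]


def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    tokens = (tok.strip(".,?!:;").lower() for tok in text.split())
    return [tok for tok in tokens if tok]


def overlap_score(question, chunk):
    """Count tokens shared by the question and the chunk (a stand-in retriever score)."""
    q = Counter(tokenize(question))
    c = Counter(tokenize(" ".join(chunk)))
    return sum(min(count, c[tok]) for tok, count in q.items())


def select_context(question, pool, gold_chunk=None):
    """Chunk mode: the grounded chunk is provided.
    Full mode: the chunk is retrieved from the whole pool."""
    if gold_chunk is not None:
        return gold_chunk
    return max(pool, key=lambda chunk: overlap_score(question, chunk))


# Hypothetical conversation, invented for this example.
conversation = [
    "Alice: Can we move the budget review to Friday?",
    "Bob: Friday works for me.",
    "Alice: Great, I will send the invite.",
    "Carol: The deployment failed on the staging server.",
    "Bob: I will roll back the release tonight.",
]
chunks = segment_into_chunks(conversation, max_turns=3)
question = "Who will roll back the release?"

full_mode_context = select_context(question, chunks)
chunk_mode_context = select_context(question, chunks, gold_chunk=chunks[1])
```

In full mode the QA model only sees whatever the retriever returns, so retrieval errors add to reading errors. This is consistent with the smaller improvement the abstract reports for full mode.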
Related papers
- PCoQA: Persian Conversational Question Answering Dataset [12.07607688189035]
The PCoQA dataset is a resource comprising information-seeking dialogs encompassing a total of 9,026 contextually-driven questions.
PCoQA is designed to present novel challenges compared to previous question answering datasets.
This paper not only presents the comprehensive PCoQA dataset but also reports the performance of various benchmark models.
(2023-12-07T15:29:34Z)
- Conversational QA Dataset Generation with Answer Revision [2.5838973036257458]
We introduce a novel framework that extracts question-worthy phrases from a passage and then generates corresponding questions considering previous conversations.
Our framework revises the extracted answers after generating questions so that answers exactly match paired questions.
(2022-09-23T04:05:38Z)
- End-to-end Spoken Conversational Question Answering: Task, Dataset and Model [92.18621726802726]
In spoken question answering, the systems are designed to answer questions from contiguous text spans within the related speech transcripts.
We propose a new Spoken Conversational Question Answering task (SCQA), aiming at enabling the systems to model complex dialogue flows.
Our main objective is to build a system that handles conversational questions based on audio recordings, and to explore whether providing systems with more cues from different modalities helps information gathering.
(2022-04-29T17:56:59Z)
- Multifaceted Improvements for Conversational Open-Domain Question Answering [54.913313912927045]
We propose a framework with Multifaceted Improvements for Conversational open-domain Question Answering (MICQA).
First, the proposed KL-divergence-based regularization leads to better question understanding for retrieval and answer reading.
Second, the added post-ranker module pushes more relevant passages to the top positions so that they are selected for the reader under a two-aspect constraint.
Third, the well-designed curriculum learning strategy effectively narrows the gap between the golden passage settings of training and inference, and encourages the reader to find the true answer without the golden passage's assistance.
(2022-04-01T07:54:27Z)
- ConditionalQA: A Complex Reading Comprehension Dataset with Conditional Answers [93.55268936974971]
We describe a Question Answering dataset that contains complex questions with conditional answers.
We call this dataset ConditionalQA.
We show that ConditionalQA is challenging for many of the existing QA models, especially in selecting answer conditions.
(2021-10-13T17:16:46Z)
- TopiOCQA: Open-domain Conversational Question Answering with Topic Switching [11.717296856448566]
We introduce TopiOCQA, an open-domain conversational dataset with topic switches on Wikipedia.
TopiOCQA contains 3,920 conversations with information-seeking questions and free-form answers.
We evaluate several baselines by combining state-of-the-art document retrieval methods with neural reader models.
(2021-10-02T09:53:48Z)
- Towards Data Distillation for End-to-end Spoken Conversational Question Answering [65.124088336738]
We propose a new Spoken Conversational Question Answering task (SCQA).
SCQA aims at enabling QA systems to model complex dialogue flows given the speech utterances and text corpora.
Our main objective is to build a QA system to deal with conversational questions both in spoken and text forms.
(2020-10-18T05:53:39Z)
- DoQA -- Accessing Domain-Specific FAQs via Conversational QA [25.37327993590628]
We present DoQA, a dataset with 2,437 dialogues and 10,917 QA pairs.
The dialogues are collected from three Stack Exchange sites using the Wizard of Oz method with crowdsourcing.
(2020-05-04T08:58:54Z)
- Conversations with Search Engines: SERP-based Conversational Response Generation [77.1381159789032]
We create a suitable dataset, the Search as a Conversation (SaaC) dataset, for the development of pipelines for conversations with search engines.
We also develop a state-of-the-art pipeline for conversations with search engines, Conversations with Search Engines (CaSE), using this dataset.
CaSE enhances the state-of-the-art by introducing a supporting token identification module and a prior-aware pointer generator.
(2020-04-29T13:07:53Z)