Dialogizer: Context-aware Conversational-QA Dataset Generation from Textual Sources
- URL: http://arxiv.org/abs/2311.07589v1
- Date: Thu, 9 Nov 2023 06:03:11 GMT
- Title: Dialogizer: Context-aware Conversational-QA Dataset Generation from Textual Sources
- Authors: Yerin Hwang, Yongil Kim, Hyunkyung Bae, Jeesoo Bang, Hwanhee Lee, and Kyomin Jung
- Abstract summary: We propose a novel framework called Dialogizer, which has the capability to automatically generate ConvQA datasets with high contextual relevance. We produce four ConvQA datasets by utilizing documents from multiple domains as the primary source.
- Score: 18.09705075305591
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To address the data scarcity issue in Conversational question answering
(ConvQA), a dialog inpainting method, which utilizes documents to generate
ConvQA datasets, has been proposed. However, the original dialog inpainting
model is trained solely on the dialog reconstruction task, resulting in the
generation of questions with low contextual relevance due to insufficient
learning of question-answer alignment. To overcome this limitation, we propose
a novel framework called Dialogizer, which has the capability to automatically
generate ConvQA datasets with high contextual relevance from textual sources.
The framework incorporates two training tasks: question-answer matching (QAM)
and topic-aware dialog generation (TDG). Moreover, re-ranking is conducted
during the inference phase based on the contextual relevance of the generated
questions. Using our framework, we produce four ConvQA datasets by utilizing
documents from multiple domains as the primary source. Through automatic
evaluation using diverse metrics, as well as human evaluation, we validate that
our proposed framework exhibits the ability to generate datasets of higher
quality compared to the baseline dialog inpainting model.
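To make the inference-time re-ranking step more concrete, here is a minimal sketch under stated assumptions. It assumes, as in dialog inpainting, that each sentence of a source document becomes an answer turn and a generator proposes several candidate questions for it; the top-ranked candidate is kept. The abstract says only that re-ranking is based on the contextual relevance of the generated questions, so the scorer below (Jaccard token overlap between question and answer) is a swapped-in toy stand-in, not the paper's method. All names (`split_sentences`, `match_score`, `rerank`, `inpaint_document`, `toy_generator`) are hypothetical.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical generator signature (e.g. sampled outputs of an inpainting model):
# (answer, dialog_history) -> list of candidate questions.
Generator = Callable[[str, List[str]], List[str]]


def split_sentences(document: str) -> List[str]:
    """Naive splitter: each sentence of the document becomes one answer turn."""
    parts = re.split(r"(?<=[.!?])\s+", document.strip())
    return [p for p in parts if p]


def tokens(text: str) -> set:
    """Lower-cased alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def match_score(question: str, answer: str) -> float:
    """Jaccard token overlap between a question and its answer.

    Toy stand-in (assumption) for a learned question-answer relevance scorer.
    """
    q, a = tokens(question), tokens(answer)
    if not q or not a:
        return 0.0
    return len(q & a) / len(q | a)


def rerank(candidates: List[str], answer: str) -> List[str]:
    """Order candidate questions from most to least relevant to the answer."""
    return sorted(candidates, key=lambda c: match_score(c, answer), reverse=True)


def inpaint_document(document: str, generate: Generator) -> List[Tuple[str, str]]:
    """Turn a document into (question, answer) turns, keeping the top-ranked question."""
    dialog: List[Tuple[str, str]] = []
    for answer in split_sentences(document):
        # Flatten earlier turns so the generator can condition on the dialog so far.
        history = [turn for pair in dialog for turn in pair]
        candidates = generate(answer, history)
        dialog.append((rerank(candidates, answer)[0], answer))
    return dialog


if __name__ == "__main__":
    def toy_generator(answer: str, history: List[str]) -> List[str]:
        # Hypothetical fixed candidates; a real system would sample from a model.
        return [
            "What is the weather today?",
            "What does Dialogizer generate?",
            "Which training task does it use?",
        ]

    doc = "Dialogizer generates ConvQA datasets. It uses question-answer matching."
    for question, answer in inpaint_document(doc, toy_generator):
        print(f"Q: {question}\nA: {answer}")
```

With the toy generator, the off-topic "weather" question scores zero against both answers and is never selected; each answer instead gets the candidate that shares the most tokens with it. In a real pipeline the overlap function would be replaced by a learned scorer, and the sketch assumes the generator always returns at least one candidate.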
Related papers
- Synthesizing Conversations from Unlabeled Documents using Automatic Response Segmentation [13.322409682814827]
We tackle the challenge of inadequate and costly training data for conversational question answering systems.
In this paper, we propose a robust dialog synthesising method.
We learn the segmentation of data for the dialog task instead of segmenting at sentence boundaries.
(2024-06-06T02:52:45Z)
- q2d: Turning Questions into Dialogs to Teach Models How to Search [11.421839177607147]
We propose q2d: an automatic data generation pipeline that generates information-seeking dialogs from questions.
Unlike previous approaches, which relied on human-written dialogs with search queries, our method allows us to automatically generate query-based grounded dialogs with better control and scale.
(2023-04-27T16:39:15Z)
- FCC: Fusing Conversation History and Candidate Provenance for Contextual Response Ranking in Dialogue Systems [53.89014188309486]
We present a flexible neural framework that can integrate contextual information from multiple channels.
We evaluate our model on the MSDialog dataset widely used for evaluating conversational response ranking tasks.
(2023-03-31T23:58:28Z)
- CGoDial: A Large-Scale Benchmark for Chinese Goal-oriented Dialog Evaluation [75.60156479374416]
CGoDial is a new challenging and comprehensive Chinese benchmark for Goal-oriented Dialog evaluation.
It contains 96,763 dialog sessions and 574,949 dialog turns in total, covering three datasets with different knowledge sources.
To bridge the gap between academic benchmarks and spoken dialog scenarios, we either collect data from real conversations or add spoken features to existing datasets via crowd-sourcing.
(2022-11-21T16:21:41Z)
- Manual-Guided Dialogue for Flexible Conversational Agents [84.46598430403886]
How to build and use dialogue data efficiently, and how to deploy models in different domains at scale can be critical issues in building a task-oriented dialogue system.
We propose a novel manual-guided dialogue scheme, where the agent learns the tasks from both dialogue and manuals.
Our proposed scheme reduces the dependence of dialogue models on fine-grained domain ontology, and makes them more flexible to adapt to various domains.
(2022-08-16T08:21:12Z)
- Dialog Inpainting: Turning Documents into Dialogs [12.131506050808207]
We produce two datasets totalling 19 million diverse information-seeking dialogs.
Human raters judge the answer adequacy and conversationality of WikiDialog to be as good or better than existing manually-collected datasets.
(2022-05-18T16:58:50Z)
- End-to-end Spoken Conversational Question Answering: Task, Dataset and Model [92.18621726802726]
In spoken question answering, the systems are designed to answer questions from contiguous text spans within the related speech transcripts.
We propose a new Spoken Conversational Question Answering task (SCQA), aiming at enabling the systems to model complex dialogue flows.
Our main objective is to build a system that handles conversational questions based on audio recordings, and to explore the plausibility of providing systems with more cues from different modalities during information gathering.
(2022-04-29T17:56:59Z)
- Reasoning in Dialog: Improving Response Generation by Context Reading Comprehension [49.92173751203827]
In multi-turn dialog, utterances do not always take the full form of sentences.
We propose to improve the response generation performance by examining the model's ability to answer a reading comprehension question.
(2020-12-14T10:58:01Z)
- Towards Data Distillation for End-to-end Spoken Conversational Question Answering [65.124088336738]
We propose a new Spoken Conversational Question Answering task (SCQA).
SCQA aims at enabling QA systems to model complex dialogue flows given the speech utterances and text corpora.
Our main objective is to build a QA system to deal with conversational questions both in spoken and text forms.
(2020-10-18T05:53:39Z)
- Matching Questions and Answers in Dialogues from Online Forums [12.64602629459043]
Matching question-answer relations between two turns in conversations is not only the first step in analyzing dialogue structures, but also valuable for training dialogue systems.
This paper presents a QA matching model that considers both distance information and dialogue history through two simultaneous attention mechanisms, called mutual attention.
(2020-05-19T08:18:52Z)