Synthesizing Conversations from Unlabeled Documents using Automatic Response Segmentation
- URL: http://arxiv.org/abs/2406.03703v1
- Date: Thu, 6 Jun 2024 02:52:45 GMT
- Title: Synthesizing Conversations from Unlabeled Documents using Automatic Response Segmentation
- Authors: Fanyou Wu, Weijie Xu, Chandan K. Reddy, Srinivasan H. Sengamedu
- Abstract summary: We tackle the challenge of inadequate and costly training data for conversational question answering systems.
In this paper, we propose a robust dialog synthesising method.
We learn the segmentation of data for the dialog task instead of segmenting at sentence boundaries.
- Score: 13.322409682814827
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this study, we tackle the challenge of inadequate and costly training data that has hindered the development of conversational question answering (ConvQA) systems. Enterprises have a large corpus of diverse internal documents. Instead of relying on a search engine, a more compelling approach for people to comprehend these documents is to create a dialogue system. In this paper, we propose a robust dialog synthesising method. We learn the segmentation of data for the dialog task instead of segmenting at sentence boundaries. The synthetic dataset generated by our proposed method achieves superior quality when compared to WikiDialog, as assessed through machine and human evaluations. By employing our inpainted data for ConvQA retrieval system pre-training, we observed a notable improvement in performance across OR-QuAC benchmarks.
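To make the contrast in the abstract concrete, the sketch below shows only the sentence-boundary baseline that the paper argues against, not the authors' learned segmentation. It splits a document into sentences and treats each one as an answer turn whose question is left masked for a later inpainting model to fill. The `MASK` token, the function names, and the regex-based splitter are assumptions made for illustration; none of them come from the paper.

```python
import re
from typing import List, Tuple

MASK = "<mask>"  # hypothetical placeholder for a question to be inpainted


def split_sentences(text: str) -> List[str]:
    """Naive sentence-boundary segmentation (the baseline, not the learned method).

    Splits on whitespace that follows '.', '!' or '?'.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def to_inpainting_frame(segments: List[str]) -> List[Tuple[str, str]]:
    """Pair every answer segment with a masked question slot."""
    return [(MASK, seg) for seg in segments]


doc = (
    "Enterprises keep many internal documents. "
    "Search is often not enough. "
    "A dialogue system can help."
)
frame = to_inpainting_frame(split_sentences(doc))
for question, answer in frame:
    print(f"Q: {question}\nA: {answer}")
```

A learned segmenter would replace `split_sentences` with a model that chooses response boundaries, so that one answer turn can cover more or less than a single sentence.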
Related papers
- SuperDialseg: A Large-scale Dataset for Supervised Dialogue Segmentation [55.82577086422923]
We provide a feasible definition of dialogue segmentation points with the help of document-grounded dialogues.
We release a large-scale supervised dataset called SuperDialseg, containing 9,478 dialogues.
We also provide a benchmark including 18 models across five categories for the dialogue segmentation task.
arXiv Detail & Related papers (2023-05-15T06:08:01Z)
- FCC: Fusing Conversation History and Candidate Provenance for Contextual Response Ranking in Dialogue Systems [53.89014188309486]
We present a flexible neural framework that can integrate contextual information from multiple channels.
We evaluate our model on the MSDialog dataset widely used for evaluating conversational response ranking tasks.
arXiv Detail & Related papers (2023-03-31T23:58:28Z)
- GODEL: Large-Scale Pre-Training for Goal-Directed Dialog [119.1397031992088]
We introduce GODEL, a large pre-trained language model for dialog.
We show that GODEL outperforms state-of-the-art pre-trained dialog models in few-shot fine-tuning setups.
A novel feature of our evaluation methodology is the introduction of a notion of utility that assesses the usefulness of responses.
arXiv Detail & Related papers (2022-06-22T18:19:32Z)
- Dialog Inpainting: Turning Documents into Dialogs [12.131506050808207]
We produce two datasets totalling 19 million diverse information-seeking dialogs.
Human raters judge the answer adequacy and conversationality of WikiDialog to be as good as or better than existing manually-collected datasets.
arXiv Detail & Related papers (2022-05-18T16:58:50Z)
- End-to-end Spoken Conversational Question Answering: Task, Dataset and Model [92.18621726802726]
In spoken question answering, the systems are designed to answer questions from contiguous text spans within the related speech transcripts.
We propose a new Spoken Conversational Question Answering task (SCQA), aiming at enabling the systems to model complex dialogue flows.
Our main objective is to build the system to deal with conversational questions based on the audio recordings, and to explore the plausibility of providing more cues from different modalities with systems in information gathering.
arXiv Detail & Related papers (2022-04-29T17:56:59Z)
- DG2: Data Augmentation Through Document Grounded Dialogue Generation [41.81030088619399]
We propose an automatic data augmentation technique grounded on documents through a generative dialogue model.
When supplementing the original dataset, our method achieves significant improvement over traditional data augmentation methods.
arXiv Detail & Related papers (2021-12-15T18:50:14Z)
- Reasoning in Dialog: Improving Response Generation by Context Reading Comprehension [49.92173751203827]
In multi-turn dialog, utterances do not always take the full form of sentences.
We propose to improve the response generation performance by examining the model's ability to answer a reading comprehension question.
arXiv Detail & Related papers (2020-12-14T10:58:01Z)
- Towards Data Distillation for End-to-end Spoken Conversational Question Answering [65.124088336738]
We propose a new Spoken Conversational Question Answering task (SCQA).
SCQA aims at enabling QA systems to model complex dialogue flows given the speech utterances and text corpora.
Our main objective is to build a QA system to deal with conversational questions both in spoken and text forms.
arXiv Detail & Related papers (2020-10-18T05:53:39Z)
- A Compare Aggregate Transformer for Understanding Document-grounded Dialogue [27.04964963480175]
We propose a Compare Aggregate Transformer (CAT) to jointly denoise the dialogue context and aggregate the document information for response generation.
Experimental results on the CMUDoG dataset show that the proposed CAT model outperforms the state-of-the-art approach and strong baselines.
arXiv Detail & Related papers (2020-10-01T03:44:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.