doc2dial: A Goal-Oriented Document-Grounded Dialogue Dataset
- URL: http://arxiv.org/abs/2011.06623v2
- Date: Wed, 18 Nov 2020 22:42:12 GMT
- Title: doc2dial: A Goal-Oriented Document-Grounded Dialogue Dataset
- Authors: Song Feng, Hui Wan, Chulaka Gunasekara, Siva Sankalp Patel, Sachindra Joshi, Luis A. Lastras
- Abstract summary: doc2dial is a new dataset of goal-oriented dialogues grounded in documents.
We first construct dialogue flows based on the content elements that correspond to higher-level relations across text sections.
We present these dialogue flows to crowd contributors to create conversational utterances.
- Score: 24.040517978408484
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce doc2dial, a new dataset of goal-oriented dialogues that are
grounded in the associated documents. Inspired by how the authors compose
documents for guiding end users, we first construct dialogue flows based on the
content elements that correspond to higher-level relations across text
sections as well as lower-level relations between discourse units within a
section. Then we present these dialogue flows to crowd contributors to create
conversational utterances. The dataset includes about 4800 annotated
conversations with an average of 14 turns that are grounded in over 480
documents from four domains. Compared to the prior document-grounded dialogue
datasets, this dataset covers a variety of dialogue scenes in
information-seeking conversations. For evaluating the versatility of the
dataset, we introduce multiple dialogue modeling tasks and present baseline
approaches.
Related papers
- Multi-turn Dialogue Comprehension from a Topic-aware Perspective [70.37126956655985]
This paper proposes to model multi-turn dialogues from a topic-aware perspective.
We use a dialogue segmentation algorithm to split a dialogue passage into topic-concentrated fragments in an unsupervised way.
We also present a novel model, Topic-Aware Dual-Attention Matching (TADAM) Network, which takes topic segments as processing elements.
arXiv Detail & Related papers (2023-09-18T11:03:55Z)
- DialogStudio: Towards Richest and Most Diverse Unified Dataset Collection for Conversational AI [92.29874802394167]
DialogStudio is the largest and most diverse collection of dialogue datasets.
Our collection encompasses data from open-domain dialogues, task-oriented dialogues, natural language understanding, conversational recommendation, dialogue summarization, and knowledge-grounded dialogues.
arXiv Detail & Related papers (2023-07-19T17:57:53Z)
- SuperDialseg: A Large-scale Dataset for Supervised Dialogue Segmentation [55.82577086422923]
We provide a feasible definition of dialogue segmentation points with the help of document-grounded dialogues.
We release a large-scale supervised dataset called SuperDialseg, containing 9,478 dialogues.
We also provide a benchmark including 18 models across five categories for the dialogue segmentation task.
arXiv Detail & Related papers (2023-05-15T06:08:01Z)
- Manual-Guided Dialogue for Flexible Conversational Agents [84.46598430403886]
How to build and use dialogue data efficiently, and how to deploy models in different domains at scale can be critical issues in building a task-oriented dialogue system.
We propose a novel manual-guided dialogue scheme, where the agent learns the tasks from both dialogue and manuals.
Our proposed scheme reduces the dependence of dialogue models on fine-grained domain ontology, and makes them more flexible to adapt to various domains.
arXiv Detail & Related papers (2022-08-16T08:21:12Z)
- DailyTalk: Spoken Dialogue Dataset for Conversational Text-to-Speech [4.339031624083067]
We introduce DailyTalk, a high-quality conversational speech dataset designed for Text-to-Speech.
We sampled, modified, and recorded 2,541 dialogues from the open-domain dialogue dataset DailyDialog.
We extend prior work as our baseline, where a non-autoregressive TTS is conditioned on historical information in a dialog.
arXiv Detail & Related papers (2022-07-03T15:07:41Z)
- HybriDialogue: An Information-Seeking Dialogue Dataset Grounded on Tabular and Textual Data [87.67278915655712]
We present a new dialogue dataset, HybriDialogue, which consists of crowdsourced natural conversations grounded on both Wikipedia text and tables.
The conversations are created through the decomposition of complex multihop questions into simple, realistic multiturn dialogue interactions.
arXiv Detail & Related papers (2022-04-28T00:52:16Z)
- DG2: Data Augmentation Through Document Grounded Dialogue Generation [41.81030088619399]
We propose an automatic data augmentation technique grounded on documents through a generative dialogue model.
When supplementing the original dataset, our method achieves significant improvement over traditional data augmentation methods.
arXiv Detail & Related papers (2021-12-15T18:50:14Z)
- MultiDoc2Dial: Modeling Dialogues Grounded in Multiple Documents [14.807409907211452]
We propose MultiDoc2Dial, a new task and dataset on modeling goal-oriented dialogues grounded in multiple documents.
We introduce a new dataset that contains dialogues grounded in multiple documents from four different domains.
arXiv Detail & Related papers (2021-09-26T13:12:05Z)
- RiSAWOZ: A Large-Scale Multi-Domain Wizard-of-Oz Dataset with Rich Semantic Annotations for Task-Oriented Dialogue Modeling [35.75880078666584]
RiSAWOZ is a large-scale multi-domain Chinese Wizard-of-Oz dataset with Rich Semantic Annotations.
It contains 11.2K human-to-human (H2H) multi-turn semantically annotated dialogues, with more than 150K utterances spanning over 12 domains.
arXiv Detail & Related papers (2020-10-17T08:18:59Z)
- Rethinking Dialogue State Tracking with Reasoning [76.0991910623001]
This paper proposes to track dialogue states gradually with reasoning over dialogue turns with the help of the back-end data.
Empirical results demonstrate that our method significantly outperforms the state-of-the-art methods by 38.6% in terms of joint belief accuracy for MultiWOZ 2.1.
arXiv Detail & Related papers (2020-05-27T02:05:33Z)
- Interview: A Large-Scale Open-Source Corpus of Media Dialog [11.28504775964698]
We introduce 'Interview': a large-scale (105K conversations) media dialog dataset collected from news interview transcripts.
Compared to existing large-scale proxies for conversational data, language models trained on our dataset exhibit better zero-shot out-of-domain performance.
'Interview' contains speaker role annotations for each turn, facilitating the development of engaging, responsive dialog systems.
arXiv Detail & Related papers (2020-04-07T02:44:50Z)