DailyTalk: Spoken Dialogue Dataset for Conversational Text-to-Speech
- URL: http://arxiv.org/abs/2207.01063v1
- Date: Sun, 3 Jul 2022 15:07:41 GMT
- Title: DailyTalk: Spoken Dialogue Dataset for Conversational Text-to-Speech
- Authors: Keon Lee, Kyumin Park, Daeyoung Kim
- Abstract summary: We introduce DailyTalk, a high-quality conversational speech dataset designed for Text-to-Speech.
We sampled, modified, and recorded 2,541 dialogues from the open-domain dialogue dataset DailyDialog.
We extend prior work as our baseline, where a non-autoregressive TTS is conditioned on historical information in a dialog.
- Score: 4.339031624083067
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The majority of current TTS datasets, which are collections of individual
utterances, contain few conversational aspects in terms of both style and
metadata. In this paper, we introduce DailyTalk, a high-quality conversational
speech dataset designed for Text-to-Speech. We sampled, modified, and recorded
2,541 dialogues from the open-domain dialogue dataset DailyDialog, which are
long enough to represent the context of each dialogue. During data
construction, we preserved the attribute distributions originally annotated in
DailyDialog so that DailyTalk covers diverse dialogue. On top of our dataset,
we extend prior work as our baseline, where a non-autoregressive TTS is
conditioned on historical information in a dialog. We gather metadata so that a
TTS model can learn historical dialog information, the key to generating
context-aware speech. From the baseline experiment results, we show that
DailyTalk can be used to train neural text-to-speech models, and our baseline
can represent contextual information. The DailyTalk dataset and baseline code
are freely available for academic use under the CC-BY-SA 4.0 license.
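The baseline described above conditions a non-autoregressive TTS on dialogue history. As a rough illustration only (not the authors' released code), the sketch below shows one common way such conditioning can be wired up: a recurrent context encoder summarizes embeddings of the previous turns, and the resulting context vector is added to every frame of the phoneme encoder output. All module names, dimensions, and the additive fusion are assumptions made for this sketch.

```python
# Minimal sketch (assumed design, not the DailyTalk baseline implementation) of
# conditioning a non-autoregressive TTS text encoder on dialogue history.
import torch
import torch.nn as nn


class DialogueContextEncoder(nn.Module):
    """Summarize embeddings of past utterances into a single context vector."""

    def __init__(self, utt_dim: int = 256, hidden_dim: int = 256):
        super().__init__()
        self.gru = nn.GRU(utt_dim, hidden_dim, batch_first=True)

    def forward(self, history: torch.Tensor) -> torch.Tensor:
        # history: (batch, n_past_turns, utt_dim), one embedding per past turn
        _, last_hidden = self.gru(history)   # (1, batch, hidden_dim)
        return last_hidden.squeeze(0)        # (batch, hidden_dim)


class ContextConditionedTextEncoder(nn.Module):
    """Add the dialogue context vector to every phoneme-encoder frame."""

    def __init__(self, n_phonemes: int = 100, hidden_dim: int = 256):
        super().__init__()
        self.phoneme_embedding = nn.Embedding(n_phonemes, hidden_dim)
        self.context_encoder = DialogueContextEncoder(hidden_dim, hidden_dim)
        self.proj = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, phonemes: torch.Tensor, history: torch.Tensor) -> torch.Tensor:
        # phonemes: (batch, seq_len) phoneme ids of the current utterance
        text_hidden = self.phoneme_embedding(phonemes)       # (batch, seq_len, hidden)
        context = self.proj(self.context_encoder(history))   # (batch, hidden)
        return text_hidden + context.unsqueeze(1)            # broadcast over time


if __name__ == "__main__":
    model = ContextConditionedTextEncoder()
    phonemes = torch.randint(0, 100, (2, 40))  # 2 utterances, 40 phonemes each
    history = torch.randn(2, 5, 256)           # 5 past turns per dialogue
    print(model(phonemes, history).shape)      # torch.Size([2, 40, 256])
```

In a FastSpeech 2-style pipeline, the conditioned encoder output would then feed the variance adaptor and decoder as usual; the dialogue history only shifts the text representation.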
Related papers
- Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation [55.043492250775294]
We introduce a novel Face-to-Face spoken dialogue model.
It processes audio-visual speech from user input and generates audio-visual speech as the response.
We also introduce MultiDialog, the first large-scale multimodal spoken dialogue corpus.
arXiv Detail & Related papers (2024-06-12T04:48:36Z)
- DialogStudio: Towards Richest and Most Diverse Unified Dataset Collection for Conversational AI [92.29874802394167]
DialogStudio is the largest and most diverse collection of dialogue datasets.
Our collection encompasses data from open-domain dialogues, task-oriented dialogues, natural language understanding, conversational recommendation, dialogue summarization, and knowledge-grounded dialogues.
arXiv Detail & Related papers (2023-07-19T17:57:53Z)
- SpokenWOZ: A Large-Scale Speech-Text Benchmark for Spoken Task-Oriented Dialogue Agents [72.42049370297849]
SpokenWOZ is a large-scale speech-text dataset for spoken TOD.
Cross-turn slot and reasoning slot detection are new challenges for SpokenWOZ.
arXiv Detail & Related papers (2023-05-22T13:47:51Z)
- FCTalker: Fine and Coarse Grained Context Modeling for Expressive Conversational Speech Synthesis [75.74906149219817]
Conversational Text-to-Speech (TTS) aims to synthesize an utterance with the right linguistic and affective prosody in a conversational context.
We propose a novel expressive conversational TTS model, termed FCTalker, which learns fine- and coarse-grained context dependencies simultaneously during speech generation.
arXiv Detail & Related papers (2022-10-27T12:20:20Z)
- What Did You Say? Task-Oriented Dialog Datasets Are Not Conversational!? [4.022057598291766]
We outline a taxonomy of conversational and contextual effects, which we use to examine MultiWOZ, SGD and SMCalFlow.
We find that less than 4% of MultiWOZ's turns and 10% of SGD's turns are conversational, while SMCalFlow is not conversational at all in its current release.
arXiv Detail & Related papers (2022-03-07T14:26:23Z)
- "How Robust r u?": Evaluating Task-Oriented Dialogue Systems on Spoken Conversations [87.95711406978157]
This work presents a new benchmark on spoken task-oriented conversations.
We study multi-domain dialogue state tracking and knowledge-grounded dialogue modeling.
Our data set enables speech-based benchmarking of task-oriented dialogue systems.
arXiv Detail & Related papers (2021-09-28T04:51:04Z)
- doc2dial: A Goal-Oriented Document-Grounded Dialogue Dataset [24.040517978408484]
doc2dial is a new dataset of goal-oriented dialogues grounded in documents.
We first construct dialogue flows based on the content elements that correspond to higher-level relations across text sections.
We present these dialogue flows to crowd contributors to create conversational utterances.
arXiv Detail & Related papers (2020-11-12T19:08:44Z)
- Pchatbot: A Large-Scale Dataset for Personalized Chatbot [49.16746174238548]
We introduce Pchatbot, a large-scale dialogue dataset that contains two subsets collected from Weibo and Judicial forums respectively.
To adapt the raw data to dialogue systems, we carefully normalize it through processes such as anonymization.
Pchatbot is significantly larger in scale than existing Chinese datasets, which may benefit data-driven models.
arXiv Detail & Related papers (2020-09-28T12:49:07Z)
- Interview: A Large-Scale Open-Source Corpus of Media Dialog [11.28504775964698]
We introduce 'Interview': a large-scale (105K conversations) media dialog dataset collected from news interview transcripts.
Compared to existing large-scale proxies for conversational data, language models trained on our dataset exhibit better zero-shot out-of-domain performance.
'Interview' contains speaker role annotations for each turn, facilitating the development of engaging, responsive dialog systems.
arXiv Detail & Related papers (2020-04-07T02:44:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of this information and is not responsible for any consequences arising from its use.