DeepDialogue: A Multi-Turn Emotionally-Rich Spoken Dialogue Dataset
- URL: http://arxiv.org/abs/2505.19978v1
- Date: Mon, 26 May 2025 13:37:10 GMT
- Title: DeepDialogue: A Multi-Turn Emotionally-Rich Spoken Dialogue Dataset
- Authors: Alkis Koudounas, Moreno La Quatra, Elena Baralis,
- Abstract summary: DeepDialogue is a large-scale multimodal dataset containing 40,150 high-quality multi-turn dialogues. Our approach pairs 9 different language models to generate 65,600 initial conversations. A key contribution is its speech component, where we synthesize emotion-consistent voices for all 40,150 dialogues.
- Score: 10.007636884318801
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recent advances in conversational AI have demonstrated impressive capabilities in single-turn responses, yet multi-turn dialogues remain challenging for even the most sophisticated language models. Current dialogue datasets are limited in their emotional range, domain diversity, turn depth, and are predominantly text-only, hindering progress in developing more human-like conversational systems across modalities. To address these limitations, we present DeepDialogue, a large-scale multimodal dataset containing 40,150 high-quality multi-turn dialogues spanning 41 domains and incorporating 20 distinct emotions with coherent emotional progressions. Our approach pairs 9 different language models (4B-72B parameters) to generate 65,600 initial conversations, which we then evaluate through a combination of human annotation and LLM-based quality filtering. The resulting dataset reveals fundamental insights: smaller models fail to maintain coherence beyond 6 dialogue turns; concrete domains (e.g., "cars," "travel") yield more meaningful conversations than abstract ones (e.g., "philosophy"); and cross-model interactions produce more coherent dialogues than same-model conversations. A key contribution of DeepDialogue is its speech component, where we synthesize emotion-consistent voices for all 40,150 dialogues, creating the first large-scale open-source multimodal dialogue dataset that faithfully preserves emotional context across multi-turn conversations.
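The generation approach described in the abstract (pairing two different language models to alternate turns, with a coherent emotional progression, since cross-model pairings produced more coherent dialogues) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the model pool, the `generate_reply` placeholder, and the turn structure are all assumptions.

```python
import random

# Hypothetical stand-ins for the paper's 9 LLMs (4B-72B parameters).
MODEL_POOL = ["model-4b", "model-9b", "model-27b", "model-72b"]

def generate_reply(model, history, emotion):
    """Placeholder for a real LLM call: returns the next utterance."""
    return f"[{model}|{emotion}] reply at turn {len(history)}"

def cross_model_dialogue(domain, emotions, num_turns=8, rng=random):
    """Alternate turns between two *distinct* models, following the
    paper's finding that cross-model interactions yield more coherent
    dialogues than same-model conversations."""
    model_a, model_b = rng.sample(MODEL_POOL, 2)  # two different models
    history = [f"Seed topic: {domain}"]
    for turn in range(num_turns):
        speaker = model_a if turn % 2 == 0 else model_b
        emotion = emotions[turn % len(emotions)]  # emotional progression
        history.append(generate_reply(speaker, history, emotion))
    return history

dialogue = cross_model_dialogue("travel", ["curious", "excited"], num_turns=4)
print(len(dialogue))  # prints 5: the seed plus 4 turns
```

In the actual pipeline, the generated conversations would then pass through human annotation and LLM-based quality filtering before inclusion in the dataset.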
Related papers
- Aligning Spoken Dialogue Models from User Interactions [55.192134724622235]
We propose a novel preference alignment framework to improve spoken dialogue models on realtime conversations from user interactions. We create a dataset of more than 150,000 preference pairs from raw multi-turn speech conversations annotated with AI feedback. Our findings shed light on the importance of a well-calibrated balance among various dynamics, crucial for natural real-time speech dialogue systems.
arXiv Detail & Related papers (2025-06-26T16:45:20Z)
- KwaiChat: A Large-Scale Video-Driven Multilingual Mixed-Type Dialogue Corpus [69.46707346122113]
We propose a novel task and create a human-to-human video-driven multilingual mixed-type dialogue corpus. The KwaiChat corpus contains a total of 93,209 videos and 246,080 dialogues, across 4 dialogue types, 30 domains, 4 languages, and 13 topics. An analysis of 7 distinct LLMs on KwaiChat reveals that GPT-4o achieves the best performance but still performs poorly on this task.
arXiv Detail & Related papers (2025-03-10T04:05:38Z)
- A Static and Dynamic Attention Framework for Multi Turn Dialogue Generation [37.79563028123686]
In open-domain multi-turn dialogue generation, it is essential to model the contextual semantics of the dialogue history.
Previous research has verified the effectiveness of the hierarchical recurrent encoder-decoder framework for open-domain multi-turn dialogue generation.
We propose a static and dynamic attention-based approach to model the dialogue history and then generate open-domain multi-turn dialogue responses.
arXiv Detail & Related papers (2024-10-28T06:05:34Z)
- Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation [55.043492250775294]
We introduce a novel Face-to-Face spoken dialogue model.
It processes audio-visual speech from user input and generates audio-visual speech as the response.
We also introduce MultiDialog, the first large-scale multimodal spoken dialogue corpus.
arXiv Detail & Related papers (2024-06-12T04:48:36Z)
- DialoGPS: Dialogue Path Sampling in Continuous Semantic Space for Data Augmentation in Multi-Turn Conversations [18.98951277038404]
In open-domain dialogue generation tasks, contexts and responses in most datasets are one-to-one mapped.
We propose DialoGue Path Sampling (DialoGPS) in continuous semantic space, the first many-to-many augmentation method for multi-turn dialogues.
arXiv Detail & Related papers (2023-06-29T08:12:47Z)
- SpokenWOZ: A Large-Scale Speech-Text Benchmark for Spoken Task-Oriented Dialogue Agents [72.42049370297849]
SpokenWOZ is a large-scale speech-text dataset for spoken TOD.
Cross-turn slot and reasoning slot detection are new challenges for SpokenWOZ.
arXiv Detail & Related papers (2023-05-22T13:47:51Z)
- M3ED: Multi-modal Multi-scene Multi-label Emotional Dialogue Database [139.08528216461502]
We propose a Multi-modal Multi-scene Multi-label Emotional Dialogue dataset, M3ED.
M3ED contains 990 dyadic emotional dialogues from 56 different TV series, a total of 9,082 turns and 24,449 utterances.
To the best of our knowledge, M3ED is the first multimodal emotional dialogue dataset in Chinese.
arXiv Detail & Related papers (2022-05-09T06:52:51Z)
- Generating Empathetic Responses with a Large Scale Dialog Dataset [0.76146285961466]
Existing models either directly incorporate pre-defined emotion information to guide the response generation, or use deterministic rules to decide the response emotion.
We show how to build a multi-turn empathetic dialog model that performs well compared to its baselines over 6,000 human-evaluated instances.
arXiv Detail & Related papers (2021-05-14T13:45:40Z)
- Rethinking Dialogue State Tracking with Reasoning [76.0991910623001]
This paper proposes to track dialogue states gradually, reasoning over dialogue turns with the help of back-end data.
Empirical results demonstrate that our method significantly outperforms the state-of-the-art methods by 38.6% in terms of joint belief accuracy for MultiWOZ 2.1.
arXiv Detail & Related papers (2020-05-27T02:05:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality or accuracy of the information presented and is not responsible for any consequences of its use.