OmniChat: Enhancing Spoken Dialogue Systems with Scalable Synthetic Data for Diverse Scenarios
- URL: http://arxiv.org/abs/2501.01384v1
- Date: Thu, 02 Jan 2025 17:58:23 GMT
- Title: OmniChat: Enhancing Spoken Dialogue Systems with Scalable Synthetic Data for Diverse Scenarios
- Authors: Xize Cheng, Dongjie Fu, Xiaoda Yang, Minghui Fang, Ruofan Hu, Jingyu Lu, Bai Jionghao, Zehan Wang, Shengpeng Ji, Rongjie Huang, Linjun Li, Yu Chen, Tao Jin, Zhou Zhao
- Abstract summary: We propose leveraging synthetic data to enhance dialogue models across diverse scenarios.
We introduce ShareChatX, the first comprehensive, large-scale dataset for spoken dialogue that spans diverse scenarios.
We also explore critical aspects of training dialogue systems using synthetic data.
- Score: 45.78414948567598
- Abstract: With the rapid development of large language models, researchers have created increasingly advanced spoken dialogue systems that can naturally converse with humans. However, these systems still struggle to handle the full complexity of real-world conversations, including audio events, musical contexts, and emotional expressions, mainly because current dialogue datasets are constrained in both scale and scenario diversity. In this paper, we propose leveraging synthetic data to enhance dialogue models across diverse scenarios. We introduce ShareChatX, the first comprehensive, large-scale dataset for spoken dialogue that spans diverse scenarios. Based on this dataset, we introduce OmniChat, a multi-turn dialogue system with a heterogeneous feature fusion module, designed to optimize feature selection in different dialogue contexts. In addition, we explore critical aspects of training dialogue systems using synthetic data. Through comprehensive experiments, we determine the ideal balance between synthetic and real data, achieving state-of-the-art results on the real-world dialogue dataset DailyTalk. We also highlight the crucial importance of synthetic data in tackling diverse, complex dialogue scenarios, especially those involving audio and music. For more details, please visit our demo page at https://sharechatx.github.io/.
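The abstract names a heterogeneous feature fusion module but gives no implementation details. The sketch below is one plausible reading, assuming context-gated mixing of time-aligned audio feature streams (e.g., semantic and paralinguistic features); all class names, dimensions, and the gating design are illustrative assumptions, not the authors' code.

```python
# A minimal sketch of context-gated fusion over heterogeneous audio features.
# Everything here (names, dims, gating) is an assumption for illustration.
import torch
import torch.nn as nn

class GatedFeatureFusion(nn.Module):
    """Projects each feature stream to a shared width, then mixes them with
    context-dependent gates so the model can emphasize semantic, acoustic,
    or musical cues per dialogue turn."""
    def __init__(self, input_dims, d_model=512):
        super().__init__()
        self.projs = nn.ModuleList(nn.Linear(d, d_model) for d in input_dims)
        # One scalar gate per stream, conditioned on all projected streams.
        self.gate = nn.Linear(d_model * len(input_dims), len(input_dims))

    def forward(self, streams):
        # streams: list of (batch, time, dim_i) tensors, time-aligned upstream.
        projected = [proj(s) for proj, s in zip(self.projs, streams)]
        stacked = torch.stack(projected, dim=-2)            # (B, T, S, d_model)
        weights = torch.softmax(
            self.gate(torch.cat(projected, dim=-1)), dim=-1
        )                                                    # (B, T, S)
        return (weights.unsqueeze(-1) * stacked).sum(dim=-2)  # (B, T, d_model)

# Example: fuse a 1024-d semantic stream with a 128-d paralinguistic stream.
fusion = GatedFeatureFusion(input_dims=[1024, 128])
semantic = torch.randn(2, 50, 1024)
paralinguistic = torch.randn(2, 50, 128)
print(fusion([semantic, paralinguistic]).shape)  # torch.Size([2, 50, 512])
```

Softmax gating keeps the per-stream weights interpretable turn by turn; since the paper does not specify its fusion mechanism, attention-based alternatives would be equally consistent with the abstract.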
Related papers
- WavChat: A Survey of Spoken Dialogue Models [66.82775211793547]
Recent advancements in spoken dialogue models, exemplified by systems like GPT-4o, have captured significant attention in the speech domain.
These advanced spoken dialogue models not only comprehend audio, music, and other speech-related features, but also capture stylistic and timbral characteristics in speech.
Despite the progress in spoken dialogue systems, there is a lack of comprehensive surveys that systematically organize and analyze these systems.
arXiv Detail & Related papers (2024-11-15T04:16:45Z)
- OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation [53.7173034249361]
OmniFlatten is an end-to-end GPT-based model capable of effectively modeling the complex behaviors inherent in natural conversations with low latency.
Our approach offers a simple modeling technique and a promising research direction for developing efficient and natural end-to-end full-duplex spoken dialogue systems.
arXiv Detail & Related papers (2024-10-23T11:58:58Z)
- DiaSynth: Synthetic Dialogue Generation Framework for Low Resource Dialogue Applications [18.378069426713]
Existing research is constrained by general or niche datasets that lack sufficient scale for training dialogue systems.
We introduce DiaSynth, a synthetic dialogue generation framework capable of generating high-quality, contextually rich dialogues; a minimal sketch of this few-shot prompting pattern appears after this list.
We perform our experiments by generating synthetic data using different LLMs and few-shot examples from DialogSum and SAMSum.
arXiv Detail & Related papers (2024-09-25T07:03:31Z)
- A Framework for Synthetic Audio Conversations Generation using Large Language Models [0.0]
ConversaSynth is a framework designed to generate synthetic conversation audio using large language models (LLMs) with multiple persona settings.
The framework first creates diverse and coherent text-based dialogues across various topics, which are then converted into audio using text-to-speech (TTS) systems.
arXiv Detail & Related papers (2024-09-02T05:09:46Z)
- Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation [55.043492250775294]
We introduce a novel Face-to-Face spoken dialogue model.
It processes audio-visual speech from user input and generates audio-visual speech as the response.
We also introduce MultiDialog, the first large-scale multimodal spoken dialogue corpus.
arXiv Detail & Related papers (2024-06-12T04:48:36Z)
- PLACES: Prompting Language Models for Social Conversation Synthesis [103.94325597273316]
We use a small set of expert-written conversations as in-context examples to synthesize a social conversation dataset using prompting.
We perform several thorough evaluations of our synthetic conversations compared to human-collected conversations.
arXiv Detail & Related papers (2023-02-07T05:48:16Z)
- HybriDialogue: An Information-Seeking Dialogue Dataset Grounded on Tabular and Textual Data [87.67278915655712]
We present a new dialogue dataset, HybriDialogue, which consists of crowdsourced natural conversations grounded on both Wikipedia text and tables.
The conversations are created through the decomposition of complex multihop questions into simple, realistic multiturn dialogue interactions.
arXiv Detail & Related papers (2022-04-28T00:52:16Z)
- Interview: A Large-Scale Open-Source Corpus of Media Dialog [11.28504775964698]
We introduce 'Interview': a large-scale (105K conversations) media dialog dataset collected from news interview transcripts.
Compared to existing large-scale proxies for conversational data, language models trained on our dataset exhibit better zero-shot out-of-domain performance.
'Interview' contains speaker role annotations for each turn, facilitating the development of engaging, responsive dialog systems.
arXiv Detail & Related papers (2020-04-07T02:44:50Z)
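Several entries above (DiaSynth, ConversaSynth, PLACES) share one pattern: prompt a large language model with a few expert-written seed dialogues and ask it to produce new conversations on target topics, optionally converting the text to audio with TTS afterwards. The sketch below illustrates only that generic pattern; the prompt wording, function names, and stand-in LLM are invented placeholders, not taken from any of these papers.

```python
# A minimal sketch of few-shot synthetic dialogue generation.
# Prompt text and seed dialogues are hypothetical placeholders.
from typing import Callable, List

def build_prompt(topic: str, seed_dialogues: List[str]) -> str:
    """Assemble a few-shot prompt: expert-written examples, then a
    request for a new dialogue on the target topic."""
    examples = "\n\n".join(
        f"Example dialogue {i + 1}:\n{d}" for i, d in enumerate(seed_dialogues)
    )
    return (
        f"{examples}\n\n"
        f"Write a new two-speaker dialogue about {topic}, "
        f"in the same style as the examples above."
    )

def generate_dialogues(
    topics: List[str],
    seed_dialogues: List[str],
    llm: Callable[[str], str],  # any text-in/text-out LLM client
) -> List[str]:
    """One synthetic dialogue per topic; scale comes from the topic list."""
    return [llm(build_prompt(t, seed_dialogues)) for t in topics]

# Usage with a stand-in LLM (replace with a real client call):
fake_llm = lambda prompt: "A: Did you hear the thunder?\nB: Yes, let's head in."
seeds = ["A: How was the concert?\nB: Loud, but the encore was great."]
print(generate_dialogues(["a sudden storm"], seeds, fake_llm)[0])
```

Scaling to a large corpus then reduces to enumerating topics and, for audio datasets such as ShareChatX or ConversaSynth's output, piping each generated script through TTS with per-speaker voices.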