ConvoGen: Enhancing Conversational AI with Synthetic Data: A Multi-Agent Approach
- URL: http://arxiv.org/abs/2503.17460v1
- Date: Fri, 21 Mar 2025 18:14:12 GMT
- Title: ConvoGen: Enhancing Conversational AI with Synthetic Data: A Multi-Agent Approach
- Authors: Reem Gody, Mahmoud Goudy, Ahmed Y. Tawfik,
- Abstract summary: We present ConvoGen, an innovative framework for generating synthetic conversational data using multi-agent systems.<n>The generated data has numerous applications, including training and evaluating conversational AI models.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we present ConvoGen: an innovative framework for generating synthetic conversational data using multi-agent systems. Our method leverages few-shot learning and introduces iterative sampling from a dynamically updated few-shot hub to create diverse and realistic conversational scenarios. The generated data has numerous applications, including training and evaluating conversational AI models, and augmenting existing datasets for tasks like conversational intent classification or conversation summarization. Our experiments demonstrate the effectiveness of this method in producing high-quality diverse synthetic conversational data, highlighting its potential to enhance the development and evaluation of conversational AI systems.
Related papers
- Data-Centric Improvements for Enhancing Multi-Modal Understanding in Spoken Conversation Modeling [13.628984890958314]
We introduce a data-centric customization approach for efficiently enhancing multimodal understanding in conversational speech modeling.<n>Our approach achieves state-of-the-art performance on the Spoken-SQuAD benchmark, using only 10% of the training data with open-weight models.<n>We also introduce ASK-QA, the first dataset for multi-turn spoken dialogue with ambiguous user requests and dynamic evaluation inputs.
arXiv Detail & Related papers (2024-12-20T15:43:09Z) - Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition [48.527630771422935]
We propose a synthetic data generation pipeline for multi-speaker conversational ASR.
We conduct evaluation by fine-tuning the Whisper ASR model for telephone and distant conversational speech settings.
arXiv Detail & Related papers (2024-08-17T14:47:05Z) - A Survey on Recent Advances in Conversational Data Generation [14.237954885530396]
We offer a systematic and comprehensive review of multi-turn conversational data generation.
We focus on three types of dialogue systems: open domain, task-oriented, and information-seeking.
We examine the evaluation metrics and methods for assessing synthetic conversational data.
arXiv Detail & Related papers (2024-05-12T10:11:12Z) - Simulating Task-Oriented Dialogues with State Transition Graphs and Large Language Models [16.94819621353007]
SynTOD is a new synthetic data generation approach for developing end-to-end Task-Oriented Dialogue (TOD) systems.
It generates diverse, structured conversations through random walks and response simulation using large language models.
In our experiments, using graph-guided response simulations leads to significant improvements in intent classification, slot filling and response relevance.
arXiv Detail & Related papers (2024-04-23T06:23:34Z) - AutoConv: Automatically Generating Information-seeking Conversations
with Large Language Models [74.10293412011455]
We propose AutoConv for synthetic conversation generation.
Specifically, we formulate the conversation generation problem as a language modeling task.
We finetune an LLM with a few human conversations to capture the characteristics of the information-seeking process.
arXiv Detail & Related papers (2023-08-12T08:52:40Z) - AUGUST: an Automatic Generation Understudy for Synthesizing
Conversational Recommendation Datasets [56.052803235932686]
We propose a novel automatic dataset synthesis approach that can generate both large-scale and high-quality recommendation dialogues.
In doing so, we exploit: (i) rich personalized user profiles from traditional recommendation datasets, (ii) rich external knowledge from knowledge graphs, and (iii) the conversation ability contained in human-to-human conversational recommendation datasets.
arXiv Detail & Related papers (2023-06-16T05:27:14Z) - PLACES: Prompting Language Models for Social Conversation Synthesis [103.94325597273316]
We use a small set of expert-written conversations as in-context examples to synthesize a social conversation dataset using prompting.
We perform several thorough evaluations of our synthetic conversations compared to human-collected conversations.
arXiv Detail & Related papers (2023-02-07T05:48:16Z) - End-to-end Spoken Conversational Question Answering: Task, Dataset and
Model [92.18621726802726]
In spoken question answering, the systems are designed to answer questions from contiguous text spans within the related speech transcripts.
We propose a new Spoken Conversational Question Answering task (SCQA), aiming at enabling the systems to model complex dialogue flows.
Our main objective is to build the system to deal with conversational questions based on the audio recordings, and to explore the plausibility of providing more cues from different modalities with systems in information gathering.
arXiv Detail & Related papers (2022-04-29T17:56:59Z) - SYNERGY: Building Task Bots at Scale Using Symbolic Knowledge and
Machine Teaching [75.87418236410296]
SYNERGY is a hybrid learning framework where a task bot is developed in two steps.
A pre-trained neural dialog model, SOLOIST, is fine-tuned on the simulated dialogs to build a bot for the task.
The fine-tuned neural dialog model is continually refined with a handful of real task-specific dialogs via machine teaching.
arXiv Detail & Related papers (2021-10-21T23:13:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.