PLACES: Prompting Language Models for Social Conversation Synthesis
- URL: http://arxiv.org/abs/2302.03269v2
- Date: Wed, 8 Feb 2023 02:33:46 GMT
- Title: PLACES: Prompting Language Models for Social Conversation Synthesis
- Authors: Maximillian Chen, Alexandros Papangelis, Chenyang Tao, Seokhwan Kim,
Andy Rosenbaum, Yang Liu, Zhou Yu, Dilek Hakkani-Tur
- Abstract summary: We use a small set of expert-written conversations as in-context examples to synthesize a social conversation dataset using prompting.
We perform several thorough evaluations of our synthetic conversations compared to human-collected conversations.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Collecting high-quality conversational data can be very expensive for most
applications and infeasible for others due to privacy, ethical, or similar
concerns. A promising direction to tackle this problem is to generate synthetic
dialogues by prompting large language models. In this work, we use a small set
of expert-written conversations as in-context examples to synthesize a social
conversation dataset using prompting. We perform several thorough evaluations
of our synthetic conversations compared to human-collected conversations. This
includes various dimensions of conversation quality with human evaluation
directly on the synthesized conversations, and interactive human evaluation of
chatbots fine-tuned on the synthetically generated dataset. We additionally
demonstrate that this prompting approach is generalizable to multi-party
conversations, providing potential to create new synthetic data for multi-party
tasks. Our synthetic multi-party conversations were rated more favorably across
all measured dimensions compared to conversation excerpts sampled from a
human-collected multi-party dataset.
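The core recipe the abstract describes, few-shot prompting with a small set of expert-written conversations as in-context examples, can be sketched as follows. This is an illustrative reconstruction, not the authors' released code: the speaker tags, topic header, and prompt layout are assumptions, and the LLM call itself is left abstract.

```python
# Hypothetical sketch of the PLACES-style prompting setup: concatenate a few
# expert-written example dialogues, then open an unfinished dialogue on a new
# topic; the language model's completion becomes the synthetic conversation.
# All formatting details here (speaker names, header text) are assumptions.

def build_prompt(example_conversations, topic, speakers=("Alice", "Bob")):
    """Build a few-shot prompt from expert-written dialogues.

    example_conversations: list of {"topic": str, "turns": [(speaker, utterance)]}
    topic: subject for the new conversation to be synthesized.
    """
    blocks = []
    for ex in example_conversations:
        lines = "\n".join(f"{spk}: {utt}" for spk, utt in ex["turns"])
        blocks.append(f"The following is a conversation about {ex['topic']}.\n{lines}")
    # Unfinished final block: the model continues from the first speaker tag.
    blocks.append(f"The following is a conversation about {topic}.\n{speakers[0]}:")
    return "\n\n".join(blocks)

examples = [
    {
        "topic": "weekend plans",
        "turns": [
            ("Alice", "Any plans for the weekend?"),
            ("Bob", "I'm thinking of going hiking if the weather holds."),
        ],
    }
]

prompt = build_prompt(examples, topic="favorite books")
# `prompt` would then be sent to a large language model; the completion is
# parsed back into speaker turns to form one synthetic conversation.
```

Extending the same template with a third speaker tag is what makes the approach generalize to multi-party conversation synthesis, as the abstract notes.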
Related papers
- Faithful Persona-based Conversational Dataset Generation with Large Language Models
High-quality conversational datasets are essential for developing AI models that can communicate with users.
We propose a Generator-Critic architecture framework to expand the initial dataset, while improving the quality of its conversations.
We release Synthetic-Persona-Chat, consisting of 20k conversations seeded from Persona-Chat.
arXiv Detail & Related papers (2023-12-15T18:23:50Z)
- AutoConv: Automatically Generating Information-seeking Conversations with Large Language Models
We propose AutoConv for synthetic conversation generation.
Specifically, we formulate the conversation generation problem as a language modeling task.
We finetune an LLM with a few human conversations to capture the characteristics of the information-seeking process.
arXiv Detail & Related papers (2023-08-12T08:52:40Z)
- Does Collaborative Human-LM Dialogue Generation Help Information Extraction from Human Dialogues?
Problem-solving human dialogues in real applications can be much more complex than existing Wizard-of-Oz collections.
We introduce a human-in-the-loop dialogue generation framework capable of synthesizing realistic dialogues.
arXiv Detail & Related papers (2023-07-13T20:02:50Z)
- NatCS: Eliciting Natural Customer Support Dialogues
Existing task-oriented dialogue datasets are not representative of real customer support conversations.
We introduce NatCS, a multi-domain collection of spoken customer service conversations.
arXiv Detail & Related papers (2023-05-04T17:25:24Z)
- Talk the Walk: Synthetic Data Generation for Conversational Music Recommendation
We present TalkWalk, which generates realistic high-quality conversational data by leveraging encoded expertise in widely available item collections.
We generate over one million diverse conversations, exceeding the scale of human-collected datasets.
arXiv Detail & Related papers (2023-01-27T01:54:16Z)
- Knowledge-Grounded Conversational Data Augmentation with Generative Conversational Networks
We take a step towards automatically generating conversational data using Generative Conversational Networks.
We evaluate our approach on conversations with and without knowledge on the Topical Chat dataset.
arXiv Detail & Related papers (2022-07-22T22:37:14Z)
- Training Conversational Agents with Generative Conversational Networks
We use Generative Conversational Networks to automatically generate data and train social conversational agents.
We evaluate our approach on TopicalChat with automatic metrics and human evaluators, showing that with 10% of seed data it performs close to the baseline that uses 100% of the data.
arXiv Detail & Related papers (2021-10-15T21:46:39Z)
- Summary Grounded Conversation Generation
We show how pre-trained language models can be used to generate entire conversations, given only a summary of a conversation as the input.
We also show that the accuracy of conversation summarization can be improved by augmenting a conversation summarization dataset with generated conversations.
arXiv Detail & Related papers (2021-06-07T04:46:31Z)
- A Taxonomy of Empathetic Response Intents in Human Social Conversations
Open-domain conversational agents are becoming increasingly popular in the natural language processing community.
One of the challenges is enabling them to converse in an empathetic manner.
Current neural response generation methods rely solely on end-to-end learning from large scale conversation data to generate dialogues.
Recent work has shown the promise of combining dialogue act/intent modelling and neural response generation.
arXiv Detail & Related papers (2020-12-07T21:56:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.