PLACES: Prompting Language Models for Social Conversation Synthesis
- URL: http://arxiv.org/abs/2302.03269v2
- Date: Wed, 8 Feb 2023 02:33:46 GMT
- Title: PLACES: Prompting Language Models for Social Conversation Synthesis
- Authors: Maximillian Chen, Alexandros Papangelis, Chenyang Tao, Seokhwan Kim,
Andy Rosenbaum, Yang Liu, Zhou Yu, Dilek Hakkani-Tur
- Abstract summary: We use a small set of expert-written conversations as in-context examples to synthesize a social conversation dataset using prompting.
We perform several thorough evaluations of our synthetic conversations compared to human-collected conversations.
- Score: 103.94325597273316
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Collecting high quality conversational data can be very expensive for most
applications and infeasible for others due to privacy, ethical, or similar
concerns. A promising direction to tackle this problem is to generate synthetic
dialogues by prompting large language models. In this work, we use a small set
of expert-written conversations as in-context examples to synthesize a social
conversation dataset using prompting. We perform several thorough evaluations
of our synthetic conversations compared to human-collected conversations. This
includes various dimensions of conversation quality with human evaluation
directly on the synthesized conversations, and interactive human evaluation of
chatbots fine-tuned on the synthetically generated dataset. We additionally
demonstrate that this prompting approach is generalizable to multi-party
conversations, providing potential to create new synthetic data for multi-party
tasks. Our synthetic multi-party conversations were rated more favorably across
all measured dimensions compared to conversation excerpts sampled from a
human-collected multi-party dataset.
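The abstract's core idea, using a few expert-written conversations as in-context examples in a prompt, can be sketched roughly as below. This is a minimal illustration and not the authors' actual prompt template: the function names (`format_conversation`, `build_prompt`), the "The following is a conversation about ..." wording, the speaker names, and the sampling of `k` examples are all assumptions made for the example. The resulting string would be passed to a large language model of your choice, which is expected to continue the final, open-ended conversation.

```python
import random
from typing import List, Tuple

Turn = Tuple[str, str]                 # (speaker, utterance)
Example = Tuple[str, List[Turn]]       # (topic, turns)


def format_conversation(topic: str, turns: List[Turn]) -> str:
    """Render one expert-written conversation as plain text (hypothetical format)."""
    lines = [f"The following is a conversation about {topic}."]
    lines += [f"{speaker}: {utterance}" for speaker, utterance in turns]
    return "\n".join(lines)


def build_prompt(examples: List[Example], new_topic: str,
                 first_speaker: str = "Alice", k: int = 3, seed: int = 0) -> str:
    """Assemble a few-shot prompt: k sampled examples, then an open-ended
    header for the new topic that the language model should continue."""
    rng = random.Random(seed)
    chosen = rng.sample(examples, min(k, len(examples)))
    blocks = [format_conversation(topic, turns) for topic, turns in chosen]
    # The final block is left unfinished so the model writes the conversation.
    blocks.append(
        f"The following is a conversation about {new_topic}.\n{first_speaker}:"
    )
    return "\n\n".join(blocks)


if __name__ == "__main__":
    # Toy stand-ins for expert-written conversations (invented for illustration).
    EXAMPLES: List[Example] = [
        ("cooking", [("Alice", "Have you tried baking bread?"),
                     ("Bob", "Yes, sourdough is my favorite.")]),
        ("music", [("Alice", "What have you been listening to?"),
                   ("Bob", "Mostly jazz lately.")]),
    ]
    print(build_prompt(EXAMPLES, "hiking", k=2))
```

For multi-party synthesis, the same sketch would apply with example conversations that contain more than two speaker names; whether the paper formats them this way is not stated in the abstract.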
Related papers
- Self-Directed Turing Test for Large Language Models [56.64615470513102]
The Turing test examines whether AIs can exhibit human-like behaviour in natural language conversations.
Traditional Turing tests adopt a rigid dialogue format where each participant sends only one message each time.
This paper proposes the Self-Directed Turing Test, which extends the original test with a burst dialogue format.
(2024-08-19T09:57:28Z)
- Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition [48.527630771422935]
We propose a synthetic data generation pipeline for multi-speaker conversational ASR.
We conduct evaluation by fine-tuning the Whisper ASR model for telephone and distant conversational speech settings.
(2024-08-17T14:47:05Z)
- Faithful Persona-based Conversational Dataset Generation with Large Language Models [10.506653172302222]
High-quality conversational datasets are essential for developing AI models that can communicate with users.
We propose a Generator-Critic architecture framework to expand the initial dataset, while improving the quality of its conversations.
We release Synthetic-Persona-Chat, consisting of 20k conversations seeded from Persona-Chat.
(2023-12-15T18:23:50Z)
- AutoConv: Automatically Generating Information-seeking Conversations with Large Language Models [74.10293412011455]
We propose AutoConv for synthetic conversation generation.
Specifically, we formulate the conversation generation problem as a language modeling task.
We finetune an LLM with a few human conversations to capture the characteristics of the information-seeking process.
(2023-08-12T08:52:40Z)
- Does Collaborative Human-LM Dialogue Generation Help Information Extraction from Human Dialogues? [55.28340832822234]
Problem-solving human dialogues in real applications can be much more complex than existing Wizard-of-Oz collections.
We introduce a human-in-the-loop dialogue generation framework capable of synthesizing realistic dialogues.
(2023-07-13T20:02:50Z)
- NatCS: Eliciting Natural Customer Support Dialogues [5.398732055835996]
Existing task-oriented dialogue datasets are not representative of real customer support conversations.
We introduce NatCS, a multi-domain collection of spoken customer service conversations.
(2023-05-04T17:25:24Z)
- Knowledge-Grounded Conversational Data Augmentation with Generative Conversational Networks [76.11480953550013]
We take a step towards automatically generating conversational data using Generative Conversational Networks.
We evaluate our approach on conversations with and without knowledge on the Topical Chat dataset.
(2022-07-22T22:37:14Z)
- Summary Grounded Conversation Generation [10.470157142861174]
We show how pre-trained language models can be used to generate entire conversations, given only a summary of a conversation as the input.
We also show that the accuracy of conversation summarization can be improved by augmenting a conversation summarization dataset with generated conversations.
(2021-06-07T04:46:31Z)
- A Taxonomy of Empathetic Response Intents in Human Social Conversations [1.52292571922932]
Open-domain conversational agents are becoming increasingly popular in the natural language processing community.
One of the challenges is enabling them to converse in an empathetic manner.
Current neural response generation methods rely solely on end-to-end learning from large scale conversation data to generate dialogues.
Recent work has shown the promise of combining dialogue act/intent modelling and neural response generation.
(2020-12-07T21:56:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.