Does Collaborative Human-LM Dialogue Generation Help Information
Extraction from Human Dialogues?
- URL: http://arxiv.org/abs/2307.07047v2
- Date: Tue, 20 Feb 2024 06:12:39 GMT
- Title: Does Collaborative Human-LM Dialogue Generation Help Information
Extraction from Human Dialogues?
- Authors: Bo-Ru Lu, Nikita Haduong, Chia-Hsuan Lee, Zeqiu Wu, Hao Cheng, Paul
Koester, Jean Utke, Tao Yu, Noah A. Smith, Mari Ostendorf
- Abstract summary: Problem-solving human dialogues in real applications can be much more complex than existing Wizard-of-Oz collections.
We introduce a human-in-the-loop dialogue generation framework capable of synthesizing realistic dialogues.
- Score: 55.28340832822234
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The capabilities of pretrained language models have opened opportunities to
explore new application areas, but applications involving human-human
interaction are limited by the fact that most data is protected from public
release for privacy reasons. Problem-solving human dialogues in real
applications can be much more complex than existing Wizard-of-Oz collections,
preventing successful domain transfer. To support information extraction (IE)
for a private call center dataset, we introduce a human-in-the-loop dialogue
generation framework capable of synthesizing realistic dialogues. In IE
experiments with auto insurance call center dialogues, we observe a 25% relative
improvement in $F_1$ after augmenting a small set of real human conversations
with synthetic data. We release code and our synthetic dataset to illustrate
the complexity of real-world call center conversations and encourage
development of complex dialogue datasets that are more representative of
natural data.
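For clarity, the gain reported above is a relative (not absolute) change in $F_1$. A minimal formulation of that quantity, using hypothetical baseline and augmented scores $F_1^{\text{base}}$ and $F_1^{\text{aug}}$ (these symbols are illustrative, not taken from the paper):
$$ \text{relative improvement} = \frac{F_1^{\text{aug}} - F_1^{\text{base}}}{F_1^{\text{base}}} \times 100\% $$
For example, with made-up numbers, an absolute move from $F_1 = 0.60$ to $F_1 = 0.75$ would correspond to a 25% relative improvement.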
Related papers
- Interactive Dialogue Agents via Reinforcement Learning on Hindsight Regenerations [58.65755268815283]
Many real dialogues are interactive, meaning an agent's utterances will influence their conversational partner, elicit information, or change their opinion.
We use this fact to rewrite and augment existing suboptimal data, and train via offline reinforcement learning (RL) an agent that outperforms both prompting and learning from unaltered human demonstrations.
Our results in a user study with real humans show that our approach greatly outperforms existing state-of-the-art dialogue agents.
arXiv Detail & Related papers (2024-11-07T21:37:51Z) - DiaSynth: Synthetic Dialogue Generation Framework for Low Resource Dialogue Applications [18.378069426713]
Existing research is constrained by general or niche datasets that lack sufficient scale for training dialogue systems.
We introduce DiaSynth, a synthetic dialogue generation framework capable of generating high-quality, contextually rich dialogues.
We perform our experiments by generating synthetic data using different LLMs and few-shot examples from DialogSum and SAMSum.
arXiv Detail & Related papers (2024-09-25T07:03:31Z) - LUCID: LLM-Generated Utterances for Complex and Interesting Dialogues [38.6183579217801]
Virtual assistants are poised to take a leap forward in terms of their dialogue capabilities.
Yet a major bottleneck to achieving genuinely transformative task-oriented dialogue capabilities remains the scarcity of high-quality data.
We use LUCID to generate a seed dataset of 4,277 conversations across 100 intents to demonstrate its capabilities.
arXiv Detail & Related papers (2024-03-01T11:33:53Z) - DialogStudio: Towards Richest and Most Diverse Unified Dataset
Collection for Conversational AI [92.29874802394167]
DialogStudio is the largest and most diverse collection of dialogue datasets.
Our collection encompasses data from open-domain dialogues, task-oriented dialogues, natural language understanding, conversational recommendation, dialogue summarization, and knowledge-grounded dialogues.
arXiv Detail & Related papers (2023-07-19T17:57:53Z) - AUGUST: an Automatic Generation Understudy for Synthesizing
Conversational Recommendation Datasets [56.052803235932686]
We propose a novel automatic dataset synthesis approach that can generate both large-scale and high-quality recommendation dialogues.
In doing so, we exploit: (i) rich personalized user profiles from traditional recommendation datasets, (ii) rich external knowledge from knowledge graphs, and (iii) the conversation ability contained in human-to-human conversational recommendation datasets.
arXiv Detail & Related papers (2023-06-16T05:27:14Z) - NatCS: Eliciting Natural Customer Support Dialogues [5.398732055835996]
Existing task-oriented dialogue datasets are not representative of real customer support conversations.
We introduce NatCS, a multi-domain collection of spoken customer service conversations.
arXiv Detail & Related papers (2023-05-04T17:25:24Z) - PLACES: Prompting Language Models for Social Conversation Synthesis [103.94325597273316]
We use a small set of expert-written conversations as in-context examples to synthesize a social conversation dataset using prompting.
We perform several thorough evaluations comparing our synthetic conversations to human-collected conversations.
arXiv Detail & Related papers (2023-02-07T05:48:16Z) - Controllable Dialogue Simulation with In-Context Learning [39.04491297557292]
Dialogic is a dialogue simulation method based on in-context learning with large language models.
Our method can rapidly expand a small set of dialogue data with minimal or zero human involvement.
Our simulated dialogues have near-human fluency and annotation accuracy.
arXiv Detail & Related papers (2022-10-09T06:32:58Z) - HybriDialogue: An Information-Seeking Dialogue Dataset Grounded on
Tabular and Textual Data [87.67278915655712]
We present a new dialogue dataset, HybriDialogue, which consists of crowdsourced natural conversations grounded on both Wikipedia text and tables.
The conversations are created through the decomposition of complex multihop questions into simple, realistic multiturn dialogue interactions.
arXiv Detail & Related papers (2022-04-28T00:52:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.