Weakly Supervised Data Augmentation Through Prompting for Dialogue
Understanding
- URL: http://arxiv.org/abs/2210.14169v2
- Date: Wed, 26 Oct 2022 02:14:23 GMT
- Authors: Maximillian Chen, Alexandros Papangelis, Chenyang Tao, Andy Rosenbaum,
Seokhwan Kim, Yang Liu, Zhou Yu, Dilek Hakkani-Tur
- Abstract summary: We present a novel approach that iterates on augmentation quality by applying weakly-supervised filters.
We evaluate our methods on the emotion and act classification tasks in DailyDialog and the intent classification task in Facebook Multilingual Task-Oriented Dialogue.
For DailyDialog specifically, using 10% of the ground truth data we outperform the current state-of-the-art model which uses 100% of the data.
- Score: 103.94325597273316
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dialogue understanding tasks often necessitate abundant annotated data to
achieve good performance and that presents challenges in low-resource settings.
To alleviate this barrier, we explore few-shot data augmentation for dialogue
understanding by prompting large pre-trained language models and present a
novel approach that iterates on augmentation quality by applying
weakly-supervised filters. We evaluate our methods on the emotion and act
classification tasks in DailyDialog and the intent classification task in
Facebook Multilingual Task-Oriented Dialogue. Models fine-tuned on our
augmented data mixed with few-shot ground truth data are able to approach or
surpass existing state-of-the-art performance on both datasets. For DailyDialog
specifically, using 10% of the ground truth data we outperform the current
state-of-the-art model which uses 100% of the data.
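The abstract names the overall loop (prompt a language model for candidate examples, then filter them with weak supervision) but does not give the prompt format or the filter itself. The following is a minimal sketch of that general idea under stated assumptions, not the authors' implementation: the candidate list stands in for language-model output, the token-overlap labeler is a hypothetical stand-in for a weakly supervised filter, and the names `train_weak_labeler`, `weak_label`, `filter_augmented`, the `threshold` value, and the retraining rounds are all illustrative choices.

```python
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    return [t.strip(".,!?").lower() for t in text.split() if t.strip(".,!?")]


def train_weak_labeler(examples):
    """Build per-label token counts from (text, label) pairs."""
    counts = {}
    for text, label in examples:
        counts.setdefault(label, Counter()).update(tokenize(text))
    return counts


def weak_label(counts, text):
    """Return (best_label, confidence) based on token overlap per label."""
    tokens = tokenize(text)
    scores = {label: sum(c[t] for t in tokens) for label, c in counts.items()}
    total = sum(scores.values())
    if total == 0:
        return None, 0.0  # no evidence at all: treat as unlabeled
    best = max(scores, key=scores.get)
    return best, scores[best] / total


def filter_augmented(seed, candidates, threshold=0.6, rounds=2):
    """Keep candidates whose intended label the weak labeler confirms.

    Assumption: each round retrains the labeler on the seed plus everything
    accepted so far, then re-scores the candidates that were rejected.
    """
    accepted, pool = [], list(candidates)
    for _ in range(rounds):
        counts = train_weak_labeler(seed + accepted)
        kept, rest = [], []
        for text, label in pool:
            predicted, confidence = weak_label(counts, text)
            if predicted == label and confidence >= threshold:
                kept.append((text, label))
            else:
                rest.append((text, label))
        if not kept:
            break  # nothing new passed the filter; stop iterating
        accepted.extend(kept)
        pool = rest
    return accepted


# Few-shot ground-truth seed (toy data, invented for illustration).
seed = [
    ("I am so happy today", "happiness"),
    ("This is wonderful news", "happiness"),
    ("I am really angry at you", "anger"),
    ("This makes me furious", "anger"),
]

# Stand-in for prompted language-model output: (text, intended label).
candidates = [
    ("What wonderful news, I am happy", "happiness"),
    ("You make me so angry", "anger"),
    ("I am furious about this", "happiness"),  # label does not match content
    ("The train leaves soon", "anger"),        # off-topic generation
]

accepted = filter_augmented(seed, candidates)
for text, label in accepted:
    print(label, "|", text)
```

In this toy run the mislabeled and off-topic candidates are dropped, and the surviving examples would be mixed with the few-shot ground truth for fine-tuning. A real filter would more plausibly be a classifier trained on the seed data; the overlap score is only a placeholder that keeps the sketch dependency-free.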
Related papers
- DFlow: Diverse Dialogue Flow Simulation with Large Language Models [16.209331014315463]
This paper proposes a novel data augmentation method designed to enhance the diversity of synthetic dialogues.
We generate a task-oriented dialogue dataset comprising 3,886 dialogue flows across 15 different domains.
arXiv Detail & Related papers (2024-10-18T20:35:28Z)
- Efficient Data Generation for Source-grounded Information-seeking Dialogs: A Use Case for Meeting Transcripts [10.829227084902428]
We investigate the feasibility and effectiveness of Large Language Models (LLMs)-based data generation in source-grounded information-seeking dialogs.
We create MISeD -- Meeting Information Seeking Dialogs dataset -- a dataset of information-seeking dialogs focused on meeting transcripts.
Finetuning on MISeD gives comparable response generation quality to finetuning on fully manual data, while improving attribution quality and reducing time and effort.
arXiv Detail & Related papers (2024-05-02T09:35:06Z)
- CGoDial: A Large-Scale Benchmark for Chinese Goal-oriented Dialog Evaluation [75.60156479374416]
CGoDial is a new challenging and comprehensive Chinese benchmark for Goal-oriented Dialog evaluation.
It contains 96,763 dialog sessions and 574,949 dialog turns in total, covering three datasets with different knowledge sources.
To bridge the gap between academic benchmarks and spoken dialog scenarios, we either collect data from real conversations or add spoken features to existing datasets via crowd-sourcing.
arXiv Detail & Related papers (2022-11-21T16:21:41Z)
- A Model-Agnostic Data Manipulation Method for Persona-based Dialogue Generation [107.82729587882397]
It is expensive to scale up current persona-based dialogue datasets.
Each data sample in this task is more complex to learn with than conventional dialogue data.
We propose a model-agnostic data manipulation method that can be combined with any persona-based dialogue generation model.
arXiv Detail & Related papers (2022-04-21T03:49:54Z)
- Cross-Lingual Dialogue Dataset Creation via Outline-Based Generation [70.81596088969378]
Cross-lingual Outline-based Dialogue dataset (termed COD) enables natural language understanding.
COD enables dialogue state tracking, and end-to-end dialogue modelling and evaluation in 4 diverse languages.
arXiv Detail & Related papers (2022-01-31T18:11:21Z)
- Contextual Semantic Parsing for Multilingual Task-Oriented Dialogues [7.8378818005171125]
Given a large-scale dialogue data set in one language, we can automatically produce an effective semantic parser for other languages using machine translation.
We propose automatic translation of dialogue datasets with alignment to ensure faithful translation of slot values.
We show that the succinct representation reduces the compounding effect of translation errors.
arXiv Detail & Related papers (2021-11-04T01:08:14Z)
- Learning to Learn End-to-End Goal-Oriented Dialog From Related Dialog Tasks [33.77022912718379]
We show that we can use only a small amount of data, supplemented with data from a related dialog task.
We describe a meta-learning based method that selectively learns from the related dialog task data.
arXiv Detail & Related papers (2021-10-10T15:27:45Z)
- Dialogue Distillation: Open-Domain Dialogue Augmentation Using Unpaired Data [61.71319905364992]
We propose a novel data augmentation method for training open-domain dialogue models by utilizing unpaired data.
A data-level distillation process is first proposed to construct augmented dialogues where both post and response are retrieved from the unpaired data.
A ranking module is employed to filter out low-quality dialogues.
A model-level distillation process is employed to distill a teacher model trained on high-quality paired data to augmented dialogue pairs.
arXiv Detail & Related papers (2020-09-20T13:06:38Z)
- Paraphrase Augmented Task-Oriented Dialog Generation [68.1790912977053]
We propose a paraphrase augmented response generation (PARG) framework that jointly trains a paraphrase model and a response generation model.
We also design a method to automatically construct a paraphrase training data set based on dialog state and dialog act labels.
arXiv Detail & Related papers (2020-04-16T05:12:36Z)