Learning towards Selective Data Augmentation for Dialogue Generation
- URL: http://arxiv.org/abs/2303.09719v1
- Date: Fri, 17 Mar 2023 01:26:39 GMT
- Title: Learning towards Selective Data Augmentation for Dialogue Generation
- Authors: Xiuying Chen, Mingzhe Li, Jiayi Zhang, Xiaoqiang Xia, Chen Wei, Jianwei Cui, Xin Gao, Xiangliang Zhang, Rui Yan
- Abstract summary: We argue that not all cases are beneficial for the augmentation task; the cases suitable for augmentation should have two attributes: they should be low-quality and representative.
We propose a Selective Data Augmentation framework (SDA) for the response generation task.
- Score: 52.540330534137794
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As it is cumbersome and expensive to acquire a huge amount of data for training neural dialog models, data augmentation is proposed to effectively utilize existing training samples. However, current data augmentation techniques for the dialog generation task mostly augment all cases in the training dataset without considering the intrinsic attributes of different cases. We argue that not all cases are beneficial for the augmentation task, and that the cases suitable for augmentation should have the following two attributes: (1) low-quality (the dialog model cannot generate a high-quality response for the case), and (2) representative (the case should represent the property of the whole dataset). Herein, we explore this idea by proposing a Selective Data Augmentation framework (SDA) for the response generation task. SDA employs a dual adversarial network to select the lowest-quality and most representative data points for augmentation in one stage. Extensive experiments conducted on two publicly available datasets, i.e., DailyDialog and OpenSubtitles, show that our framework can improve response generation performance with respect to various metrics.
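To make the two selection criteria concrete, here is a minimal sketch of "select the lowest-quality, most representative cases" under stated assumptions. It does not reproduce SDA's dual adversarial network; it swaps in a simple heuristic. The per-case quality score (for example, some metric of the model's response against the reference), the per-case embedding vectors, the mean-cosine-similarity definition of representativeness, and the weight `alpha` are all hypothetical choices made for illustration.

```python
import numpy as np


def select_for_augmentation(quality, embeddings, k, alpha=0.5):
    """Return indices of k cases that score as low-quality and representative.

    Heuristic sketch only (not the SDA dual adversarial network).
    quality:    shape (n,); higher means the model already handles the case
                well (hypothetical per-case generation score).
    embeddings: shape (n, d); one non-zero vector per case (hypothetical encoder).
    alpha:      hypothetical weight between low quality and representativeness.
    Assumes n >= 2.
    """
    quality = np.asarray(quality, dtype=float)
    emb = np.asarray(embeddings, dtype=float)
    n = len(quality)

    # Representativeness: mean cosine similarity of each case to all others.
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = unit @ unit.T
    rep = (sim.sum(axis=1) - np.diag(sim)) / (n - 1)

    # Min-max normalise both criteria to [0, 1] so they are comparable.
    def norm(x):
        span = x.max() - x.min()
        return np.zeros_like(x) if span == 0 else (x - x.min()) / span

    # Low quality (1 - normalised quality) and high representativeness both raise the score.
    score = alpha * (1.0 - norm(quality)) + (1.0 - alpha) * norm(rep)

    # Indices of the k highest combined scores.
    return np.argsort(-score, kind="stable")[:k]


# Three similar cases and one outlier; case 3 has the lowest quality but is unrepresentative.
emb = [[1, 0], [1, 0], [1, 0], [0, 1]]
q = [0.1, 0.9, 0.2, 0.0]
print(select_for_augmentation(q, emb, k=2).tolist())
```

With equal weighting, cases 0 and 2 are chosen: both are poorly handled and typical of the data, while the outlier (case 3) is skipped despite its low quality. A real system would replace the hand-set `alpha` with something learned; the paper does the selection jointly in one stage with its adversarial networks.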
Related papers
- AUGUST: an Automatic Generation Understudy for Synthesizing Conversational Recommendation Datasets [56.052803235932686] (2023-06-16T05:27:14Z)
We propose a novel automatic dataset synthesis approach that can generate both large-scale and high-quality recommendation dialogues.
In doing so, we exploit: (i) rich personalized user profiles from traditional recommendation datasets, (ii) rich external knowledge from knowledge graphs, and (iii) the conversation ability contained in human-to-human conversational recommendation datasets.
- Counterfactual Data Augmentation via Perspective Transition for Open-Domain Dialogues [34.78482218571574] (2022-10-30T13:26:49Z)
We propose a data augmentation method to automatically augment high-quality responses with different semantics by counterfactual inference.
Experimental results show that our data augmentation method can augment high-quality responses with different semantics for a given dialogue history, and can outperform competitive baselines on multiple downstream tasks.
- Weakly Supervised Data Augmentation Through Prompting for Dialogue Understanding [103.94325597273316] (2022-10-25T17:01:30Z)
We present a novel approach that iterates on augmentation quality by applying weakly-supervised filters.
We evaluate our methods on the emotion and act classification tasks in DailyDialog and the intent classification task in Facebook Multilingual Task-Oriented Dialogue.
For DailyDialog specifically, using 10% of the ground truth data we outperform the current state-of-the-art model which uses 100% of the data.
- Self-augmented Data Selection for Few-shot Dialogue Generation [18.794770678708637] (2022-05-19T16:25:50Z)
We adopt the self-training framework to deal with the few-shot MR-to-Text generation problem.
We propose a novel data selection strategy to select the data that our generation model is most uncertain about.
- Representative Subset Selection for Efficient Fine-Tuning in Self-Supervised Speech Recognition [6.450618373898492] (2022-03-18T10:12:24Z)
We consider the task of identifying an optimal subset of data for efficient fine-tuning in self-supervised speech models for ASR.
We present the COWERAGE algorithm for representative subset selection in self-supervised ASR.
- Reasoning in Dialog: Improving Response Generation by Context Reading Comprehension [49.92173751203827] (2020-12-14T10:58:01Z)
In multi-turn dialog, utterances do not always take the full form of sentences.
We propose to improve the response generation performance by examining the model's ability to answer a reading comprehension question.
- Hybrid Generative-Retrieval Transformers for Dialogue Domain Adaptation [77.62366712130196] (2020-03-03T18:07:42Z)
We present the winning entry at the fast domain adaptation task of DSTC8, a hybrid generative-retrieval model based on GPT-2 fine-tuned to the multi-domain MetaLWOz dataset.
Our model uses retrieval logic as a fallback, achieving SoTA on MetaLWOz in human evaluation (>4% improvement over the 2nd place system) and attaining competitive generalization performance in adaptation to the unseen MultiWOZ dataset.
- Improving Multi-Turn Response Selection Models with Complementary Last-Utterance Selection by Instance Weighting [84.9716460244444] (2020-02-18T06:29:01Z)
We consider utilizing the underlying correlation in the data resource itself to derive different kinds of supervision signals.
We conduct extensive experiments on two public datasets and obtain significant improvements on both.
This list is automatically generated from the titles and abstracts of the papers in this site.