Self-augmented Data Selection for Few-shot Dialogue Generation
- URL: http://arxiv.org/abs/2205.09661v1
- Date: Thu, 19 May 2022 16:25:50 GMT
- Title: Self-augmented Data Selection for Few-shot Dialogue Generation
- Authors: Wanyu Du, Hanjie Chen, Yangfeng Ji
- Abstract summary: We adopt the self-training framework to deal with the few-shot MR-to-Text generation problem.
We propose a novel data selection strategy to select the data that our generation model is most uncertain about.
- Score: 18.794770678708637
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The natural language generation (NLG) module in task-oriented dialogue
systems translates structured meaning representations (MRs) into text responses.
As the human-machine interaction interface, it has a great impact on users'
experience. In practice, however, developers often have only a few well-annotated
examples and face a high data collection cost when building the NLG module. In
this work, we adopt the self-training framework to address the few-shot MR-to-Text
generation problem, leveraging a pre-trained language model to self-augment a
large amount of pseudo-labeled data. To prevent gradual drift from the target data
distribution toward the noisy augmented data distribution, we propose a novel data
selection strategy that selects the data the generation model is most uncertain
about. Compared with existing data selection methods, ours is (1)
parameter-efficient: it does not require training any additional neural models,
and (2) computation-efficient: it only needs several stochastic forward passes of
the model to estimate the uncertainty. Empirical experiments on two benchmark
datasets, FewShotWOZ and FewShotSGD, show that the proposed framework consistently
outperforms other baselines in terms of BLEU and ERR.
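The abstract describes estimating the generator's uncertainty on self-augmented pseudo-labeled pairs with several stochastic forward passes. Below is a minimal sketch of that idea, assuming an MC-dropout-style setup on a GPT-2 backbone from Hugging Face Transformers; the backbone choice and the scoring function (variance of the response log-likelihood across passes) are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch (not the authors' code): rank self-augmented pseudo-labeled
# (MR, response) pairs by the generator's uncertainty, estimated with a few
# stochastic forward passes while dropout stays active.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2")       # illustrative backbone
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

def response_log_likelihood(mr_text: str, response: str) -> float:
    """Log-likelihood of the response tokens given the linearized MR."""
    prompt_ids = tokenizer(mr_text, return_tensors="pt").input_ids
    full_ids = tokenizer(mr_text + " " + response, return_tensors="pt").input_ids
    labels = full_ids.clone()
    labels[:, : prompt_ids.size(1)] = -100            # score only the response
    out = model(full_ids, labels=labels)
    n_resp = (labels != -100).sum().item()
    return -out.loss.item() * n_resp                   # sum of token log-probs

def uncertainty(mr_text: str, response: str, n_passes: int = 5) -> float:
    """Variance of the response log-likelihood across stochastic passes
    (an assumed uncertainty measure; the paper only states that several
    stochastic forward passes are used)."""
    model.train()                                      # keep dropout active
    with torch.no_grad():
        scores = torch.tensor(
            [response_log_likelihood(mr_text, response) for _ in range(n_passes)]
        )
    model.eval()
    return scores.var().item()

def select_most_uncertain(pseudo_pairs, k: int):
    """Keep the k pseudo-labeled pairs the generator is most uncertain about."""
    scored = [(uncertainty(mr, resp), mr, resp) for mr, resp in pseudo_pairs]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [(mr, resp) for _, mr, resp in scored[:k]]
```

Keeping dropout active at inference time is what makes the forward passes stochastic; scoring with a handful of such passes avoids training a separate uncertainty model, which is consistent with the parameter- and computation-efficiency claims in the abstract.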
Related papers
- Self-Comparison for Dataset-Level Membership Inference in Large (Vision-)Language Models [73.94175015918059]
We propose a dataset-level membership inference method based on Self-Comparison.
Our method does not require access to ground-truth member data or non-member data drawn from the same distribution.
arXiv Detail & Related papers (2024-10-16T23:05:59Z)
- Efficient data selection employing Semantic Similarity-based Graph Structures for model training [1.5845679507219355]
This paper introduces Semantics for data SAliency in Model performance Estimation (SeSaME), an efficient data sampling mechanism based solely on textual information that does not pass the data through a compute-heavy model.
The application of this approach is demonstrated in the use case of low-resource automated speech recognition (ASR) models.
arXiv Detail & Related papers (2024-02-22T09:43:53Z)
- Contrastive Transformer Learning with Proximity Data Generation for Text-Based Person Search [60.626459715780605]
Given a descriptive text query, text-based person search aims to retrieve the best-matched target person from an image gallery.
Such a cross-modal retrieval task is quite challenging due to significant modality gap, fine-grained differences and insufficiency of annotated data.
In this paper, we propose a simple yet effective dual Transformer model for text-based person search.
arXiv Detail & Related papers (2023-11-15T16:26:49Z)
- ReGen: Zero-Shot Text Classification via Training Data Generation with Progressive Dense Retrieval [22.882301169283323]
We propose a retrieval-enhanced framework to create training data from a general-domain unlabeled corpus.
Experiments on nine datasets demonstrate that REGEN achieves a 4.3% gain over the strongest baselines and saves around 70% of the time compared to baselines that use large NLG models.
arXiv Detail & Related papers (2023-05-18T04:30:09Z)
- Curriculum-Based Self-Training Makes Better Few-Shot Learners for Data-to-Text Generation [56.98033565736974]
We propose Curriculum-Based Self-Training (CBST) to leverage unlabeled data in a rearranged order determined by the difficulty of text generation.
Our method can outperform fine-tuning and task-adaptive pre-training methods, and achieve state-of-the-art performance in the few-shot setting of data-to-text generation.
arXiv Detail & Related papers (2022-06-06T16:11:58Z)
- A Model-Agnostic Data Manipulation Method for Persona-based Dialogue Generation [107.82729587882397]
It is expensive to scale up current persona-based dialogue datasets.
Each data sample in this task is more complex to learn with than conventional dialogue data.
We propose a data manipulation method that is model-agnostic and can be packed with any persona-based dialogue generation model.
arXiv Detail & Related papers (2022-04-21T03:49:54Z)
- Efficient Nearest Neighbor Language Models [114.40866461741795]
Non-parametric neural language models (NLMs) learn predictive distributions of text utilizing an external datastore.
We show how to achieve up to a 6x speed-up in inference speed while retaining comparable performance.
arXiv Detail & Related papers (2021-09-09T12:32:28Z)
- AUGNLG: Few-shot Natural Language Generation using Self-trained Data Augmentation [26.016540126949103]
This paper proposes AUGNLG, a novel data augmentation approach that combines a self-trained neural retrieval model with a few-shot learned NLU model.
The proposed system mostly outperforms the state-of-the-art methods on the FewShotWOZ data in both BLEU and Slot Error Rate.
arXiv Detail & Related papers (2021-06-10T08:45:28Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- Multi-Referenced Training for Dialogue Response Generation [36.24321477524634]
We show that the gap between the real-world probability distribution and the single-referenced data's probability distribution prevents the model from efficiently learning the one-to-many relations.
We generate diverse pseudo references from a powerful pretrained model to build multi-referenced data that provides a better approximation of the real-world distribution.
arXiv Detail & Related papers (2020-09-15T14:17:53Z)