Self-training Improves Pre-training for Few-shot Learning in
Task-oriented Dialog Systems
- URL: http://arxiv.org/abs/2108.12589v1
- Date: Sat, 28 Aug 2021 07:22:06 GMT
- Title: Self-training Improves Pre-training for Few-shot Learning in
Task-oriented Dialog Systems
- Authors: Fei Mi, Wanhao Zhou, Fengyu Cai, Lingjing Kong, Minlie Huang, and Boi
Faltings
- Abstract summary: Large-scale pre-trained language models have shown promising results for few-shot learning in ToD.
We propose a self-training approach that iteratively labels the most confident unlabeled data to train a stronger Student model.
We conduct experiments and present analyses on four downstream tasks in ToD, including intent classification, dialog state tracking, dialog act prediction, and response selection.
- Score: 47.937191088981436
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As labeling data for the different modules of task-oriented dialog (ToD)
systems is expensive, a major challenge is to train these modules with the
least amount of labeled data. Recently, large-scale pre-trained language
models have shown promising results for few-shot learning in ToD. In this
paper, we devise a self-training approach to utilize the abundant unlabeled
dialog data to further improve state-of-the-art pre-trained models in few-shot
learning scenarios for ToD systems. Specifically, the approach iteratively
labels the most confident unlabeled data to train a stronger Student model.
Moreover, a new text augmentation technique (GradAug)
is proposed to better train the Student by replacing non-crucial tokens using a
masked language model. We conduct extensive experiments and present analyses on
four downstream tasks in ToD, including intent classification, dialog state
tracking, dialog act prediction, and response selection. Empirical results
demonstrate that the proposed self-training approach consistently improves
state-of-the-art pre-trained models (BERT, ToD-BERT) when only a small amount
of labeled data is available.
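The abstract describes two ingredients that lend themselves to a short illustration: (i) a self-training loop in which a Teacher pseudo-labels the most confident unlabeled dialogs and a stronger Student is retrained on the enlarged set, and (ii) GradAug, which replaces non-crucial tokens using a masked language model. The sketch below is not the authors' implementation: the paper fine-tunes BERT/ToD-BERT Students on ToD tasks, whereas this sketch substitutes a TF-IDF + logistic-regression intent classifier, and the toy utterances and confidence threshold are assumptions made purely for illustration.

```python
# Minimal sketch of confidence-based self-training (Teacher/Student loop).
# Assumption: a TF-IDF + logistic-regression classifier stands in for the
# BERT/ToD-BERT Student used in the paper; the data and threshold are toys.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

labeled_texts = ["book a table for two", "play some jazz music"]
labels = ["restaurant", "music"]
unlabeled_texts = ["reserve a table tonight", "put on my playlist",
                   "what is the weather like"]

confidence_threshold = 0.6  # hypothetical value; tuned per task in practice

for iteration in range(3):
    # Train the current model (acting as the Teacher) on all labeled data so far.
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(labeled_texts, labels)

    if not unlabeled_texts:
        break

    # Pseudo-label the unlabeled dialogs and keep only the most confident ones.
    probs = model.predict_proba(unlabeled_texts)
    preds = model.classes_[probs.argmax(axis=1)]
    confident = probs.max(axis=1) >= confidence_threshold
    if not confident.any():
        break

    # Move confident pseudo-labeled examples into the training set so the next
    # iteration trains a stronger Student.
    labeled_texts += [t for t, c in zip(unlabeled_texts, confident) if c]
    labels += [p for p, c in zip(preds, confident) if c]
    unlabeled_texts = [t for t, c in zip(unlabeled_texts, confident) if not c]

print(model.predict(["i would like to reserve a table"]))
```

For the GradAug step, the abstract only states that non-crucial tokens are replaced using a masked language model; how those tokens are identified is not specified there. The fragment below therefore masks a hand-picked token purely to show the replacement step, and the bert-base-uncased checkpoint is an illustrative choice, not necessarily the model used by the authors.

```python
# Hedged sketch of masked-LM token replacement in the spirit of GradAug.
# Assumptions: the masked position is chosen by hand, and bert-base-uncased
# is only an illustrative checkpoint.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# Mask a token that should not change the utterance's intent (restaurant booking).
utterance = "i would like to book a table for two [MASK] tonight"
for candidate in fill_mask(utterance)[:3]:
    print(candidate["token_str"], round(candidate["score"], 3))
```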
Related papers
- Pre-training Multi-party Dialogue Models with Latent Discourse Inference [85.9683181507206]
We pre-train a model that understands the discourse structure of multi-party dialogues, namely, to whom each utterance is replying.
To fully utilize the unlabeled data, we propose to treat the discourse structures as latent variables, then jointly infer them and pre-train the discourse-aware model.
arXiv Detail & Related papers (2023-05-24T14:06:27Z) - Stabilized In-Context Learning with Pre-trained Language Models for Few
Shot Dialogue State Tracking [57.92608483099916]
Large pre-trained language models (PLMs) have shown impressive unaided performance across many NLP tasks.
For more complex tasks such as dialogue state tracking (DST), designing prompts that reliably convey the desired intent is nontrivial.
We introduce a saliency model to limit dialogue text length, allowing us to include more exemplars per query.
arXiv Detail & Related papers (2023-02-12T15:05:10Z) - Self-augmented Data Selection for Few-shot Dialogue Generation [18.794770678708637]
We adopt the self-training framework to deal with the few-shot MR-to-Text generation problem.
We propose a novel data selection strategy to select the data that our generation model is most uncertain about.
arXiv Detail & Related papers (2022-05-19T16:25:50Z) - Representation Learning for Conversational Data using Discourse Mutual
Information Maximization [9.017156603976915]
We argue that the structure-unaware word-by-word generation is not suitable for effective conversation modeling.
We propose a structure-aware Mutual Information based loss-function DMI for training dialog-representation models.
Our models show the most promising performance on the dialog evaluation task DailyDialog++, in both random and adversarial negative scenarios.
arXiv Detail & Related papers (2021-12-04T13:17:07Z) - RADDLE: An Evaluation Benchmark and Analysis Platform for Robust
Task-oriented Dialog Systems [75.87418236410296]
We introduce the RADDLE benchmark, a collection of corpora and tools for evaluating the performance of models across a diverse set of domains.
RADDLE is designed to favor and encourage models with a strong generalization ability.
We evaluate recent state-of-the-art systems based on pre-training and fine-tuning, and find that grounded pre-training on heterogeneous dialog corpora performs better than training a separate model per domain.
arXiv Detail & Related papers (2020-12-29T08:58:49Z) - Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z) - SOLOIST: Building Task Bots at Scale with Transfer Learning and Machine
Teaching [81.45928589522032]
We parameterize modular task-oriented dialog systems using a Transformer-based auto-regressive language model.
We pre-train, on heterogeneous dialog corpora, a task-grounded response generation model.
Experiments show that SOLOIST creates new state-of-the-art on well-studied task-oriented dialog benchmarks.
arXiv Detail & Related papers (2020-05-11T17:58:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.