Variational Latent-State GPT for Semi-supervised Task-Oriented Dialog Systems
- URL: http://arxiv.org/abs/2109.04314v1
- Date: Thu, 9 Sep 2021 14:42:29 GMT
- Title: Variational Latent-State GPT for Semi-supervised Task-Oriented Dialog Systems
- Authors: Hong Liu, Yucheng Cai, Zhenru Lin, Zhijian Ou, Yi Huang, Junlan Feng
- Abstract summary: The Variational Latent-State GPT model (VLS-GPT) is the first to combine the strengths of two approaches: fine-tuning large pre-trained language models and variational training.
We develop the strategy of sampling-then-forward-computation, which successfully overcomes the memory explosion issue of using GPT in variational learning.
VLS-GPT is shown to significantly outperform both supervised-only and semi-supervised baselines.
- Score: 24.667353107453824
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, two approaches, fine-tuning large pre-trained language models and variational training, have attracted significant interest, separately, for semi-supervised end-to-end task-oriented dialog (TOD) systems. In this paper, we propose the Variational Latent-State GPT model (VLS-GPT), which is the first to combine the strengths of the two approaches. Among the many modeling options, we propose the generative model and the inference model for variational learning of the end-to-end TOD system, both as auto-regressive language models based on GPT-2, which can be further trained over a mix of labeled and unlabeled dialog data in a semi-supervised manner. We develop the strategy of sampling-then-forward-computation, which successfully overcomes the memory explosion issue of using GPT in variational learning and speeds up training. Semi-supervised TOD experiments are conducted on two benchmark multi-domain datasets of different languages: MultiWOZ2.1 and CrossWOZ. VLS-GPT is shown to significantly outperform both supervised-only and semi-supervised baselines.
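The sampling-then-forward-computation strategy described in the abstract can be illustrated with a minimal, hypothetical PyTorch sketch (not the authors' released code): latent-state tokens are first sampled from the inference model without building an autograd graph, and the sampled tokens are then scored by both GPT-2 models in ordinary differentiable forward passes. The model handles, tensor names, and the exact gradient estimator below are assumptions made for illustration only.

```python
# Minimal sketch of the "sampling-then-forward-computation" idea from the
# abstract, assuming Hugging Face GPT-2-style models; this is NOT the authors'
# code, and variable names and the ELBO gradient estimator are illustrative.
import torch
import torch.nn.functional as F


def unsupervised_step(generative_model, inference_model, context_ids, resp_ids,
                      max_latent_len=64, eos_id=50256):
    """One variational step on an unlabeled dialog turn.

    context_ids: (1, Tc) dialog-context token ids
    resp_ids:    (1, Tr) observed system-response token ids
    """
    # 1) Sampling WITHOUT an autograd graph: decode latent-state tokens
    #    (e.g. belief state / system act) from the inference model. Keeping
    #    this token-by-token generation loop out of autograd is what avoids
    #    the memory explosion mentioned in the abstract.
    with torch.no_grad():
        latent_ids = inference_model.generate(
            context_ids,
            do_sample=True,
            max_new_tokens=max_latent_len,
            pad_token_id=eos_id,
        )[:, context_ids.size(1):]          # keep only the sampled tokens

    def seq_logprob(model, input_ids, target_start):
        # Log-probability of input_ids[target_start:] under the model,
        # computed in a single differentiable forward pass.
        logits = model(input_ids).logits[:, :-1, :]
        targets = input_ids[:, 1:]
        logp = F.log_softmax(logits, dim=-1).gather(
            -1, targets.unsqueeze(-1)).squeeze(-1)
        return logp[:, target_start - 1:].sum()

    # 2) Forward computation WITH gradients: treat the sampled latents as
    #    ordinary target tokens for both auto-regressive LMs.
    q_input = torch.cat([context_ids, latent_ids], dim=1)
    log_q = seq_logprob(inference_model, q_input, context_ids.size(1))

    p_input = torch.cat([context_ids, latent_ids, resp_ids], dim=1)
    log_p = seq_logprob(generative_model, p_input, context_ids.size(1))

    # 3) Single-sample surrogate of the negative ELBO, -(log p - log q).
    #    How the paper propagates gradients through the discrete sampling
    #    step (e.g. score-function or straight-through terms) is not
    #    reproduced here.
    return -(log_p - log_q)
```

On labeled dialogs, the sampled latent_ids would simply be replaced by the annotated latent states and both models trained with standard teacher-forced likelihood, giving the mix of labeled and unlabeled training described in the abstract.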
Related papers
- Unified Generative and Discriminative Training for Multi-modal Large Language Models [88.84491005030316]
Generative training has enabled Vision-Language Models (VLMs) to tackle various complex tasks.
Discriminative training, exemplified by models like CLIP, excels in zero-shot image-text classification and retrieval.
This paper proposes a unified approach that integrates the strengths of both paradigms.
arXiv Detail & Related papers (2024-11-01T01:51:31Z)
- Prior-agnostic Multi-scale Contrastive Text-Audio Pre-training for Parallelized TTS Frontend Modeling [13.757256085713571]
We present a novel two-stage prediction pipeline named TAP-FM.
Specifically, we propose a Multi-scale Contrastive Text-audio Pre-training protocol (MC-TAP), which aims to acquire richer insights via multi-granularity contrastive pre-training in an unsupervised manner.
Our framework demonstrates the ability to delve deep into both global and local text-audio semantic and acoustic representations.
arXiv Detail & Related papers (2024-04-14T08:56:19Z)
- Heuristic-enhanced Candidates Selection strategy for GPTs tackle Few-Shot Aspect-Based Sentiment Analysis [1.5020330976600738]
The paper designs a Heuristic-enhanced Candidates Selection strategy and further proposes an All in One (AiO) model based on it.
The model works in two stages, simultaneously accommodating the accuracy of PLMs and the capability of generalization.
The experimental results demonstrate that the proposed model can better adapt to multiple sub-tasks, and also outperforms the methods that directly utilize GPTs.
arXiv Detail & Related papers (2024-04-09T07:02:14Z)
- Robust Training of Federated Models with Extremely Label Deficiency [84.00832527512148]
Federated semi-supervised learning (FSSL) has emerged as a powerful paradigm for collaboratively training machine learning models using distributed data with label deficiency.
We propose a novel twin-model paradigm, called Twin-sight, designed to enhance mutual guidance by providing insights from different perspectives of labeled and unlabeled data.
Our comprehensive experiments on four benchmark datasets provide substantial evidence that Twin-sight can significantly outperform state-of-the-art methods across various experimental settings.
arXiv Detail & Related papers (2024-02-22T10:19:34Z)
- Advancing Semi-Supervised Task Oriented Dialog Systems by JSA Learning of Discrete Latent Variable Models [22.249113574918034]
JSA-TOD represents the first work in developing JSA-based semi-supervised learning of discrete latent variable conditional models.
Experiments show that JSA-TOD significantly outperforms its variational learning counterpart.
arXiv Detail & Related papers (2022-07-25T14:36:10Z)
- On the Role of Bidirectionality in Language Model Pre-Training [85.14614350372004]
We study the role of bidirectionality in next token prediction, text infilling, zero-shot priming and fine-tuning.
We train models with up to 6.7B parameters, and find differences to remain consistent at scale.
arXiv Detail & Related papers (2022-05-24T02:25:05Z)
- Multi-Task Learning for Situated Multi-Domain End-to-End Dialogue Systems [21.55075825370981]
We leverage multi-task learning techniques to train a GPT-2 based model on a more challenging dataset.
Our method achieves better performance on all sub-tasks, across domains, compared to task and domain-specific models.
arXiv Detail & Related papers (2021-10-11T12:36:30Z)
- Combining Deep Generative Models and Multi-lingual Pretraining for Semi-supervised Document Classification [49.47925519332164]
We combine semi-supervised deep generative models and multi-lingual pretraining to form a pipeline for the document classification task.
Our framework is highly competitive and outperforms the state-of-the-art counterparts in low-resource settings across several languages.
arXiv Detail & Related papers (2021-01-26T11:26:14Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- Hybrid Generative-Retrieval Transformers for Dialogue Domain Adaptation [77.62366712130196]
We present the winning entry at the fast domain adaptation task of DSTC8, a hybrid generative-retrieval model based on GPT-2 fine-tuned to the multi-domain MetaLWOz dataset.
Our model uses retrieval logic as a fallback; it is SoTA on MetaLWOz in human evaluation (>4% improvement over the 2nd-place system) and attains competitive generalization performance in adaptation to the unseen MultiWOZ dataset.
arXiv Detail & Related papers (2020-03-03T18:07:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.