Advancing Semi-Supervised Task Oriented Dialog Systems by JSA Learning of Discrete Latent Variable Models
- URL: http://arxiv.org/abs/2207.12235v1
- Date: Mon, 25 Jul 2022 14:36:10 GMT
- Title: Advancing Semi-Supervised Task Oriented Dialog Systems by JSA Learning of Discrete Latent Variable Models
- Authors: Yucheng Cai, Hong Liu, Zhijian Ou, Yi Huang and Junlan Feng
- Abstract summary: JSA-TOD represents the first work in developing JSA based semi-supervised learning of discrete latent variable conditional models.
Experiments show that JSA-TOD significantly outperforms its variational learning counterpart.
- Score: 22.249113574918034
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Developing semi-supervised task-oriented dialog (TOD) systems by
leveraging unlabeled dialog data has attracted increasing interest. For
semi-supervised learning of latent-state TOD models, variational learning is
often used, but it suffers from the high variance of gradients propagated
through discrete latent variables and from indirectly optimizing the target
log-likelihood. Recently, an alternative algorithm, called joint stochastic
approximation (JSA), has emerged for learning discrete latent variable models
with impressive performance. In this paper, we propose to apply JSA to
semi-supervised learning of latent-state TOD models, which is referred to as
JSA-TOD. To our knowledge, JSA-TOD represents the first work developing
JSA-based semi-supervised learning of discrete latent variable conditional
models for long sequential generation problems such as those in TOD systems.
Extensive experiments show that JSA-TOD significantly outperforms its
variational learning counterpart. Remarkably, semi-supervised JSA-TOD using 20%
labels performs close to the fully-supervised baseline on MultiWOZ2.1.
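As a rough illustration of the learning scheme the abstract describes, the toy sketch below shows one JSA-style update on a single unlabeled example: the inference model proposes a discrete latent state, a Metropolis-independence step accepts or rejects it against the generative joint distribution, and both models are then moved by gradient ascent on the accepted sample. This is a minimal sketch on a toy categorical latent with a fixed input, not the authors' JSA-TOD implementation; all names and update rules here are illustrative.

```python
# Toy JSA-style update: h is one of K discrete latent states for a fixed
# input x; theta parameterizes the generative joint p_theta(x, h) (up to a
# constant), phi parameterizes the inference model q_phi(h | x).
import numpy as np

rng = np.random.default_rng(0)
K = 8
theta = rng.normal(size=K)
phi = rng.normal(size=K)

def log_joint(theta, h):     # log p_theta(x, h) in this toy
    return theta[h]

def log_q(phi, h):           # log q_phi(h | x) in this toy
    return phi[h] - np.log(np.exp(phi).sum())

def jsa_step(theta, phi, h_cache, lr=0.1):
    # 1) Propose a latent state from the inference model (independence proposal).
    probs = np.exp(phi) / np.exp(phi).sum()
    h_prop = rng.choice(K, p=probs)
    # 2) Metropolis-independence accept/reject using the importance ratio
    #    w(h) = p_theta(x, h) / q_phi(h | x).
    log_w_prop = log_joint(theta, h_prop) - log_q(phi, h_prop)
    log_w_cache = log_joint(theta, h_cache) - log_q(phi, h_cache)
    if np.log(rng.uniform()) < log_w_prop - log_w_cache:
        h_cache = h_prop
    # 3) Ascend log p_theta(x, h) in theta and log q_phi(h | x) in phi on the
    #    accepted sample (both gradients are closed-form in this toy).
    grad_theta = np.zeros(K)
    grad_theta[h_cache] = 1.0
    grad_phi = -np.exp(phi) / np.exp(phi).sum()
    grad_phi[h_cache] += 1.0
    return theta + lr * grad_theta, phi + lr * grad_phi, h_cache

h = 0
for _ in range(200):
    theta, phi, h = jsa_step(theta, phi, h)
```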
Related papers
- Overcoming Catastrophic Forgetting by Exemplar Selection in Task-oriented Dialogue System [34.1424535903384]
We aim to overcome the forgetting problem in intelligent task-oriented dialogue systems (ToDs).
We propose a method (HESIT) with a hyper-gradient-based exemplar strategy, which samples influential exemplars for periodic retraining.
Experimental results show that HESIT effectively alleviates catastrophic forgetting by exemplar selection, and achieves state-of-the-art performance on the largest CL benchmark of ToDs.
arXiv Detail & Related papers (2024-05-16T10:54:46Z)
- Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters [65.15700861265432]
We present a parameter-efficient continual learning framework to alleviate long-term forgetting in incremental learning with vision-language models.
Our approach involves the dynamic expansion of a pre-trained CLIP model, through the integration of Mixture-of-Experts (MoE) adapters.
To preserve the zero-shot recognition capability of vision-language models, we introduce a Distribution Discriminative Auto-Selector.
arXiv Detail & Related papers (2024-03-18T08:00:23Z)
- Task-Distributionally Robust Data-Free Meta-Learning [99.56612787882334]
Data-Free Meta-Learning (DFML) aims to efficiently learn new tasks by leveraging multiple pre-trained models without requiring their original training data.
For the first time, we reveal two major challenges hindering their practical deployment: Task-Distribution Shift (TDS) and Task-Distribution Corruption (TDC).
arXiv Detail & Related papers (2023-11-23T15:46:54Z)
- ESimCSE Unsupervised Contrastive Learning Jointly with UDA Semi-Supervised Learning for Large Label System Text Classification Mode [4.708633772366381]
The ESimCSE model efficiently learns text vector representations using unlabeled data to achieve better classification results.
UDA is trained on unlabeled data through semi-supervised learning methods to improve the models' prediction performance and stability.
Adversarial training techniques FGM and PGD are used during model training to improve the robustness and reliability of the model (a generic FGM sketch follows this entry).
arXiv Detail & Related papers (2023-04-19T03:44:23Z)
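The FGM step named in this summary is a standard embedding-level adversarial training recipe; the sketch below is a generic illustration of that recipe (perturb the embedding table along the loss gradient, accumulate the adversarial gradients, restore the weights), not the paper's implementation. The toy model and data are placeholders.

```python
# Generic FGM-style adversarial training on word embeddings (toy example).
import torch
import torch.nn as nn

torch.manual_seed(0)

class ToyClassifier(nn.Module):
    def __init__(self, vocab=100, dim=32, classes=4):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.head = nn.Linear(dim, classes)

    def forward(self, ids):
        return self.head(self.emb(ids).mean(dim=1))  # mean-pool then classify

model = ToyClassifier()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
ids = torch.randint(0, 100, (8, 12))   # toy batch of token ids
labels = torch.randint(0, 4, (8,))     # toy labels
epsilon = 1.0

# 1) Clean forward/backward to obtain gradients on the embedding table.
opt.zero_grad()
loss_fn(model(ids), labels).backward()

# 2) FGM: perturb embeddings along the gradient direction, r = eps * g / ||g||.
#    (PGD repeats a smaller perturb-and-backward step several times with projection.)
grad = model.emb.weight.grad.detach()
backup = model.emb.weight.data.clone()
model.emb.weight.data += epsilon * grad / (grad.norm() + 1e-8)

# 3) Adversarial forward/backward accumulates gradients; restore weights, then step.
loss_fn(model(ids), labels).backward()
model.emb.weight.data = backup
opt.step()
```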
- How to Train Your DRAGON: Diverse Augmentation Towards Generalizable Dense Retrieval [80.54532535622988]
We show that a generalizable dense retriever can be trained to achieve high accuracy in both supervised and zero-shot retrieval.
DRAGON, our dense retriever trained with diverse augmentation, is the first BERT-base-sized DR to achieve state-of-the-art effectiveness in both supervised and zero-shot evaluations.
arXiv Detail & Related papers (2023-02-15T03:53:26Z)
- A Multi-Task BERT Model for Schema-Guided Dialogue State Tracking [78.2700757742992]
Task-oriented dialogue systems often employ a Dialogue State Tracker (DST) to successfully complete conversations.
Recent state-of-the-art DST implementations rely on schemata of diverse services to improve model robustness.
We propose a single multi-task BERT-based model that jointly solves the three DST tasks of intent prediction, requested slot prediction, and slot filling (a toy multi-head layout is sketched after this entry).
arXiv Detail & Related papers (2022-07-02T13:27:59Z)
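Purely to illustrate the multi-task layout this summary describes, here is a toy sketch of one shared encoder feeding separate heads for intent prediction, requested-slot prediction, and token-level slot filling. The tiny encoder, head sizes, and pooling are placeholder assumptions, not the paper's schema-guided model.

```python
# Toy multi-task layout: a shared encoder, two utterance-level heads, and one
# token-level head; the three losses are simply summed.
import torch
import torch.nn as nn

class ToyMultiTaskDST(nn.Module):
    def __init__(self, vocab=1000, dim=64, n_intents=10, n_slots=20, n_tags=41):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.intent_head = nn.Linear(dim, n_intents)    # utterance-level
        self.req_slot_head = nn.Linear(dim, n_slots)    # utterance-level, multi-label
        self.tag_head = nn.Linear(dim, n_tags)          # token-level (slot filling)

    def forward(self, ids):
        h = self.encoder(self.emb(ids))                 # (batch, seq, dim)
        pooled = h.mean(dim=1)                          # crude pooling stands in for [CLS]
        return self.intent_head(pooled), self.req_slot_head(pooled), self.tag_head(h)

model = ToyMultiTaskDST()
ids = torch.randint(0, 1000, (4, 16))
intent_logits, req_logits, tag_logits = model(ids)
intent_y = torch.randint(0, 10, (4,))
req_y = torch.randint(0, 2, (4, 20)).float()
tag_y = torch.randint(0, 41, (4, 16))

loss = (nn.CrossEntropyLoss()(intent_logits, intent_y)
        + nn.BCEWithLogitsLoss()(req_logits, req_y)
        + nn.CrossEntropyLoss()(tag_logits.reshape(-1, 41), tag_y.reshape(-1)))
loss.backward()
```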
- SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities [76.97949110580703]
We introduce SUPERB-SG, a new benchmark to evaluate pre-trained models across various speech tasks.
We use a lightweight methodology to test the robustness of representations learned by pre-trained models under shifts in data domain.
We also show that the task diversity of SUPERB-SG coupled with limited task supervision is an effective recipe for evaluating the generalizability of model representation.
arXiv Detail & Related papers (2022-03-14T04:26:40Z)
- Learning Mixtures of Linear Dynamical Systems [94.49754087817931]
We develop a two-stage meta-algorithm to efficiently recover each ground-truth LDS model up to error $\tilde{O}(\sqrt{d/T})$.
We validate our theoretical studies with numerical experiments, confirming the efficacy of the proposed algorithm.
arXiv Detail & Related papers (2022-01-26T22:26:01Z)
- Variational Latent-State GPT for Semi-supervised Task-Oriented Dialog Systems [24.667353107453824]
Variational Latent-State GPT model (VLS-GPT) is the first to combine the strengths of the two approaches.
We develop the strategy of sampling-then-forward-computation, which overcomes the memory-explosion issue of using GPT in variational learning (one plausible reading of this strategy is sketched after this entry).
VLS-GPT is shown to significantly outperform both supervised-only and semi-supervised baselines.
arXiv Detail & Related papers (2021-09-09T14:42:29Z)
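The sampling-then-forward-computation strategy is only named, not spelled out, in this summary. One plausible reading, sketched below with a toy autoregressive model, is to sample the latent token sequence without building a computation graph and then run a single differentiable forward pass over the finished sequence to obtain the log-probabilities needed for learning. Model structure and names here are illustrative assumptions, not the VLS-GPT code.

```python
# Toy "sample first, then one differentiable forward pass" pattern:
# autoregressive sampling runs under no_grad (no graph kept per step), and
# gradients come from re-scoring the finished sequence in a single pass.
import torch
import torch.nn as nn

torch.manual_seed(0)
V, D, L = 50, 32, 10          # vocab size, hidden size, latent sequence length

class ToyAutoregressiveLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(V, D)
        self.rnn = nn.GRU(D, D, batch_first=True)
        self.out = nn.Linear(D, V)

    def forward(self, ids):               # next-token logits for each position
        h, _ = self.rnn(self.emb(ids))
        return self.out(h)

model = ToyAutoregressiveLM()

# Phase 1: sampling without gradients (memory stays flat across steps).
ids = torch.zeros(1, 1, dtype=torch.long)        # start token id 0
with torch.no_grad():
    for _ in range(L):
        logits = model(ids)[:, -1]
        nxt = torch.distributions.Categorical(logits=logits).sample()
        ids = torch.cat([ids, nxt.unsqueeze(1)], dim=1)

# Phase 2: one forward pass with gradients over the sampled sequence to get
# its log-probability, usable in a REINFORCE-style or variational update.
logits = model(ids[:, :-1])
logp = torch.log_softmax(logits, dim=-1)
seq_logp = logp.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1).sum()
(-seq_logp).backward()        # e.g. maximize the sampled sequence's log-prob
```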
- Joint Stochastic Approximation and Its Application to Learning Discrete Latent Variable Models [19.07718284287928]
We show that the difficulty of obtaining reliable gradients for the inference model and the drawback of indirectly optimizing the target log-likelihood can be gracefully addressed.
We propose to directly maximize the target log-likelihood and simultaneously minimize the inclusive divergence between the posterior and the inference model.
The resulting learning algorithm is called joint SA (JSA); the objective is restated symbolically after this entry.
arXiv Detail & Related papers (2020-05-28T13:50:08Z)
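For reference, the objective described in the two sentences above can be restated in symbols. This is a hedged paraphrase of the summary, not an excerpt from the paper: $x$ is the observation, $h$ the discrete latent variable, $p_\theta$ the generative model, and $q_\phi$ the inference model.

```latex
% JSA directly targets the marginal log-likelihood and the inclusive KL:
\max_{\theta}\ \log p_{\theta}(x),
\qquad
\min_{\phi}\ \mathrm{KL}\big( p_{\theta}(h \mid x) \,\big\|\, q_{\phi}(h \mid x) \big).
% Both gradients can be estimated from posterior samples h ~ p_theta(h | x),
% e.g. drawn by a Metropolis-independence sampler with q_phi as the proposal:
\nabla_{\theta} \log p_{\theta}(x)
  = \mathbb{E}_{p_{\theta}(h \mid x)}\!\big[ \nabla_{\theta} \log p_{\theta}(x, h) \big],
\qquad
-\nabla_{\phi}\, \mathrm{KL}
  = \mathbb{E}_{p_{\theta}(h \mid x)}\!\big[ \nabla_{\phi} \log q_{\phi}(h \mid x) \big].
```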