Multi-Task Learning for Situated Multi-Domain End-to-End Dialogue
Systems
- URL: http://arxiv.org/abs/2110.05221v1
- Date: Mon, 11 Oct 2021 12:36:30 GMT
- Title: Multi-Task Learning for Situated Multi-Domain End-to-End Dialogue
Systems
- Authors: Po-Nien Kung, Chung-Cheng Chang, Tse-Hsuan Yang, Hsin-Kai Hsu, Yu-Jia
Liou, Yun-Nung Chen
- Abstract summary: We leverage multi-task learning techniques to train a GPT-2 based model on a more challenging dataset.
Our method achieves better performance on all sub-tasks, across domains, compared to task and domain-specific models.
- Score: 21.55075825370981
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Task-oriented dialogue systems are a promising research area in NLP.
Previous work showed the effectiveness of using a single GPT-2 based model to
predict belief states and responses via causal language modeling. In this
paper, we leverage multi-task learning techniques to train a GPT-2 based model
on a more challenging dataset with multiple domains, multiple modalities, and
more diversity in output formats.
Using only a single model, our method achieves better performance on all
sub-tasks, across domains, compared to task and domain-specific models.
Furthermore, we evaluate several proposed strategies for GPT-2 based dialogue
systems with comprehensive ablation studies, showing that each technique
further improves performance.
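The abstract describes predicting belief states and responses with a single GPT-2 model via causal language modeling, which in practice means flattening each dialogue turn into one token sequence with task-marking special tokens. A minimal sketch of such a linearization, in the spirit of SimpleTOD-style approaches; the special tokens, field names, and example slots below are illustrative assumptions, not the paper's actual format:

```python
# Hedged sketch: linearize one dialogue turn into a single sequence for
# causal-LM multi-task training. All special tokens here are invented
# placeholders, not the paper's vocabulary.

BOS, EOS = "<|bos|>", "<|eos|>"
USER, BELIEF, RESPONSE = "<|user|>", "<|belief|>", "<|response|>"

def linearize_turn(user_utterance: str, belief_state: dict, response: str) -> str:
    """Flatten one turn into the token stream a causal LM is trained on.

    At inference time the model is prompted with everything up to
    <|belief|> and decodes the belief state, then the response, so all
    sub-tasks share one set of parameters.
    """
    belief_str = " ; ".join(
        f"{slot} = {value}" for slot, value in sorted(belief_state.items())
    )
    return (
        f"{BOS} {USER} {user_utterance} "
        f"{BELIEF} {belief_str} "
        f"{RESPONSE} {response} {EOS}"
    )

example = linearize_turn(
    "book a table for two in the centre",
    {"restaurant-area": "centre", "restaurant-people": "2"},
    "Sure, which cuisine do you prefer?",
)
print(example)
```

Because every sub-task is expressed as next-token prediction over one sequence, a single language-modeling loss covers belief-state tracking and response generation at once.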
Related papers
- M$^2$PT: Multimodal Prompt Tuning for Zero-shot Instruction Learning [90.75075886543404]
Multimodal Large Language Models (MLLMs) demonstrate remarkable performance across a wide range of domains.
In this work, we introduce a novel Multimodal Prompt Tuning (M$^2$PT) approach for efficient instruction tuning of MLLMs.
arXiv Detail & Related papers (2024-09-24T01:40:24Z)
- DialCLIP: Empowering CLIP as Multi-Modal Dialog Retriever [83.33209603041013]
We propose a parameter-efficient prompt-tuning method named DialCLIP for multi-modal dialog retrieval.
Our approach introduces a multi-modal context generator to learn context features which are distilled into prompts within the pre-trained vision-language model CLIP.
To facilitate various types of retrieval, we also design multiple experts to learn mappings from CLIP outputs to multi-modal representation space.
arXiv Detail & Related papers (2024-01-02T07:40:12Z)
- Multitask Multimodal Prompted Training for Interactive Embodied Task Completion [48.69347134411864]
Embodied MultiModal Agent (EMMA) is a unified encoder-decoder model that reasons over images and trajectories.
By unifying all tasks as text generation, EMMA learns a language of actions which facilitates transfer across tasks.
arXiv Detail & Related papers (2023-11-07T15:27:52Z)
- Multi-Stage Pre-training Enhanced by ChatGPT for Multi-Scenario Multi-Domain Dialogue Summarization [20.60018442168502]
We propose a new pre-trained model specifically designed for multi-scenario multi-domain dialogue summarization.
It adopts a multi-stage pre-training strategy to reduce the gap between the pre-training objective and fine-tuning objective.
arXiv Detail & Related papers (2023-10-16T11:16:07Z)
- Application of frozen large-scale models to multimodal task-oriented dialogue [0.0]
We use the existing Large Language Models ENhanced to See Framework (LENS Framework) to test the feasibility of multimodal task-oriented dialogues.
The LENS Framework has been proposed as a method to solve computer vision tasks without additional training and with fixed parameters of pre-trained models.
arXiv Detail & Related papers (2023-10-02T01:42:28Z)
- Multi-View Zero-Shot Open Intent Induction from Dialogues: Multi Domain Batch and Proxy Gradient Transfer [16.804434185847363]
In Task Oriented Dialogue (TOD) systems, detecting and inducing new intents are two main challenges to applying the system in the real world.
We suggest the semantic multi-view model to resolve these two challenges.
We introduce a novel method PGT, which employs the Siamese network to fine-tune the model with a clustering method directly.
arXiv Detail & Related papers (2023-03-23T08:30:35Z)
- A Multi-Task BERT Model for Schema-Guided Dialogue State Tracking [78.2700757742992]
Task-oriented dialogue systems often employ a Dialogue State Tracker (DST) to successfully complete conversations.
Recent state-of-the-art DST implementations rely on schemata of diverse services to improve model robustness.
We propose a single multi-task BERT-based model that jointly solves the three DST tasks of intent prediction, requested slot prediction and slot filling.
arXiv Detail & Related papers (2022-07-02T13:27:59Z)
- Multi-Task Pre-Training for Plug-and-Play Task-Oriented Dialogue System [26.837972034630003]
PPTOD is a unified plug-and-play model for task-oriented dialogue.
We extensively test our model on three benchmark TOD tasks, including end-to-end dialogue modelling, dialogue state tracking, and intent classification.
arXiv Detail & Related papers (2021-09-29T22:02:18Z)
- Variational Latent-State GPT for Semi-supervised Task-Oriented Dialog Systems [24.667353107453824]
Variational Latent-State GPT model (VLS-GPT) is the first to combine the strengths of the two approaches.
We develop the strategy of sampling-then-forward-computation, which successfully overcomes the memory explosion issue of using GPT in variational learning.
VLS-GPT is shown to significantly outperform both supervised-only and semi-supervised baselines.
arXiv Detail & Related papers (2021-09-09T14:42:29Z)
- RADDLE: An Evaluation Benchmark and Analysis Platform for Robust Task-oriented Dialog Systems [75.87418236410296]
We introduce the RADDLE benchmark, a collection of corpora and tools for evaluating the performance of models across a diverse set of domains.
RADDLE is designed to favor and encourage models with a strong generalization ability.
We evaluate recent state-of-the-art systems based on pre-training and fine-tuning, and find that grounded pre-training on heterogeneous dialog corpora performs better than training a separate model per domain.
arXiv Detail & Related papers (2020-12-29T08:58:49Z)
- Hybrid Generative-Retrieval Transformers for Dialogue Domain Adaptation [77.62366712130196]
We present the winning entry at the fast domain adaptation task of DSTC8, a hybrid generative-retrieval model based on GPT-2 fine-tuned to the multi-domain MetaLWOz dataset.
Our model uses retrieval logic as a fallback, being SoTA on MetaLWOz in human evaluation (>4% improvement over the 2nd place system) and attaining competitive generalization performance in adaptation to the unseen MultiWOZ dataset.
arXiv Detail & Related papers (2020-03-03T18:07:42Z)
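A pattern recurring across several entries above (the multi-task BERT DST model, PPTOD, and this paper's own GPT-2 model) is one shared encoder feeding multiple task-specific heads. A minimal NumPy sketch of that shape; the dimensions, the random-projection "encoder", and the two head names are invented for illustration, not drawn from any of the listed papers:

```python
import numpy as np

rng = np.random.default_rng(0)

HIDDEN = 16          # size of the shared encoder output (illustrative)
NUM_INTENTS = 4      # e.g. an intent-prediction head
NUM_SLOTS = 6        # e.g. a requested-slot head

# Stand-in "encoder": a fixed random projection of a feature vector.
# In the real systems this would be a pre-trained BERT/GPT-2 encoder.
W_enc = rng.normal(size=(32, HIDDEN))

# One linear head per sub-task, all reading the same representation,
# so gradients from every task update the shared encoder.
W_intent = rng.normal(size=(HIDDEN, NUM_INTENTS))
W_slot = rng.normal(size=(HIDDEN, NUM_SLOTS))

def forward(x: np.ndarray) -> dict:
    h = np.tanh(x @ W_enc)              # shared representation
    return {
        "intent_logits": h @ W_intent,  # sub-task 1: intent prediction
        "slot_logits": h @ W_slot,      # sub-task 2: requested slots
    }

out = forward(rng.normal(size=(1, 32)))
print(out["intent_logits"].shape, out["slot_logits"].shape)
```

Training sums the per-task losses so a single backward pass through the shared encoder serves all sub-tasks, which is the core economy these multi-task dialogue models exploit.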
This list is automatically generated from the titles and abstracts of the papers on this site.