Divide, Conquer, and Combine: Mixture of Semantic-Independent Experts
for Zero-Shot Dialogue State Tracking
- URL: http://arxiv.org/abs/2306.00434v1
- Date: Thu, 1 Jun 2023 08:21:20 GMT
- Title: Divide, Conquer, and Combine: Mixture of Semantic-Independent Experts
for Zero-Shot Dialogue State Tracking
- Authors: Qingyue Wang, Liang Ding, Yanan Cao, Yibing Zhan, Zheng Lin, Shi Wang,
Dacheng Tao and Li Guo
- Abstract summary: Zero-shot transfer learning for Dialogue State Tracking (DST) helps to handle a variety of task-oriented dialogue domains without the cost of collecting in-domain data.
Existing works mainly study common data- or model-level augmentation methods to enhance the generalization.
We present a simple and effective "divide, conquer and combine" solution, which explicitly disentangles the semantics of seen data.
- Score: 83.40120598637665
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Zero-shot transfer learning for Dialogue State Tracking (DST) helps to handle
a variety of task-oriented dialogue domains without the cost of collecting
in-domain data. Existing works mainly study common data- or model-level
augmentation methods to enhance the generalization but fail to effectively
decouple the semantics of samples, limiting the zero-shot performance of DST.
In this paper, we present a simple and effective "divide, conquer and combine"
solution, which explicitly disentangles the semantics of seen data and
improves performance and robustness via a mixture-of-experts mechanism.
Specifically, we divide the seen data into semantically independent subsets
and train corresponding experts; newly unseen samples are then mapped and
inferred by the mixture of experts with our designed ensemble inference.
Extensive
experiments on MultiWOZ2.1 with T5-Adapter show our scheme significantly
and consistently improves zero-shot performance, achieving SOTA in
settings without external knowledge, with only 10M trainable parameters.
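The "divide, conquer and combine" pipeline above can be illustrated with a minimal sketch. Everything here is a stand-in assumption for illustration: the `Expert` rule tables replace trained T5-Adapter experts, and the keyword-overlap gate replaces the paper's learned mapping of unseen samples to experts.

```python
# Hypothetical sketch: seen data is divided into semantically independent
# subsets, one expert is trained per subset (conquer), and an unseen sample
# is scored by a weighted mixture of all experts (combine).
from dataclasses import dataclass


@dataclass
class Expert:
    name: str
    # Toy "model": keyword -> (slot, value) lookup standing in for a trained expert.
    rules: dict

    def predict(self, utterance: str) -> dict:
        state = {}
        for keyword, (slot, value) in self.rules.items():
            if keyword in utterance:
                state[slot] = value
        return state


def gate(utterance: str, expert: Expert) -> float:
    # Stand-in gate: fraction of the expert's keywords present in the utterance.
    hits = sum(kw in utterance for kw in expert.rules)
    return hits / max(len(expert.rules), 1)


def ensemble_infer(utterance: str, experts: list) -> dict:
    # Combine: weight each expert's predicted slot values by its gate score
    # and keep the highest-weighted value per slot.
    scored = {}
    for expert in experts:
        w = gate(utterance, expert)
        for slot, value in expert.predict(utterance).items():
            if w > scored.get(slot, (0.0, None))[0]:
                scored[slot] = (w, value)
    return {slot: value for slot, (_, value) in scored.items()}


experts = [
    Expert("hotel", {"cheap": ("hotel-pricerange", "cheap"),
                     "north": ("hotel-area", "north")}),
    Expert("restaurant", {"italian": ("restaurant-food", "italian"),
                          "cheap": ("restaurant-pricerange", "cheap")}),
]
print(ensemble_infer("i want a cheap hotel in the north", experts))
```

The design point the sketch preserves is that each expert only ever sees its own semantic subset; generalization to unseen samples comes entirely from the combination step, not from any single expert.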
Related papers
- Mixture of Efficient Diffusion Experts Through Automatic Interval and Sub-Network Selection [63.96018203905272]
We propose to reduce the sampling cost by pruning a pretrained diffusion model into a mixture of efficient experts.
We demonstrate the effectiveness of our method, DiffPruning, across several datasets.
arXiv Detail & Related papers (2024-09-23T21:27:26Z)
- Semi-Supervised One-Shot Imitation Learning [83.94646047695412]
One-shot Imitation Learning aims to imbue AI agents with the ability to learn a new task from a single demonstration.
We introduce the semi-supervised OSIL problem setting, where the learning agent is presented with a large dataset of trajectories.
We develop an algorithm specifically applicable to this semi-supervised OSIL setting.
arXiv Detail & Related papers (2024-08-09T18:11:26Z)
- A Framework for Fine-Tuning LLMs using Heterogeneous Feedback [69.51729152929413]
We present a framework for fine-tuning large language models (LLMs) using heterogeneous feedback.
First, we combine the heterogeneous feedback data into a single supervision format, compatible with methods like SFT and RLHF.
Next, given this unified feedback dataset, we extract a high-quality and diverse subset to obtain performance increases.
arXiv Detail & Related papers (2024-08-05T23:20:32Z)
- Diverse and Effective Synthetic Data Generation for Adaptable Zero-Shot Dialogue State Tracking [12.116834890063146]
We show substantial performance gains in zero-shot dialogue state tracking (DST) by enhancing training data diversity through synthetic data generation.
Existing DST datasets are severely limited in the number of application domains and slot types they cover due to the high costs of data collection.
This work addresses this challenge with a novel, fully automatic data generation approach that creates synthetic zero-shot DST datasets.
arXiv Detail & Related papers (2024-05-21T03:04:14Z)
- Textual Entailment for Event Argument Extraction: Zero- and Few-Shot with Multi-Source Learning [22.531385318852426]
Recent work has shown that NLP tasks can be recast as Textual Entailment tasks using verbalizations.
We show that entailment is also effective for Event Argument Extraction (EAE), reducing the need for manual annotation to 50% and 20%.
arXiv Detail & Related papers (2022-05-03T08:53:55Z) - Robust Dialogue State Tracking with Weak Supervision and Sparse Data [2.580163308334609]
Generalising dialogue state tracking (DST) to new data is challenging due to the strong reliance on abundant and fine-grained supervision during training.
Sample sparsity, distributional shift and the occurrence of new concepts and topics frequently lead to severe performance degradation during inference.
We propose a training strategy to build extractive DST models without the need for fine-grained manual span labels.
arXiv Detail & Related papers (2022-02-07T16:58:12Z) - Zero-Shot Dialogue State Tracking via Cross-Task Transfer [69.70718906395182]
We propose to transfer cross-task knowledge from general question answering (QA) corpora to the zero-shot dialogue state tracking task.
Specifically, we propose TransferQA, a transferable generative QA model that seamlessly combines extractive QA and multi-choice QA.
In addition, we introduce two effective ways to construct unanswerable questions, namely, negative question sampling and context truncation.
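The two construction strategies named above can be sketched as simple data transforms. This is a hypothetical illustration, not TransferQA's code: the function names, the record format, and the use of `"none"` as the unanswerable label are all our assumptions.

```python
# Hypothetical sketch of two ways to construct unanswerable QA examples:
# negative question sampling and context truncation.
import random


def negative_question_sampling(examples, rng=random.Random(0)):
    # Pair each context with a question drawn from a *different* example,
    # so the answer is (very likely) absent from the context.
    out = []
    for ex in examples:
        other = rng.choice([e for e in examples if e is not ex])
        out.append({"context": ex["context"],
                    "question": other["question"],
                    "answer": "none"})
    return out


def context_truncation(example):
    # Cut the context just before the answer span, making the original
    # question unanswerable from what remains.
    idx = example["context"].find(example["answer"])
    truncated = example["context"][:idx] if idx >= 0 else example["context"]
    return {"context": truncated.rstrip(),
            "question": example["question"],
            "answer": "none"}


ex = {"context": "book a table at the golden house restaurant",
      "question": "what is the name of the restaurant?",
      "answer": "golden house"}
print(context_truncation(ex)["context"])  # book a table at the
```

Both transforms keep the question well-formed while guaranteeing the context no longer supports an answer, which is what lets a QA model learn to abstain.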
arXiv Detail & Related papers (2021-09-10T03:57:56Z)
- Effectiveness of Arbitrary Transfer Sets for Data-free Knowledge Distillation [28.874162427052905]
We investigate the effectiveness of "arbitrary transfer sets" such as random noise, publicly available synthetic, and natural datasets.
We find surprising effectiveness in using arbitrary data for knowledge distillation when the dataset is "target-class balanced".
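The "target-class balanced" condition can be made concrete with a small sketch: keep an equal number of transfer-set samples per teacher-predicted class before distilling. The balancing function and the toy parity teacher below are our illustrative assumptions, not the paper's procedure.

```python
# Hypothetical sketch: balance an arbitrary transfer set by the class the
# teacher assigns, capping each class at the same number of samples.
from collections import defaultdict


def target_class_balance(samples, teacher, per_class):
    # Bucket samples by teacher-predicted label.
    buckets = defaultdict(list)
    for x in samples:
        buckets[teacher(x)].append(x)
    # Keep at most per_class samples from each bucket.
    balanced = []
    for label in sorted(buckets):
        balanced.extend(buckets[label][:per_class])
    return balanced


# Toy teacher: classify integers by parity (0 = even, 1 = odd).
teacher = lambda x: x % 2
samples = [1, 2, 3, 4, 5, 6, 7, 9, 11]
print(target_class_balance(samples, teacher, per_class=2))  # [2, 4, 1, 3]
```

In a real data-free setting the `samples` would be random noise or public images and the `teacher` a pretrained network; the point is only that the distillation set covers the teacher's classes evenly.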
arXiv Detail & Related papers (2020-11-18T06:33:20Z)
- Improving Limited Labeled Dialogue State Tracking with Self-Supervision [91.68515201803986]
Existing dialogue state tracking (DST) models require plenty of labeled data.
We present and investigate two self-supervised objectives: preserving latent consistency and modeling conversational behavior.
Our proposed self-supervised signals can improve joint goal accuracy by 8.95% when only 1% labeled data is used.
arXiv Detail & Related papers (2020-10-26T21:57:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.