Towards Efficient Dialogue Pre-training with Transferable and
Interpretable Latent Structure
- URL: http://arxiv.org/abs/2210.12461v1
- Date: Sat, 22 Oct 2022 14:46:43 GMT
- Title: Towards Efficient Dialogue Pre-training with Transferable and
Interpretable Latent Structure
- Authors: Xueliang Zhao, Lemao Liu, Tingchen Fu, Shuming Shi, Dongyan Zhao and
Rui Yan
- Abstract summary: This paper proposes a novel dialogue generation model with a latent structure that is easily transferable from the general domain to downstream tasks in a lightweight and transparent way.
Thanks to the transferable latent structure, our model yields better dialogue responses than four strong baselines under both automatic and human evaluations.
- Score: 77.30953347462452
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: With the availability of massive general-domain dialogue data, pre-trained
dialogue generation is highly appealing for transferring knowledge from the
general domain to downstream applications. In most existing work, such
transferability is obtained mainly by exhaustively fitting a large model with
hundreds of millions of parameters on massive data, which makes the resulting
models computationally inefficient and poorly interpretable. This paper proposes a novel
dialogue generation model with a latent structure that is easily transferable
from the general domain to downstream tasks in a lightweight and transparent
way. Experiments on two benchmarks validate the effectiveness of the proposed
model. Thanks to the transferable latent structure, our model yields better
dialogue responses than four strong baselines under both automatic and human
evaluations, and with only about 22% of the parameters it delivers a 5x speedup
in running time compared with the strongest baseline. Moreover, the proposed
model is explainable by interpreting its discrete latent variables.
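
For intuition, here is a minimal, self-contained sketch of a dialogue model with a discrete latent bottleneck, using a straight-through Gumbel-softmax so the latent stays discrete in the forward pass yet remains trainable. All module choices, sizes, and names are illustrative assumptions, not the paper's actual architecture.

```python
# Minimal sketch (illustrative, not the paper's architecture): response
# generation conditioned on one of K discrete latent states, sampled with
# a straight-through Gumbel-softmax.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiscreteLatentDialogueModel(nn.Module):
    def __init__(self, vocab_size=32000, d_model=256, num_latents=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.encoder = nn.GRU(d_model, d_model, batch_first=True)
        self.to_latent_logits = nn.Linear(d_model, num_latents)
        # Each discrete state owns an embedding that steers the decoder.
        self.latent_embed = nn.Embedding(num_latents, d_model)
        self.decoder = nn.GRU(d_model, d_model, batch_first=True)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, context_ids, response_ids, tau=1.0):
        _, h = self.encoder(self.embed(context_ids))         # h: (1, B, d)
        latent_logits = self.to_latent_logits(h.squeeze(0))  # (B, K)
        # hard=True keeps the sample one-hot (discrete) in the forward pass
        # while gradients flow through the soft relaxation.
        z = F.gumbel_softmax(latent_logits, tau=tau, hard=True)
        z_emb = z @ self.latent_embed.weight                 # (B, d)
        dec_in = self.embed(response_ids) + z_emb.unsqueeze(1)
        out, _ = self.decoder(dec_in, h)
        return self.lm_head(out), latent_logits

model = DiscreteLatentDialogueModel()
ctx = torch.randint(0, 32000, (2, 10))  # toy batch: 2 contexts of 10 tokens
rsp = torch.randint(0, 32000, (2, 8))
logits, latent_logits = model(ctx, rsp)
# Inspecting which state was chosen is what makes the latent interpretable.
print(logits.shape, latent_logits.argmax(-1))
```

In a sketch like this, interpretability comes from the argmax over the latent logits (each response traces back to one discrete state), and transfer amounts to reusing or lightly adapting the latent inventory on a downstream domain.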
Related papers
- When Parameter-efficient Tuning Meets General-purpose Vision-language
Models [65.19127815275307]
PETAL makes training far cheaper by updating only 0.5% of the total parameters, achieved through a unique mode approximation technique.
Our experiments reveal that PETAL not only outperforms current state-of-the-art methods in most scenarios but also surpasses full fine-tuning models in effectiveness.
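
As a rough sketch of parameter-efficient tuning in general (a LoRA-style low-rank adapter, not PETAL's mode approximation, which is a different decomposition), the example below freezes a pretrained linear layer and trains only a small low-rank correction; all names and sizes are assumptions for the example.

```python
# Generic PEFT sketch: freeze a pretrained linear layer and train only a
# LoRA-style low-rank correction. Illustrative of the idea behind methods
# like PETAL, not PETAL's actual mode approximation.
import torch
import torch.nn as nn

class LowRankAdapter(nn.Module):
    def __init__(self, base: nn.Linear, rank=4, alpha=8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # the pretrained weight stays frozen
            p.requires_grad = False
        self.A = nn.Parameter(torch.randn(base.in_features, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank, base.out_features))
        self.scale = alpha / rank

    def forward(self, x):
        # Frozen path plus trainable low-rank update: y = xW + s * (xAB).
        return self.base(x) + self.scale * (x @ self.A @ self.B)

layer = LowRankAdapter(nn.Linear(512, 512))
y = layer(torch.randn(3, 512))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable fraction: {trainable / total:.1%}")  # ~1.5% in this toy case
```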
arXiv Detail & Related papers (2023-12-16T17:13:08Z)
- Pre-training Multi-party Dialogue Models with Latent Discourse Inference [85.9683181507206]
We pre-train a model that understands the discourse structure of multi-party dialogues, namely, to whom each utterance is replying.
To fully utilize the unlabeled data, we propose to treat the discourse structures as latent variables, then jointly infer them and pre-train the discourse-aware model.
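
A minimal sketch of the latent-discourse idea, under assumed names and sizes: each utterance scores every earlier utterance as its possible reply-to parent, and the resulting softmax distribution serves as the latent posterior to be inferred jointly with pre-training, so no reply-to labels are required.

```python
# Sketch (illustrative scorer, not the paper's model): reply-to structure
# as a latent variable -- a distribution over candidate parents j < i for
# each utterance i, inferred rather than annotated.
import torch
import torch.nn as nn
import torch.nn.functional as F

d, num_utts = 64, 5
utt_repr = torch.randn(num_utts, d)  # pooled encodings of each utterance
score = nn.Bilinear(d, d, 1)         # bilinear parent-child compatibility

for i in range(1, num_utts):
    parents = utt_repr[:i]                      # all earlier utterances
    child = utt_repr[i].expand(i, d)            # broadcast utterance i
    logits = score(child, parents).squeeze(-1)  # (i,) compatibility scores
    p_parent = F.softmax(logits, dim=-1)        # latent reply-to posterior
    print(f"utterance {i} reply-to distribution:",
          [round(p, 3) for p in p_parent.tolist()])
```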
arXiv Detail & Related papers (2023-05-24T14:06:27Z)
- Few-Shot Dialogue Summarization via Skeleton-Assisted Prompt Transfer in Prompt Tuning [47.336815771549524]
Skeleton-Assisted Prompt Transfer improves prompt transfer from dialogue state tracking to dialogue summarization.
We propose a novel approach with perturbation-based probes requiring neither annotation effort nor domain knowledge.
In-depth analyses demonstrate the effectiveness of our method in facilitating cross-task knowledge transfer in few-shot dialogue summarization.
arXiv Detail & Related papers (2023-05-20T03:32:48Z)
- Counterfactual Data Augmentation via Perspective Transition for Open-Domain Dialogues [34.78482218571574]
We propose a data augmentation method to automatically augment high-quality responses with different semantics by counterfactual inference.
Experimental results show that the method produces high-quality responses with different semantics for a given dialogue history, and outperforms competitive baselines on multiple downstream tasks.
arXiv Detail & Related papers (2022-10-30T13:26:49Z)
- DialogVED: A Pre-trained Latent Variable Encoder-Decoder Model for Dialog Response Generation [80.45816053153722]
DialogVED introduces continuous latent variables into the enhanced encoder-decoder pre-training framework to increase the relevance and diversity of responses.
We conduct experiments on PersonaChat, DailyDialog, and DSTC7-AVSD benchmarks for response generation.
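
The generic mechanism here (a continuous latent variable injected into an encoder-decoder) can be sketched in the usual CVAE style: map the context encoding to a Gaussian posterior, draw a reparameterized sample, and regularize with a KL term. Sizes and names below are assumptions; DialogVED's exact configuration differs.

```python
# CVAE-style sketch of a continuous latent in an encoder-decoder
# (illustrative sizes; not DialogVED's exact configuration).
import torch
import torch.nn as nn

d_model, d_latent = 256, 32
to_mu = nn.Linear(d_model, d_latent)
to_logvar = nn.Linear(d_model, d_latent)
latent_to_dec = nn.Linear(d_latent, d_model)

h_ctx = torch.randn(4, d_model)  # pooled encoder states for a batch of 4
mu, logvar = to_mu(h_ctx), to_logvar(h_ctx)
# Reparameterization trick: sample z differentiably from N(mu, sigma^2).
z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
dec_bias = latent_to_dec(z)      # added to the decoder inputs or states

# KL(q(z|x) || N(0, I)) in closed form, the standard VAE regularizer.
kl = 0.5 * torch.sum(mu.pow(2) + logvar.exp() - logvar - 1, dim=-1).mean()
print(dec_bias.shape, float(kl))
```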
arXiv Detail & Related papers (2022-04-27T16:18:15Z)
- Multi-Referenced Training for Dialogue Response Generation [36.24321477524634]
We show that the gap between the real-world probability distribution and the single-referenced data's probability distribution prevents the model from learning one-to-many relations efficiently.
We generate diverse pseudo references from a powerful pretrained model to build multi-referenced data that provides a better approximation of the real-world distribution.
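
A toy sketch of the multi-referenced objective under assumed shapes: decode against several references for the same context and average the token-level negative log-likelihood, so the model sees a broader sample of the one-to-many response distribution than a single gold reply provides.

```python
# Toy multi-referenced loss: average the cross-entropy over several
# (pseudo) references of the same context. In practice the references
# would come from a pretrained generator, not random tensors.
import torch
import torch.nn.functional as F

vocab, seq_len, num_refs = 1000, 8, 4
# One decoder output per reference (e.g., the context replicated num_refs times).
logits = torch.randn(num_refs, seq_len, vocab, requires_grad=True)
references = torch.randint(0, vocab, (num_refs, seq_len))

loss = F.cross_entropy(logits.reshape(-1, vocab), references.reshape(-1))
loss.backward()  # gradients reflect all references, not a single gold reply
print(float(loss))
```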
arXiv Detail & Related papers (2020-09-15T14:17:53Z)
- Modeling Long Context for Task-Oriented Dialogue State Generation [51.044300192906995]
We propose a multi-task learning model with a simple yet effective utterance tagging technique and a bidirectional language model.
These approaches address the problem that the baseline's performance drops significantly when the input dialogue context is long.
In our experiments, our proposed model achieves a 7.03% relative improvement over the baseline, establishing a new state-of-the-art joint goal accuracy of 52.04% on the MultiWOZ 2.0 dataset.
arXiv Detail & Related papers (2020-04-29T11:02:25Z)
- Non-Autoregressive Dialog State Tracking [122.2328875457225]
We propose a novel framework for Non-Autoregressive Dialog State Tracking (NADST).
NADST can factor in potential dependencies among domains and slots to optimize the model towards predicting dialogue states as a complete set rather than as separate slots.
Our results show that our model achieves the state-of-the-art joint accuracy across all domains on the MultiWOZ 2.1 corpus.
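
Illustratively, non-autoregressive state tracking amounts to one parallel prediction over all (domain, slot) pairs: learned slot queries attend to the encoded dialogue and a shared head scores values for every slot at once, instead of decoding slots sequentially. The slot inventory, attention setup, and sizes below are assumptions, not NADST's exact architecture.

```python
# Sketch of parallel (non-autoregressive) state prediction: all slots are
# filled in one shot. Illustrative setup, not NADST's exact architecture.
import torch
import torch.nn as nn

d_model, num_slots, num_values = 128, 30, 500
slot_embed = nn.Embedding(num_slots, d_model)  # one query per (domain, slot)
attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
value_head = nn.Linear(d_model, num_values)    # shared value classifier

dialogue_enc = torch.randn(2, 20, d_model)     # batch of 2 encoded dialogues
queries = slot_embed.weight.unsqueeze(0).expand(2, -1, -1)  # (2, 30, d)
slot_states, _ = attn(queries, dialogue_enc, dialogue_enc)  # all slots attend at once
value_logits = value_head(slot_states)         # (2, 30, 500): every slot in parallel
print(value_logits.shape)
```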
arXiv Detail & Related papers (2020-02-19T06:39:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.