Hierarchical Knowledge Distillation for Dialogue Sequence Labeling
- URL: http://arxiv.org/abs/2111.10957v1
- Date: Mon, 22 Nov 2021 02:45:23 GMT
- Title: Hierarchical Knowledge Distillation for Dialogue Sequence Labeling
- Authors: Shota Orihashi, Yoshihiro Yamazaki, Naoki Makishima, Mana Ihori,
Akihiko Takashima, Tomohiro Tanaka, Ryo Masumura
- Abstract summary: This paper presents a novel knowledge distillation method for dialogue sequence labeling.
It trains a small model by distilling the knowledge of a large, high-performance teacher model.
Experiments on dialogue act estimation and call scene segmentation demonstrate the effectiveness of the proposed method.
- Score: 26.91186784763019
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a novel knowledge distillation method for dialogue
sequence labeling. Dialogue sequence labeling is a supervised learning task
that estimates labels for each utterance in the target dialogue document, and
is useful for many applications such as dialogue act estimation. Accurate
labeling is often realized by a hierarchically-structured large model
consisting of utterance-level and dialogue-level networks that capture the
contexts within an utterance and between utterances, respectively. However, due
to its large model size, such a model cannot be deployed on
resource-constrained devices. To overcome this difficulty, we focus on
knowledge distillation, which trains a small model by distilling the knowledge
of a large, high-performance teacher model. Our key idea is to distill this
knowledge while preserving the complex contexts captured by the teacher model.
To this end, the proposed method, hierarchical knowledge distillation, trains
the small model by distilling not only the probability distribution over label
classes, but also the utterance-level and dialogue-level contexts learned by
the teacher model, training the student to mimic the teacher's output at each
level. Experiments on dialogue act estimation
and call scene segmentation demonstrate the effectiveness of the proposed
method.
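To make the level-wise mimicry concrete, here is a minimal PyTorch sketch of such a hierarchical distillation loss. It assumes teacher and student models that each return utterance-level states, dialogue-level states, and per-utterance label logits with matching hidden sizes (otherwise a linear projection would be needed); the temperature and loss weights are illustrative, not values from the paper.

```python
import torch
import torch.nn.functional as F

def hierarchical_kd_loss(student, teacher, dialogue, labels,
                         T=2.0, alpha=0.5, beta=0.5):
    # The teacher only provides targets; no gradients flow into it.
    with torch.no_grad():
        t_utt, t_dlg, t_logits = teacher(dialogue)
    s_utt, s_dlg, s_logits = student(dialogue)

    # (1) Distill the label-classification distribution: softened
    #     teacher probabilities plus the usual hard-label loss.
    soft = F.kl_div(F.log_softmax(s_logits / T, dim=-1),
                    F.softmax(t_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(s_logits.flatten(0, 1), labels.flatten())

    # (2) Distill utterance-level context: match per-utterance states.
    utt = F.mse_loss(s_utt, t_utt)

    # (3) Distill dialogue-level context: match cross-utterance states.
    dlg = F.mse_loss(s_dlg, t_dlg)

    return hard + soft + alpha * utt + beta * dlg
```

The point of terms (2) and (3) is exactly the abstract's claim: the student is trained to mimic the teacher's output at each level of the hierarchy, not just its final label distribution.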
Related papers
- Pre-training Multi-party Dialogue Models with Latent Discourse Inference [85.9683181507206]
We pre-train a model that understands the discourse structure of multi-party dialogues, namely, to whom each utterance is replying.
To fully utilize the unlabeled data, we propose to treat the discourse structures as latent variables, then jointly infer them and pre-train the discourse-aware model.
arXiv Detail & Related papers (2023-05-24T14:06:27Z) - Discovering Customer-Service Dialog System with Semi-Supervised Learning
and Coarse-to-Fine Intent Detection [6.869753194843482]
Task-oriented dialog aims to assist users in achieving specific goals through multi-turn conversation.
We constructed a weakly supervised dataset based on a teacher/student paradigm.
We also built a modular dialogue system and integrated coarse-to-fine-grained classification for user intent detection; a minimal sketch of the coarse-to-fine idea follows this entry.
arXiv Detail & Related papers (2022-12-23T14:36:43Z) - DialogZoo: Large-Scale Dialog-Oriented Task Learning [52.18193690394549]
- DialogZoo: Large-Scale Dialog-Oriented Task Learning [52.18193690394549]
We aim to build a unified foundation model which can solve massive diverse dialogue tasks.
To achieve this goal, we first collect a large-scale well-labeled dialogue dataset from 73 publicly available datasets.
arXiv Detail & Related papers (2022-05-25T11:17:16Z) - Can Visual Dialogue Models Do Scorekeeping? Exploring How Dialogue
Representations Incrementally Encode Shared Knowledge [17.285206913252786]
We propose a theory-based evaluation method for investigating to what degree models pretrained on the VisDial dataset incrementally build representations that appropriately do scorekeeping.
Our conclusion is that the ability to distinguish between shared and privately known statements over the course of the dialogue is moderately present in the analysed models, but not always incrementally consistent.
arXiv Detail & Related papers (2022-04-14T13:52:11Z) - DialogBERT: Discourse-Aware Response Generation via Learning to Recover
and Rank Utterances [18.199473005335093]
This paper presents DialogBERT, a novel conversational response generation model that enhances previous PLM-based dialogue models.
To efficiently capture the discourse-level coherence among utterances, we propose two training objectives, including masked utterance regression; one possible reading of that objective is sketched after this entry.
Experiments on three multi-turn conversation datasets show that our approach remarkably outperforms the baselines.
arXiv Detail & Related papers (2020-12-03T09:06:23Z) - Improving Classification through Weak Supervision in Context-specific
- Improving Classification through Weak Supervision in Context-specific Conversational Agent Development for Teacher Education [1.215785021723604]
Developing a conversational agent for a specific educational scenario is time-consuming.
Previous approaches to modeling annotations have relied on labeling thousands of examples and calculating inter-annotator agreement and majority votes.
We propose using a multi-task weak supervision method combined with active learning to address these concerns.
arXiv Detail & Related papers (2020-10-23T23:39:40Z) - Dialogue Distillation: Open-Domain Dialogue Augmentation Using Unpaired
Data [61.71319905364992]
We propose a novel data augmentation method for training open-domain dialogue models by utilizing unpaired data.
A data-level distillation process is first proposed to construct augmented dialogues where both post and response are retrieved from the unpaired data.
A ranking module is employed to filter out low-quality dialogues.
A model-level distillation process is employed to distill a teacher model trained on high-quality paired data to the augmented dialogue pairs; the data-level construction and filtering are sketched after this entry.
arXiv Detail & Related papers (2020-09-20T13:06:38Z) - Enhancing Dialogue Generation via Multi-Level Contrastive Learning [57.005432249952406]
- Enhancing Dialogue Generation via Multi-Level Contrastive Learning [57.005432249952406]
We propose a multi-level contrastive learning paradigm to model the fine-grained quality of the responses with respect to the query.
A Rank-aware (RC) network is designed to construct the multi-level contrastive optimization objectives.
We build a Knowledge Inference (KI) component to capture the keyword knowledge from the reference during training and exploit such information to encourage the generation of informative words; a hedged sketch of the rank-aware contrastive idea follows this entry.
arXiv Detail & Related papers (2020-09-19T02:41:04Z) - Ranking Enhanced Dialogue Generation [77.8321855074999]
- Ranking Enhanced Dialogue Generation [77.8321855074999]
How to effectively utilize the dialogue history is a crucial problem in multi-turn dialogue generation.
Previous works usually employ various neural network architectures to model the history.
This paper proposes a Ranking Enhanced Dialogue Generation framework.
arXiv Detail & Related papers (2020-08-13T01:49:56Z) - Low-Resource Knowledge-Grounded Dialogue Generation [74.09352261943913]
We consider knowledge-grounded dialogue generation under a natural assumption that only limited training examples are available.
We devise a disentangled response decoder in order to isolate parameters that depend on knowledge-grounded dialogues from the entire generation model.
With only 1/8 of the training data, our model achieves state-of-the-art performance and generalizes well to out-of-domain knowledge; the parameter-isolation idea is sketched after this entry.
arXiv Detail & Related papers (2020-02-24T16:20:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.