Low-Resource Dialogue Summarization with Domain-Agnostic Multi-Source
Pretraining
- URL: http://arxiv.org/abs/2109.04080v2
- Date: Sat, 11 Sep 2021 09:44:37 GMT
- Title: Low-Resource Dialogue Summarization with Domain-Agnostic Multi-Source
Pretraining
- Authors: Yicheng Zou, Bolin Zhu, Xingwu Hu, Tao Gui, Qi Zhang
- Abstract summary: Training a large summarization model is generally infeasible due to the inadequacy of dialogue data with annotated summaries.
We propose a multi-source pretraining paradigm to better leverage the external summary data.
Our approach achieves competitive performance and generalizes well in different dialogue scenarios.
- Score: 10.750492932503649
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the rapid increase in the volume of dialogue data from daily life, there
is a growing demand for dialogue summarization. Unfortunately, training a large
summarization model is generally infeasible due to the inadequacy of dialogue
data with annotated summaries. Most existing works for low-resource dialogue
summarization directly pretrain models in other domains, e.g., the news domain,
but they generally neglect the huge difference between dialogues and
conventional articles. To bridge the gap between out-of-domain pretraining and
in-domain fine-tuning, in this work, we propose a multi-source pretraining
paradigm to better leverage the external summary data. Specifically, we exploit
large-scale in-domain non-summary data to separately pretrain the dialogue
encoder and the summary decoder. The combined encoder-decoder model is then
pretrained on the out-of-domain summary data using adversarial critics, aiming
to facilitate domain-agnostic summarization. The experimental results on two
public datasets show that with only limited training data, our approach
achieves competitive performance and generalizes well in different dialogue
scenarios.
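The three-stage recipe in the abstract (separately pretrain the dialogue encoder and the summary decoder on in-domain non-summary data, then pretrain the combined model on out-of-domain summary data with adversarial critics) can be sketched roughly as follows. This is a minimal illustrative schedule, not the authors' implementation; all class and function names here are hypothetical.

```python
class Component:
    """Toy stand-in for a neural module; records how many updates it received."""
    def __init__(self, name):
        self.name = name
        self.updates = 0

    def train_step(self, batch):
        # A real module would compute a loss and apply a gradient update here.
        self.updates += 1


def multi_source_pretrain(encoder, decoder, critic,
                          in_domain_dialogues, out_of_domain_summaries):
    # Stage 1: pretrain the dialogue encoder and the summary decoder
    # separately on large-scale in-domain, non-summary dialogue data
    # (e.g. with self-supervised denoising objectives).
    for batch in in_domain_dialogues:
        encoder.train_step(batch)
        decoder.train_step(batch)

    # Stage 2: pretrain the combined encoder-decoder on out-of-domain
    # summary pairs (e.g. news). The adversarial critic penalizes
    # domain-specific representations, pushing the model toward
    # domain-agnostic summarization.
    for batch in out_of_domain_summaries:
        encoder.train_step(batch)
        decoder.train_step(batch)
        critic.train_step(batch)

    # Stage 3 (fine-tuning on limited in-domain summary data) would follow.
    return encoder, decoder
```

For example, running `multi_source_pretrain` with three in-domain batches and two out-of-domain batches updates the encoder and decoder five times each and the critic twice, mirroring how the critic participates only in the out-of-domain stage.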
Related papers
- Multi-Stage Pre-training Enhanced by ChatGPT for Multi-Scenario
Multi-Domain Dialogue Summarization [20.60018442168502]
We propose a new pre-trained model specifically designed for multi-scenario multi-domain dialogue summarization.
It adopts a multi-stage pre-training strategy to reduce the gap between the pre-training objective and fine-tuning objective.
arXiv Detail & Related papers (2023-10-16T11:16:07Z)
- Pre-training Multi-party Dialogue Models with Latent Discourse Inference [85.9683181507206]
We pre-train a model that understands the discourse structure of multi-party dialogues, namely, to whom each utterance is replying.
To fully utilize the unlabeled data, we propose to treat the discourse structures as latent variables, then jointly infer them and pre-train the discourse-aware model.
arXiv Detail & Related papers (2023-05-24T14:06:27Z)
- DIONYSUS: A Pre-trained Model for Low-Resource Dialogue Summarization [127.714919036388]
DIONYSUS is a pre-trained encoder-decoder model for summarizing dialogues in any new domain.
Our experiments show that DIONYSUS outperforms existing methods on six datasets.
arXiv Detail & Related papers (2022-12-20T06:21:21Z)
- Weakly Supervised Data Augmentation Through Prompting for Dialogue Understanding [103.94325597273316]
We present a novel approach that iterates on augmentation quality by applying weakly-supervised filters.
We evaluate our methods on the emotion and act classification tasks in DailyDialog and the intent classification task in Facebook Multilingual Task-Oriented Dialogue.
For DailyDialog specifically, using 10% of the ground truth data we outperform the current state-of-the-art model which uses 100% of the data.
arXiv Detail & Related papers (2022-10-25T17:01:30Z)
- GODEL: Large-Scale Pre-Training for Goal-Directed Dialog [119.1397031992088]
We introduce GODEL, a large pre-trained language model for dialog.
We show that GODEL outperforms state-of-the-art pre-trained dialog models in few-shot fine-tuning setups.
A novel feature of our evaluation methodology is the introduction of a notion of utility that assesses the usefulness of responses.
arXiv Detail & Related papers (2022-06-22T18:19:32Z)
- Improving Multi-Party Dialogue Discourse Parsing via Domain Integration [25.805553277418813]
Multi-party conversations are implicitly organized by semantic-level correlations across interactive turns.
Dialogue discourse analysis can be applied to predict the dependency structure and relations among elementary discourse units.
Existing corpora with dialogue discourse annotation are collected from specific domains with limited sample sizes.
arXiv Detail & Related papers (2021-10-09T09:36:22Z)
- Data-Efficient Methods for Dialogue Systems [4.061135251278187]
Conversational User Interfaces (CUIs) have become ubiquitous in everyday life through consumer-focused products like Siri and Alexa.
Deep learning underlies many recent breakthroughs in dialogue systems but requires very large amounts of training data, often annotated by experts.
In this thesis, we introduce a series of methods for training robust dialogue systems from minimal data.
arXiv Detail & Related papers (2020-12-05T02:51:09Z)
- Dialogue Distillation: Open-Domain Dialogue Augmentation Using Unpaired Data [61.71319905364992]
We propose a novel data augmentation method for training open-domain dialogue models by utilizing unpaired data.
A data-level distillation process is first proposed to construct augmented dialogues where both post and response are retrieved from the unpaired data.
A ranking module is employed to filter out low-quality dialogues.
A model-level distillation process is employed to distill a teacher model trained on high-quality paired data to augmented dialogue pairs.
arXiv Detail & Related papers (2020-09-20T13:06:38Z)
- Multi-Referenced Training for Dialogue Response Generation [36.24321477524634]
We show that the gap between the real-world probability distribution and the single-referenced data's probability distribution prevents the model from learning the one-to-many relations efficiently.
We generate diverse pseudo references from a powerful pretrained model to build multi-referenced data that provides a better approximation of the real-world distribution.
arXiv Detail & Related papers (2020-09-15T14:17:53Z)
- Learning an Unreferenced Metric for Online Dialogue Evaluation [53.38078951628143]
We propose an unreferenced automated evaluation metric that uses large pre-trained language models to extract latent representations of utterances.
We show that our model achieves higher correlation with human annotations in an online setting, while not requiring true responses for comparison during inference.
arXiv Detail & Related papers (2020-05-01T20:01:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.