DG2: Data Augmentation Through Document Grounded Dialogue Generation
- URL: http://arxiv.org/abs/2112.08342v1
- Date: Wed, 15 Dec 2021 18:50:14 GMT
- Title: DG2: Data Augmentation Through Document Grounded Dialogue Generation
- Authors: Qingyang Wu, Song Feng, Derek Chen, Sachindra Joshi, Luis A. Lastras, Zhou Yu
- Abstract summary: We propose an automatic data augmentation technique grounded on documents through a generative dialogue model.
When supplementing the original dataset, our method achieves significant improvement over traditional data augmentation methods.
- Score: 41.81030088619399
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Collecting data for training dialog systems can be extremely expensive due to
the involvement of human participants and the need for extensive annotation.
Especially in document-grounded dialog systems, human experts need to carefully
read the unstructured documents to answer the users' questions. As a result,
existing document-grounded dialog datasets are relatively small-scale and
obstruct the effective training of dialogue systems. In this paper, we propose
an automatic data augmentation technique grounded on documents through a
generative dialogue model. The dialogue model consists of a user bot and an
agent bot that can synthesize diverse dialogues given an input document; these
dialogues are then used to train a downstream model. When supplementing the
original dataset, our method achieves significant improvements over
traditional data augmentation methods, and it also performs strongly in the
low-resource setting.
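The abstract's core mechanism is a pair of generative models that alternate turns over a grounding document. Below is a minimal sketch of that synthesis loop; the `Bot` class, its `generate` interface, and the fixed turn budget are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a user-bot/agent-bot loop for document-grounded
# dialogue augmentation. `Bot` is a hypothetical wrapper around any
# generative dialogue model; its interface is an assumption.
from typing import List, Tuple


class Bot:
    """Placeholder for a generative model that produces one dialogue turn."""

    def __init__(self, name: str):
        self.name = name

    def generate(self, document: str, history: List[Tuple[str, str]]) -> str:
        # Stand-in: a real implementation would decode from a seq2seq model
        # conditioned on the grounding document and the dialogue history.
        return f"({self.name} turn {len(history) + 1} grounded on: {document[:30]}...)"


def synthesize_dialogue(document: str, user_bot: Bot, agent_bot: Bot,
                        max_turns: int = 8) -> List[Tuple[str, str]]:
    """Alternate user-bot and agent-bot turns grounded on `document`."""
    history: List[Tuple[str, str]] = []
    for _ in range(max_turns):
        history.append(("user", user_bot.generate(document, history)))
        history.append(("agent", agent_bot.generate(document, history)))
    return history


# Synthetic dialogues from many documents would then supplement the
# original dataset when training the downstream dialogue model.
dialogue = synthesize_dialogue("Refunds are issued within 30 days...",
                               Bot("user"), Bot("agent"), max_turns=3)
```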
Related papers
- Contextual Data Augmentation for Task-Oriented Dialog Systems [8.085645180329417]
We develop a novel dialog augmentation model that generates a user turn conditioned on the full dialog context.
With a new prompt design for the language model and output re-ranking, the dialogs generated by our model can be used directly to train downstream dialog systems; a sketch of this generate-and-rerank step follows this entry.
arXiv Detail & Related papers (2023-10-16T13:22:34Z)
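A minimal sketch of the context-conditioned user-turn generation with output re-ranking described in the entry above. `lm_generate` and `fluency_score` are hypothetical stand-ins, not the paper's API, and the prompt design shown is only illustrative.

```python
# Sketch: generate candidate user turns conditioned on the full dialog
# context, then keep the best candidate after re-ranking.
from typing import List


def lm_generate(prompt: str, num_candidates: int = 5) -> List[str]:
    # Stand-in for sampling candidate user turns from a language model.
    return [f"candidate user turn #{i}" for i in range(num_candidates)]


def fluency_score(turn: str, context: str) -> float:
    # Stand-in for a re-ranking score (e.g., LM likelihood given context).
    return float(len(turn))


def augment_user_turn(dialog_context: List[str]) -> str:
    """Generate a new user turn from the full context, re-rank, keep the best."""
    context = "\n".join(dialog_context)
    prompt = context + "\nUser:"  # illustrative prompt design
    candidates = lm_generate(prompt)
    return max(candidates, key=lambda t: fluency_score(t, context))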
- Controllable Dialogue Simulation with In-Context Learning [39.04491297557292]
Dialogic is a dialogue simulation method based on in-context learning with large language models.
Our method can rapidly expand a small set of dialogue data with minimum or zero human involvement.
Our simulated dialogues have near-human fluency and annotation accuracy; a sketch of the few-shot simulation loop follows this entry.
arXiv Detail & Related papers (2022-10-09T06:32:58Z)
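A minimal sketch of dialogue simulation via in-context learning as the entry above describes it: a few seed dialogues go into the prompt, and a large language model is asked to write a new annotated dialogue. `llm_complete` is a hypothetical stand-in for any LLM completion call; the prompt format is an assumption.

```python
# Sketch: few-shot dialogue simulation with an LLM.
from typing import List


def llm_complete(prompt: str) -> str:
    # Stand-in for an LLM API call (e.g., a hosted completion endpoint).
    return "User: ...\nAgent: ...\n"


def simulate_dialogue(seed_dialogues: List[str], new_goal: str) -> str:
    """Build a few-shot prompt from seed dialogues, then ask the LLM to
    simulate a full annotated dialogue for `new_goal`."""
    demonstrations = "\n\n".join(seed_dialogues)
    prompt = (
        f"{demonstrations}\n\n"
        f"Goal: {new_goal}\n"
        "Write the next dialogue, including annotations:\n"
    )
    return llm_complete(prompt)
```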
- Manual-Guided Dialogue for Flexible Conversational Agents [84.46598430403886]
Building and using dialogue data efficiently, and deploying models across domains at scale, are critical issues in building a task-oriented dialogue system.
We propose a novel manual-guided dialogue scheme, where the agent learns the tasks from both dialogue and manuals.
Our proposed scheme reduces the dependence of dialogue models on fine-grained domain ontology, and makes them more flexible to adapt to various domains.
arXiv Detail & Related papers (2022-08-16T08:21:12Z)
- GODEL: Large-Scale Pre-Training for Goal-Directed Dialog [119.1397031992088]
We introduce GODEL, a large pre-trained language model for dialog.
We show that GODEL outperforms state-of-the-art pre-trained dialog models in few-shot fine-tuning setups.
A novel feature of our evaluation methodology is the introduction of a notion of utility that assesses the usefulness of responses.
arXiv Detail & Related papers (2022-06-22T18:19:32Z)
- Response Generation with Context-Aware Prompt Learning [19.340498579331555]
We present a novel approach for pre-trained dialogue modeling that casts the dialogue generation problem as a prompt-learning task.
Instead of fine-tuning on limited dialogue data, our approach, DialogPrompt, learns continuous prompt embeddings optimized for dialogue contexts.
Our approach significantly outperforms the fine-tuning baseline and generic prompt-learning methods; a sketch of the continuous-prompt idea follows this entry.
arXiv Detail & Related papers (2021-11-04T05:40:13Z)
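A minimal sketch of continuous prompt embeddings in the general style of prompt tuning: trainable prompt vectors are prepended to the (frozen) language model's input embeddings. This is a generic illustration of the technique the entry above names, not DialogPrompt's actual architecture; the prompt length and initialization are assumptions.

```python
# Sketch: trainable soft prompt prepended to frozen input embeddings.
import torch
import torch.nn as nn


class SoftPrompt(nn.Module):
    def __init__(self, prompt_len: int, embed_dim: int):
        super().__init__()
        # Continuous prompt, optimized while the LM weights stay frozen.
        self.prompt = nn.Parameter(torch.randn(prompt_len, embed_dim) * 0.02)

    def forward(self, token_embeds: torch.Tensor) -> torch.Tensor:
        # token_embeds: (batch, seq_len, embed_dim)
        batch = token_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, token_embeds], dim=1)


# Usage (assuming a frozen LM exposing its embedding layer):
#   embeds = frozen_lm.embeddings(input_ids)
#   lm_inputs = SoftPrompt(20, embeds.size(-1))(embeds)
```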
- Dialogue Distillation: Open-Domain Dialogue Augmentation Using Unpaired Data [61.71319905364992]
We propose a novel data augmentation method for training open-domain dialogue models by utilizing unpaired data.
A data-level distillation process is first proposed to construct augmented dialogues where both post and response are retrieved from the unpaired data.
A ranking module is employed to filter out low-quality dialogues.
A model-level distillation process then distills a teacher model, trained on high-quality paired data, onto the augmented dialogue pairs; a sketch of the data-level stage follows this entry.
arXiv Detail & Related papers (2020-09-20T13:06:38Z)
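A minimal sketch of the data-level distillation stage described above: posts and responses are retrieved from unpaired data to form augmented pairs, and a ranking module filters out low-quality ones. `retrieve_similar` and `pair_quality` are hypothetical stand-ins for the paper's retrieval and ranking modules, and the exact pairing scheme here is an assumption.

```python
# Sketch: build augmented dialogue pairs from unpaired data, then filter
# them with a ranking score.
from typing import List, Tuple


def retrieve_similar(query: str, pool: List[str], k: int = 3) -> List[str]:
    # Stand-in: a real system would use lexical or embedding similarity.
    return pool[:k]


def pair_quality(post: str, response: str) -> float:
    # Stand-in for a learned ranking module scoring pair coherence.
    return 1.0


def build_augmented_pairs(paired: List[Tuple[str, str]],
                          unpaired_posts: List[str],
                          unpaired_responses: List[str],
                          threshold: float = 0.5) -> List[Tuple[str, str]]:
    augmented = []
    for post, response in paired:
        for new_post in retrieve_similar(post, unpaired_posts):
            for new_resp in retrieve_similar(response, unpaired_responses):
                if pair_quality(new_post, new_resp) >= threshold:
                    augmented.append((new_post, new_resp))
    return augmented
```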
- Rethinking Dialogue State Tracking with Reasoning [76.0991910623001]
This paper proposes to track dialogue states gradually, reasoning over dialogue turns with the help of back-end data.
Empirical results demonstrate that our method significantly outperforms the state-of-the-art methods by 38.6% in terms of joint belief accuracy for MultiWOZ 2.1.
arXiv Detail & Related papers (2020-05-27T02:05:33Z)
- Paraphrase Augmented Task-Oriented Dialog Generation [68.1790912977053]
We propose a paraphrase augmented response generation (PARG) framework that jointly trains a paraphrase model and a response generation model.
We also design a method to automatically construct a paraphrase training dataset based on dialog state and dialog act labels; a sketch of this label-keyed pairing follows this entry.
arXiv Detail & Related papers (2020-04-16T05:12:36Z)
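A minimal sketch of automatic paraphrase-pair construction in the spirit of the PARG entry above: utterances that share the same dialog state and dialog act labels are treated as paraphrase candidates. The grouping key and the data shape are assumptions for illustration, not the paper's exact procedure.

```python
# Sketch: pair utterances that share dialog state and act labels as
# paraphrase training examples.
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple


def build_paraphrase_pairs(
    utterances: List[Dict],  # each: {"text", "state", "act"} (assumed shape)
) -> List[Tuple[str, str]]:
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for utt in utterances:
        groups[(utt["state"], utt["act"])].append(utt["text"])
    pairs: List[Tuple[str, str]] = []
    for texts in groups.values():
        # Every pair of distinct utterances with matching labels becomes
        # a paraphrase training example.
        pairs.extend(combinations(texts, 2))
    return pairs
```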
This list is automatically generated from the titles and abstracts of the papers on this site.