DialoGPS: Dialogue Path Sampling in Continuous Semantic Space for Data
Augmentation in Multi-Turn Conversations
- URL: http://arxiv.org/abs/2306.16770v1
- Date: Thu, 29 Jun 2023 08:12:47 GMT
- Title: DialoGPS: Dialogue Path Sampling in Continuous Semantic Space for Data
Augmentation in Multi-Turn Conversations
- Authors: Ang Lv, Jinpeng Li, Yuhan Chen, Xing Gao, Ji Zhang, Rui Yan
- Abstract summary: In open-domain dialogue generation tasks, contexts and responses in most datasets are one-to-one mapped.
We propose DialoGue Path Sampling (DialoGPS) in continuous semantic space, the first many-to-many augmentation method for multi-turn dialogues.
- Score: 18.98951277038404
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In open-domain dialogue generation tasks, contexts and responses in most
datasets are one-to-one mapped, violating an important many-to-many
characteristic: a context leads to various responses, and a response answers
multiple contexts. Without such patterns, models generalize poorly and prefer
responding safely. Previous attempts either address multi-turn settings from a
one-to-many perspective or take a many-to-many perspective but are limited to
single-turn settings. The major challenge in many-to-many augmentation of multi-turn
dialogues is that discretely replacing each turn with a semantically similar one
breaks fragile context coherence. In this paper, we propose the DialoGue Path
Sampling (DialoGPS) method in continuous semantic space, the first many-to-many
augmentation method for multi-turn dialogues. Specifically, we map a dialogue
to our extended Brownian Bridge, a special Gaussian process. We sample latent
variables to form coherent dialogue paths in the continuous space. A dialogue
path corresponds to a new multi-turn dialogue and is used as augmented training
data. We show the effect of DialoGPS with both automatic and human evaluation.
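The path-sampling idea in the abstract can be illustrated with a minimal sketch. It assumes a standard Brownian bridge pinned at a start and an end latent vector, not the paper's extended variant, whose details are not given here. The function name `sample_bridge_path` and the parameters `num_turns` and `sigma` are hypothetical, and the utterance encoder and decoder are left out.

```python
import numpy as np

def sample_bridge_path(z_start, z_end, num_turns, sigma=1.0, rng=None):
    """Sample one path of a standard Brownian bridge pinned at both ends.

    Illustrative assumption: one latent vector per dialogue turn, with the
    first and last turns fixed. This is NOT the paper's extended bridge.
    Requires num_turns >= 2.
    """
    rng = np.random.default_rng() if rng is None else rng
    z_start = np.asarray(z_start, dtype=float)
    z_end = np.asarray(z_end, dtype=float)
    T = num_turns - 1  # index of the final turn
    path = [z_start]
    for t in range(1, T):
        prev = path[-1]
        remaining = T - (t - 1)  # time left from the previous step to the end
        # Conditional law of the bridge at step t given the previous point
        # and the pinned endpoint: drift toward z_end, shrinking variance.
        mean = prev + (z_end - prev) / remaining
        var = sigma**2 * (T - t) / remaining
        path.append(mean + np.sqrt(var) * rng.standard_normal(z_start.shape))
    path.append(z_end)
    return np.stack(path)  # shape: (num_turns, latent_dim)
```

Calling the function several times with the same endpoints gives different intermediate latents that all start and end at the same points. This is the sense in which one dialogue could yield many sampled paths; turning each latent back into an utterance would need a trained decoder, which this sketch does not cover.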
Related papers
- Multi-turn Dialogue Comprehension from a Topic-aware Perspective [70.37126956655985]
This paper proposes to model multi-turn dialogues from a topic-aware perspective.
We use a dialogue segmentation algorithm to split a dialogue passage into topic-concentrated fragments in an unsupervised way.
We also present a novel model, Topic-Aware Dual-Attention Matching (TADAM) Network, which takes topic segments as processing elements.
(2023-09-18T11:03:55Z)
- Re$^3$Dial: Retrieve, Reorganize and Rescale Dialogue Corpus for
Long-Turn Open-Domain Dialogue Pre-training [90.3412708846419]
Most dialogues in existing pre-training corpora contain fewer than three turns of dialogue.
We propose the Retrieve, Reorganize and Rescale framework (Re$^3$Dial) to automatically construct billion-scale long-turn dialogues.
By repeating the above process, Re$^3$Dial can yield a coherent long-turn dialogue.
(2023-05-04T07:28:23Z)
- M3ED: Multi-modal Multi-scene Multi-label Emotional Dialogue Database [139.08528216461502]
We propose a Multi-modal Multi-scene Multi-label Emotional Dialogue dataset, M3ED.
M3ED contains 990 dyadic emotional dialogues from 56 different TV series, a total of 9,082 turns and 24,449 utterances.
To the best of our knowledge, M3ED is the first multimodal emotional dialogue dataset in Chinese.
(2022-05-09T06:52:51Z)
- HybriDialogue: An Information-Seeking Dialogue Dataset Grounded on
Tabular and Textual Data [87.67278915655712]
We present a new dialogue dataset, HybriDialogue, which consists of crowdsourced natural conversations grounded on both Wikipedia text and tables.
The conversations are created through the decomposition of complex multihop questions into simple, realistic multiturn dialogue interactions.
(2022-04-28T00:52:16Z)
- A Context-Aware Hierarchical BERT Fusion Network for Multi-turn Dialog
Act Detection [6.361198391681688]
CaBERT-SLU is a context-aware hierarchical BERT fusion network.
Our approach reaches new state-of-the-art (SOTA) performance on two complicated multi-turn dialogue datasets.
(2021-09-03T02:00:03Z)
- Comprehensive Study: How the Context Information of Different
Granularity Affects Dialogue State Tracking? [17.476030563395714]
Dialogue state tracking (DST) plays a key role in task-oriented dialogue systems to monitor the user's goal.
In general, there are two strategies to track a dialogue state: predicting it from scratch and updating it from the previous state.
(2021-05-08T03:18:13Z)
- Dialogue History Matters! Personalized Response Selection in Multi-turn
Retrieval-based Chatbots [62.295373408415365]
We propose a personalized hybrid matching network (PHMN) for context-response matching.
Our contributions are two-fold: 1) our model extracts personalized wording behaviors from user-specific dialogue history as extra matching information.
We evaluate our model on two large datasets with user identification, i.e., personalized dialogue Corpus Ubuntu (P-Ubuntu) and personalized Weibo dataset (P-Weibo).
(2021-03-17T09:42:11Z)
- MultiTalk: A Highly-Branching Dialog Testbed for Diverse Conversations [39.81965687032923]
We present the MultiTalk dataset, a corpus of over 320,000 sentences of written conversational dialog.
We make multiple contributions to study dialog generation in the highly branching setting.
Our culminating task is a challenging theory of mind problem, a controllable generation task.
(2021-02-02T02:29:40Z)
- Rethinking Dialogue State Tracking with Reasoning [76.0991910623001]
This paper proposes to track dialogue states gradually with reasoning over dialogue turns with the help of the back-end data.
Empirical results demonstrate that our method significantly outperforms the state-of-the-art methods by 38.6% in terms of joint belief accuracy for MultiWOZ 2.1.
(2020-05-27T02:05:33Z)
- Diversifying Dialogue Generation with Non-Conversational Text [38.03510529185192]
We propose a new perspective to diversify dialogue generation by leveraging non-conversational text.
We collect a large-scale non-conversational corpus from multiple sources, including forum comments, idioms and book snippets.
The resulting model is tested on two conversational datasets and is shown to produce significantly more diverse responses without sacrificing relevance to the context.
(2020-05-09T02:16:05Z)