PRODIGy: a PROfile-based DIalogue Generation dataset
- URL: http://arxiv.org/abs/2311.05195v1
- Date: Thu, 9 Nov 2023 08:19:34 GMT
- Title: PRODIGy: a PROfile-based DIalogue Generation dataset
- Authors: Daniela Occhipinti, Serra Sinem Tekiroglu, Marco Guerini
- Abstract summary: We propose a new resource where each dialogue is aligned with all possible speaker representations such as communication style, biographies, and personality.
This framework makes it possible to test several baselines built using generative language models with several profile configurations.
- Score: 14.123548564209068
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Providing dialogue agents with a profile representation can improve their
consistency and coherence, leading to better conversations. However, current
profile-based dialogue datasets for training such agents contain either
explicit profile representations that are simple and dialogue-specific, or
implicit representations that are difficult to collect. In this work, we
propose a unified framework in which we bring together both standard and more
sophisticated profile representations by creating a new resource where each
dialogue is aligned with all possible speaker representations such as
communication style, biographies, and personality. This framework makes it
possible to test several baselines built using generative language models with
several profile configurations. The automatic evaluation shows that
profile-based models have better generalisation capabilities than models
trained on dialogues only, in both in-domain and cross-domain settings. These
results are consistent for fine-tuned models and instruction-based LLMs. Additionally, human
evaluation demonstrates a clear preference for generations consistent with both
profile and context. Finally, to account for possible privacy concerns, all
experiments are done under two configurations: inter-character and
intra-character. In the former, the LM stores the information about the
character in its internal representation, while in the latter, the LM does not
retain any personal information but uses it only at inference time.
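To make the profile configurations and the two privacy settings more concrete, here is a minimal Python sketch. It assumes a hypothetical record layout (`Profile`, `Dialogue`, a `character` id) and a simple tagged-text input format; none of these names come from the PRODIGy release. Reading the abstract literally, `shared_character_split` approximates the inter-character setting (each test character is also seen during training, so the model can absorb it in its weights) and `disjoint_character_split` the intra-character setting (a test character's profile is only available at inference time). The paper's actual splitting protocol and input format may differ.

```python
import random
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Profile:
    # Hypothetical fields mirroring the profile types named in the abstract.
    communication_style: str = ""
    biography: str = ""
    personality: str = ""


@dataclass
class Dialogue:
    character: str    # hypothetical speaker/character identifier
    turns: list       # utterances in order; the last one is the target reply
    profile: Profile = field(default_factory=Profile)


def build_example(d, profile_fields=()):
    """Return (source, target) for one profile configuration.

    profile_fields=() gives a dialogue-only baseline; e.g.
    ("biography", "personality") prepends those two fields.
    """
    parts = [f"[{name}] {getattr(d.profile, name)}"
             for name in profile_fields if getattr(d.profile, name)]
    context = " ".join(f"<turn> {t}" for t in d.turns[:-1])
    return " ".join(parts + [context]), d.turns[-1]


def shared_character_split(dialogues, test_ratio=0.2, seed=0):
    """Hold out part of each character's dialogues, so every test
    character also appears in the training data."""
    rng = random.Random(seed)
    by_char = defaultdict(list)
    for d in dialogues:
        by_char[d.character].append(d)
    train, test = [], []
    for ds in by_char.values():
        if len(ds) < 2:          # too few dialogues to hold any out
            train.extend(ds)
            continue
        rng.shuffle(ds)
        n_test = min(len(ds) - 1, max(1, int(len(ds) * test_ratio)))
        test.extend(ds[:n_test])
        train.extend(ds[n_test:])
    return train, test


def disjoint_character_split(dialogues, test_ratio=0.2, seed=0):
    """Assign whole characters to train or test, so a test character's
    profile is never seen during training."""
    rng = random.Random(seed)
    chars = sorted({d.character for d in dialogues})
    rng.shuffle(chars)
    n_test = min(len(chars) - 1, max(1, int(len(chars) * test_ratio)))
    test_chars = set(chars[:n_test])
    train = [d for d in dialogues if d.character not in test_chars]
    test = [d for d in dialogues if d.character in test_chars]
    return train, test
```

Comparing `build_example(d)` against `build_example(d, ("biography", "personality"))` on the same split mirrors the dialogue-only versus profile-based comparison described in the abstract; the tag format is only one plausible way to serialise a profile.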
Related papers
- Apollonion: Profile-centric Dialog Agent [9.657755354649048]
We propose a framework for dialog agents to incorporate user profiling (initialization and update): the user's queries and responses are analyzed and organized into a structured user profile.
We propose a series of evaluation protocols for personalization that measure to what extent a response is personalized for different users.
(2024-04-10T03:32:41Z)
- PersonalityChat: Conversation Distillation for Personalized Dialog Modeling with Facts and Traits [5.447308344436046]
PersonalityChat is a synthetic conversational dataset based upon the popular PersonaChat dataset.
We show that the personality trait labels can be used for trait-based personalization of generative dialogue models.
(2024-01-14T20:35:33Z)
- FLIP: Towards Fine-grained Alignment between ID-based Models and Pretrained Language Models for CTR Prediction [49.510163437116645]
We propose to conduct Fine-grained feature-level ALignment between ID-based Models and Pretrained Language Models (FLIP) for click-through rate (CTR) prediction.
Specifically, the masked data of one modality (i.e., tokens or features) has to be recovered with the help of the other modality, which establishes the feature-level interaction and alignment.
Experiments on three real-world datasets demonstrate that FLIP outperforms SOTA baselines and is highly compatible with various ID-based models and PLMs.
(2023-10-30T11:25:03Z)
- DIONYSUS: A Pre-trained Model for Low-Resource Dialogue Summarization [127.714919036388]
DIONYSUS is a pre-trained encoder-decoder model for summarizing dialogues in any new domain.
Our experiments show that DIONYSUS outperforms existing methods on six datasets.
(2022-12-20T06:21:21Z)
- CGoDial: A Large-Scale Benchmark for Chinese Goal-oriented Dialog Evaluation [75.60156479374416]
CGoDial is a new challenging and comprehensive Chinese benchmark for Goal-oriented Dialog evaluation.
It contains 96,763 dialog sessions and 574,949 dialog turns in total, covering three datasets with different knowledge sources.
To bridge the gap between academic benchmarks and spoken dialog scenarios, we either collect data from real conversations or add spoken features to existing datasets via crowd-sourcing.
(2022-11-21T16:21:41Z)
- SPACE-2: Tree-Structured Semi-Supervised Contrastive Pre-training for Task-Oriented Dialog Understanding [68.94808536012371]
We propose a tree-structured pre-trained conversation model, which learns dialog representations from limited labeled dialogs and large-scale unlabeled dialog corpora.
Our method can achieve new state-of-the-art results on the DialoGLUE benchmark consisting of seven datasets and four popular dialog understanding tasks.
(2022-09-14T13:42:50Z)
- Representation Learning for Conversational Data using Discourse Mutual Information Maximization [9.017156603976915]
We argue that structure-unaware word-by-word generation is not suitable for effective conversation modeling.
We propose DMI, a structure-aware loss function based on mutual information, for training dialog-representation models.
Our models show the most promising performance on the dialog evaluation task DailyDialog++, in both random and adversarial negative scenarios.
(2021-12-04T13:17:07Z)
- Dialogue History Matters! Personalized Response Selection in Multi-turn Retrieval-based Chatbots [62.295373408415365]
We propose a personalized hybrid matching network (PHMN) for context-response matching.
Our contributions are two-fold: 1) our model extracts personalized wording behaviors from user-specific dialogue history as extra matching information.
We evaluate our model on two large datasets with user identification, i.e., the personalized Ubuntu dialogue corpus (P-Ubuntu) and the personalized Weibo dataset (P-Weibo).
(2021-03-17T09:42:11Z)
- RADDLE: An Evaluation Benchmark and Analysis Platform for Robust Task-oriented Dialog Systems [75.87418236410296]
We introduce the RADDLE benchmark, a collection of corpora and tools for evaluating the performance of models across a diverse set of domains.
RADDLE is designed to favor and encourage models with a strong generalization ability.
We evaluate recent state-of-the-art systems based on pre-training and fine-tuning, and find that grounded pre-training on heterogeneous dialog corpora performs better than training a separate model per domain.
(2020-12-29T08:58:49Z)
- CREDIT: Coarse-to-Fine Sequence Generation for Dialogue State Tracking [44.38388988238695]
A dialogue state tracker aims to accurately find a compact representation of the current dialogue status.
We employ a structured state representation and cast dialogue state tracking as a sequence generation problem.
Experiments demonstrate our tracker achieves encouraging joint goal accuracy for the five domains in MultiWOZ 2.0 and MultiWOZ 2.1 datasets.
(2020-09-22T10:27:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.