CPED: A Large-Scale Chinese Personalized and Emotional Dialogue Dataset
for Conversational AI
- URL: http://arxiv.org/abs/2205.14727v1
- Date: Sun, 29 May 2022 17:45:12 GMT
- Title: CPED: A Large-Scale Chinese Personalized and Emotional Dialogue Dataset
for Conversational AI
- Authors: Yirong Chen, Weiquan Fan, Xiaofen Xing, Jianxin Pang, Minlie Huang,
Wenjing Han, Qianfeng Tie, Xiangmin Xu
- Abstract summary: Most existing datasets for conversational AI ignore human personalities and emotions.
We propose CPED, a large-scale Chinese personalized and emotional dialogue dataset.
CPED contains more than 12K dialogues of 392 speakers from 40 TV shows.
- Score: 48.67259855309959
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Human language expression is based on the subjective construal of the
situation instead of the objective truth conditions, which means that speakers'
personalities and emotions after cognitive processing have an important
influence on conversation. However, most existing datasets for conversational
AI ignore human personalities and emotions, or only consider part of them. It's
difficult for dialogue systems to understand speakers' personalities and
emotions although large-scale pre-training language models have been widely
used. In order to consider both personalities and emotions in the process of
conversation generation, we propose CPED, a large-scale Chinese personalized
and emotional dialogue dataset, which consists of multi-source knowledge
related to empathy and personal characteristic. These knowledge covers gender,
Big Five personality traits, 13 emotions, 19 dialogue acts and 10 scenes. CPED
contains more than 12K dialogues of 392 speakers from 40 TV shows. We release
the textual dataset with audio features and video features according to the
copyright claims, privacy issues, terms of service of video platforms. We
provide detailed description of the CPED construction process and introduce
three tasks for conversational AI, including personality recognition, emotion
recognition in conversations as well as personalized and emotional conversation
generation. Finally, we provide baseline systems for these tasks and consider
the function of speakers' personalities and emotions on conversation. Our
motivation is to propose a dataset to be widely adopted by the NLP community as
a new open benchmark for conversational AI research. The full dataset is
available at https://github.com/scutcyr/CPED.
Related papers
- SemEval-2024 Task 3: Multimodal Emotion Cause Analysis in Conversations [53.60993109543582]
SemEval-2024 Task 3, named Multimodal Emotion Cause Analysis in Conversations, aims at extracting all pairs of emotions and their corresponding causes from conversations.
Under different modality settings, it consists of two subtasks: Textual Emotion-Cause Pair Extraction in Conversations (TECPE) and Multimodal Emotion-Cause Pair Extraction in Conversations (MECPE)
In this paper, we introduce the task, dataset and evaluation settings, summarize the systems of the top teams, and discuss the findings of the participants.
arXiv Detail & Related papers (2024-05-19T09:59:00Z) - Affective-NLI: Towards Accurate and Interpretable Personality Recognition in Conversation [30.820334868031537]
Personality Recognition in Conversation (PRC) aims to identify the personality traits of speakers through textual dialogue content.
We propose Affective Natural Language Inference (Affective-NLI) for accurate and interpretable PRC.
arXiv Detail & Related papers (2024-04-03T09:14:24Z) - Personality-affected Emotion Generation in Dialog Systems [67.40609683389947]
We propose a new task, Personality-affected Emotion Generation, to generate emotion based on the personality given to the dialog system.
We analyze the challenges in this task, i.e., (1) heterogeneously integrating personality and emotional factors and (2) extracting multi-granularity emotional information in the dialog context.
Results suggest that by adopting our method, the emotion generation performance is improved by 13% in macro-F1 and 5% in weighted-F1 from the BERT-base model.
arXiv Detail & Related papers (2024-04-03T08:48:50Z) - EmoInHindi: A Multi-label Emotion and Intensity Annotated Dataset in
Hindi for Emotion Recognition in Dialogues [44.79509115642278]
We create a large conversational dataset in Hindi named EmoInHindi for multi-label emotion and intensity recognition in conversations.
We prepare our dataset in a Wizard-of-Oz manner for mental health and legal counselling of crime victims.
arXiv Detail & Related papers (2022-05-27T11:23:50Z) - M3ED: Multi-modal Multi-scene Multi-label Emotional Dialogue Database [139.08528216461502]
We propose a Multi-modal Multi-scene Multi-label Emotional Dialogue dataset, M3ED.
M3ED contains 990 dyadic emotional dialogues from 56 different TV series, a total of 9,082 turns and 24,449 utterances.
To the best of our knowledge, M3ED is the first multimodal emotional dialogue dataset in Chinese.
arXiv Detail & Related papers (2022-05-09T06:52:51Z) - Responsive Listening Head Generation: A Benchmark Dataset and Baseline [58.168958284290156]
We define the responsive listening head generation task as the synthesis of a non-verbal head with motions and expressions reacting to the multiple inputs.
Unlike speech-driven gesture or talking head generation, we introduce more modals in this task, hoping to benefit several research fields.
arXiv Detail & Related papers (2021-12-27T07:18:50Z) - EmoWOZ: A Large-Scale Corpus and Labelling Scheme for Emotion in
Task-Oriented Dialogue Systems [3.3010169113961325]
EmoWOZ is a large-scale manually emotion-annotated corpus of task-oriented dialogues.
It contains more than 11K dialogues with more than 83K emotion annotations of user utterances.
We propose a novel emotion labelling scheme, which is tailored to task-oriented dialogues.
arXiv Detail & Related papers (2021-09-10T15:00:01Z) - Generating Empathetic Responses with a Large Scale Dialog Dataset [0.76146285961466]
Existing models either directly incorporate pre-defined emotion information to guide the response generation, or use deterministic rules to decide the response emotion.
We show how to build a multi-turn empathetic dialog model that performs well compared to its baselines over 6,000 human evaluated instances.
arXiv Detail & Related papers (2021-05-14T13:45:40Z) - Vyaktitv: A Multimodal Peer-to-Peer Hindi Conversations based Dataset
for Personality Assessment [50.15466026089435]
We present a novel peer-to-peer Hindi conversation dataset- Vyaktitv.
It consists of high-quality audio and video recordings of the participants, with Hinglish textual transcriptions for each conversation.
The dataset also contains a rich set of socio-demographic features, like income, cultural orientation, amongst several others, for all the participants.
arXiv Detail & Related papers (2020-08-31T17:44:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.