Construction and Evaluation of Mandarin Multimodal Emotional Speech
Database
- URL: http://arxiv.org/abs/2401.07336v1
- Date: Sun, 14 Jan 2024 17:56:36 GMT
- Title: Construction and Evaluation of Mandarin Multimodal Emotional Speech
Database
- Authors: Zhu Ting, Li Liangqi, Duan Shufei, Zhang Xueying, Xiao Zhongzhe, Jia Hairong, Liang Huizhi
- Abstract summary: The validity of dimension annotation is verified by statistical analysis of dimension annotation data.
The recognition rate of seven emotions is about 82% when using acoustic data alone.
The database is of high quality and can be used as an important source for speech analysis research.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A multi-modal emotional speech Mandarin database including articulatory
kinematics, acoustics, glottal and facial micro-expressions is designed and
established, which is described in detail from the aspects of corpus design,
subject selection, recording details, and data processing. Signals are labeled
with both discrete emotion labels (neutral, happy, pleasant, indifferent, angry,
sad, grief) and dimensional emotion labels (pleasure, arousal, dominance). In
this paper, the validity of the dimensional annotation is verified through
statistical analysis of the annotation data. The annotators' SCL-90 scale data
are validated and analyzed jointly with the PAD annotation data to explore the
relationship between outliers in the annotation and the annotators'
psychological state. To verify the speech quality and emotion discriminability
of the database, this paper uses three baseline models (SVM, CNN, and DNN) to
measure the recognition rate of the
seven emotions. The results show that the average recognition rate of seven
emotions is about 82% when using acoustic data alone. When using glottal data
alone, the average recognition rate is about 72%. Using kinematics data alone,
the average recognition rate also reaches 55.7%. Therefore, the database is of
high quality and can be used as an important source for speech analysis
research, especially for the task of multimodal emotional speech analysis.
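
As a concrete illustration of the single-modality evaluation described above, the sketch below shows how a seven-class recognition rate could be estimated with one of the baseline models (an SVM) once acoustic, glottal, or kinematic features have been extracted. It is a minimal sketch under stated assumptions: the feature extraction, the `EmotionAnnotation` record, the 1-9 PAD rating scale, and all hyperparameters are illustrative choices, not the authors' exact setup.

```python
# Minimal sketch of a single-modality baseline in the spirit of the evaluation above:
# an SVM classifying the seven discrete emotions from pre-extracted features.
# Feature choice, PAD scale, and hyperparameters are illustrative assumptions.
from dataclasses import dataclass

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

EMOTIONS = ["neutral", "happy", "pleasant", "indifferent", "angry", "sad", "grief"]


@dataclass
class EmotionAnnotation:
    """Hypothetical record combining the two label types described in the abstract."""
    discrete: str     # one of EMOTIONS
    pleasure: float   # PAD dimensional labels; a 1-9 rating scale is assumed here
    arousal: float
    dominance: float


def evaluate_modality(X: np.ndarray, y: np.ndarray) -> float:
    """Mean 5-fold cross-validated recognition rate (accuracy) over the seven classes.

    X: (n_utterances, n_features) feature matrix for one modality
       (e.g. acoustic, glottal, or articulatory-kinematic features).
    y: (n_utterances,) integer labels indexing into EMOTIONS.
    """
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
    return cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean()


if __name__ == "__main__":
    # Toy random data stands in for real features extracted from the database.
    rng = np.random.default_rng(0)
    X_demo = rng.normal(size=(700, 40))
    y_demo = rng.integers(0, len(EMOTIONS), size=700)
    print(f"mean recognition rate: {evaluate_modality(X_demo, y_demo):.3f}")
```

Pointing the same routine at acoustic, glottal, and articulatory-kinematic feature matrices in turn would mirror the per-modality comparison reported above (about 82% acoustic, 72% glottal, and 55.7% kinematic in the paper).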
Related papers
- EMOVOME Database: Advancing Emotion Recognition in Speech Beyond Staged Scenarios [2.1455880234227624]
We released the Emotional Voice Messages (EMOVOME) database, including 999 voice messages from real conversations of 100 Spanish speakers on a messaging app.
We evaluated speaker-independent Speech Emotion Recognition (SER) models using a standard set of acoustic features and transformer-based models.
EMOVOME outcomes varied with annotator labels, showing better results and fairness when combining expert and non-expert annotations.
arXiv Detail & Related papers (2024-03-04T16:13:39Z)
- Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling [50.99252242917458]
Conversational Speech Synthesis (CSS) aims to accurately express an utterance with the appropriate prosody and emotional inflection within a conversational setting.
To address the issue of data scarcity, we meticulously create emotional labels in terms of category and intensity.
Our model outperforms the baseline models in understanding and rendering emotions.
arXiv Detail & Related papers (2023-12-19T08:47:50Z)
- Design, construction and evaluation of emotional multimodal pathological speech database [8.774681418339155]
The first Chinese multimodal emotional pathological speech database containing multi-perspective information is constructed.
All emotional speech was labeled for intelligibility, emotion type, and discrete and dimensional emotions via a purpose-built WeChat mini-program.
Automatic recognition was tested on speech and glottal data, with average accuracies of 78% for controls and 60% for patients on audio, versus 51% for controls and 38% for patients on glottal data, indicating an influence of the disease on emotional expression.
arXiv Detail & Related papers (2023-12-14T14:43:31Z)
- SER_AMPEL: a multi-source dataset for speech emotion recognition of Italian older adults [58.49386651361823]
SER_AMPEL is a multi-source dataset for speech emotion recognition (SER).
It was collected to provide a reference for speech emotion recognition for Italian older adults.
The need for such a dataset emerges from an analysis of the state of the art.
arXiv Detail & Related papers (2023-11-24T13:47:25Z)
- EMNS /Imz/ Corpus: An emotive single-speaker dataset for narrative storytelling in games, television and graphic novels [6.2375553155844266]
The Emotive Narrative Storytelling (EMNS) corpus is a unique speech dataset created to enhance conversations' emotive quality.
It consists of a 2.3-hour recording featuring a female speaker delivering labelled utterances.
It encompasses eight acted emotional states, evenly distributed with a variance of 0.68%, along with expressiveness levels and natural language descriptions with word emphasis labels.
arXiv Detail & Related papers (2023-05-22T15:32:32Z)
- Feature Selection Enhancement and Feature Space Visualization for Speech-Based Emotion Recognition [2.223733768286313]
We present a speech feature enhancement strategy that improves speech emotion recognition.
The strategy is compared with state-of-the-art methods from the literature.
Our method achieved an average recognition gain of 11.5% for six out of seven emotions for the EMO-DB dataset, and 13.8% for seven out of eight emotions for the RAVDESS dataset.
arXiv Detail & Related papers (2022-08-19T11:29:03Z)
- BEAT: A Large-Scale Semantic and Emotional Multi-Modal Dataset for Conversational Gestures Synthesis [9.95713767110021]
The Body-Expression-Audio-Text (BEAT) dataset contains 76 hours of high-quality, multi-modal data captured from 30 speakers talking with eight different emotions in four different languages.
BEAT is the largest motion-capture dataset for investigating human gestures.
arXiv Detail & Related papers (2022-03-10T11:19:52Z)
- Multimodal Emotion Recognition using Transfer Learning from Speaker Recognition and BERT-based models [53.31917090073727]
We propose a neural network-based emotion recognition framework that uses a late fusion of transfer-learned and fine-tuned models from speech and text modalities.
We evaluate the effectiveness of our proposed multimodal approach on the interactive emotional dyadic motion capture dataset.
arXiv Detail & Related papers (2022-02-16T00:23:42Z)
- EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model [56.75775793011719]
We introduce and publicly release a Mandarin emotion speech dataset including 9,724 samples with audio files and human-labeled emotion annotations.
Unlike models that require additional reference audio as input, our model can predict emotion labels from the input text alone and generate more expressive speech conditioned on the emotion embedding.
In the experiment phase, we first validate the effectiveness of our dataset by an emotion classification task. Then we train our model on the proposed dataset and conduct a series of subjective evaluations.
arXiv Detail & Related papers (2021-06-17T08:34:21Z)
- Affect2MM: Affective Analysis of Multimedia Content Using Emotion Causality [84.69595956853908]
We present Affect2MM, a learning method for time-series emotion prediction for multimedia content.
Our goal is to automatically capture the varying emotions depicted by characters in real-life human-centric situations and behaviors.
arXiv Detail & Related papers (2021-03-11T09:07:25Z)
- Vyaktitv: A Multimodal Peer-to-Peer Hindi Conversations based Dataset for Personality Assessment [50.15466026089435]
We present Vyaktitv, a novel peer-to-peer Hindi conversation dataset.
It consists of high-quality audio and video recordings of the participants, with Hinglish textual transcriptions for each conversation.
The dataset also contains a rich set of socio-demographic features for all participants, such as income and cultural orientation, among several others.
arXiv Detail & Related papers (2020-08-31T17:44:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.