Emotional Voice Messages (EMOVOME) database: emotion recognition in spontaneous voice messages
- URL: http://arxiv.org/abs/2402.17496v2
- Date: Thu, 13 Jun 2024 13:09:48 GMT
- Title: Emotional Voice Messages (EMOVOME) database: emotion recognition in spontaneous voice messages
- Authors: Lucía Gómez Zaragozá, Rocío del Amor, Elena Parra Vargas, Valery Naranjo, Mariano Alcañiz Raya, Javier Marín-Morales
- Abstract summary: Emotional Voice Messages (EMOVOME) is a spontaneous speech dataset containing 999 audio messages from real conversations of 100 Spanish speakers (gender balanced) on a messaging app.
Voice messages were produced under in-the-wild conditions before participants were recruited, avoiding any conscious bias due to a laboratory environment.
This database will significantly contribute to research on emotion recognition in the wild, while also providing a unique, natural, and freely accessible resource for Spanish.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Emotional Voice Messages (EMOVOME) is a spontaneous speech dataset containing 999 audio messages from real conversations of 100 Spanish speakers (gender balanced) on a messaging app. Voice messages were produced under in-the-wild conditions before participants were recruited, avoiding any conscious bias due to a laboratory environment. Audios were labeled in the valence and arousal dimensions by three non-experts and two experts, whose ratings were then combined to obtain a final label per dimension. The experts also provided an extra label corresponding to seven emotion categories. To set a baseline for future investigations using EMOVOME, we implemented emotion recognition models using both speech and the audio transcriptions. For speech, we used the standard eGeMAPS feature set and support vector machines, obtaining 49.27% and 44.71% unweighted accuracy for valence and arousal, respectively. For text, we fine-tuned a multilingual BERT model and achieved 61.15% and 47.43% unweighted accuracy for valence and arousal, respectively. This database will significantly contribute to research on emotion recognition in the wild, while also providing a unique, natural, and freely accessible resource for Spanish.
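For concreteness, below is a minimal sketch of the speech baseline described in the abstract: eGeMAPS functionals extracted with the openSMILE Python wrapper, fed to a support vector machine. The file lists (`train_paths`, `test_paths`), label arrays, and SVM kernel/C value are illustrative assumptions, not the paper's exact setup; unweighted accuracy is computed here as balanced accuracy (unweighted average recall).

```python
# Sketch of the eGeMAPS + SVM speech baseline. Paths, labels, and
# hyperparameters are placeholders; the paper's exact preprocessing,
# speaker-independent splits, and SVM settings may differ.
import numpy as np
import opensmile
from sklearn.metrics import balanced_accuracy_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# eGeMAPS functionals: 88 utterance-level acoustic descriptors per audio
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,
    feature_level=opensmile.FeatureLevel.Functionals,
)

def featurize(paths):
    """One eGeMAPS feature vector per voice message."""
    return np.vstack([smile.process_file(p).to_numpy()[0] for p in paths])

# train_paths / test_paths and the valence labels are assumed to exist
X_train, X_test = featurize(train_paths), featurize(test_paths)
clf = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
clf.fit(X_train, y_train)
# "Unweighted accuracy" as unweighted average recall over classes
print("valence UA:", balanced_accuracy_score(y_test, clf.predict(X_test)))
```

The text baseline can be sketched in the same hedged spirit: fine-tuning a multilingual BERT on the message transcriptions with the Hugging Face `transformers` Trainer. The checkpoint name, the three-class label encoding, and all training hyperparameters below are assumptions for illustration; `train_ds` and `eval_ds` are presumed `datasets.Dataset` objects with `text` and `label` columns.

```python
# Sketch of the multilingual BERT text baseline. The checkpoint, label
# scheme (e.g., negative/neutral/positive valence), and hyperparameters
# are assumptions, not the paper's confirmed recipe.
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL = "bert-base-multilingual-cased"  # assumed multilingual BERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=3)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

# train_ds / eval_ds: datasets.Dataset with "text" and "label" columns
train_ds = train_ds.map(tokenize, batched=True)
eval_ds = eval_ds.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="mbert-emovome",
                           num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=train_ds,
    eval_dataset=eval_ds,
)
trainer.train()
```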
Related papers
- EMOVOME Database: Advancing Emotion Recognition in Speech Beyond Staged Scenarios
We released the Emotional Voice Messages (EMOVOME) database, including 999 voice messages from real conversations of 100 Spanish speakers on a messaging app.
We evaluated speaker-independent Speech Emotion Recognition (SER) models using a standard set of acoustic features and transformer-based models.
EMOVOME outcomes varied with annotator labels, showing better results and fairness when combining expert and non-expert annotations.
arXiv Detail & Related papers (2024-03-04T16:13:39Z)
- Construction and Evaluation of Mandarin Multimodal Emotional Speech Database
The validity of the dimensional annotations is verified by statistical analysis of the annotation data.
The recognition rate for the seven emotions is about 82% when using acoustic data alone.
The database is of high quality and can be used as an important source for speech analysis research.
arXiv Detail & Related papers (2024-01-14T17:56:36Z)
- Paralinguistics-Enhanced Large Language Modeling of Spoken Dialogue
The Paralinguistics-enhanced Generative Pretrained Transformer (ParalinGPT) takes the conversational context of text, speech embeddings, and paralinguistic attributes as input prompts within a serialized multitasking framework.
We utilize the Switchboard-1 corpus, including its sentiment labels as the paralinguistic attribute, as our spoken dialogue dataset.
arXiv Detail & Related papers (2023-12-23T18:14:56Z)
- SER_AMPEL: a multi-source dataset for speech emotion recognition of Italian older adults
SER_AMPEL is a multi-source dataset for speech emotion recognition (SER).
It is collected with the aim of providing a reference for speech emotion recognition for Italian older adults.
The need for such a dataset emerges from an analysis of the state of the art.
arXiv Detail & Related papers (2023-11-24T13:47:25Z)
- Effect of Attention and Self-Supervised Speech Embeddings on Non-Semantic Speech Tasks
We look at speech emotion understanding as a perception task, which is a more realistic setting.
We leverage the rich ComParE dataset of multilingual speakers and its multi-label regression target of 'emotion share', i.e., the perception of each emotion.
Our results show that HuBERT-Large with a self-attention-based lightweight sequence model provides a 4.6% improvement over the reported baseline.
arXiv Detail & Related papers (2023-08-28T07:11:27Z)
- EMNS /Imz/ Corpus: An emotive single-speaker dataset for narrative storytelling in games, television and graphic novels
The Emotive Narrative Storytelling (EMNS) corpus is a unique speech dataset created to enhance the emotive quality of conversations.
It consists of a 2.3-hour recording featuring a female speaker delivering labelled utterances.
It encompasses eight acted emotional states, evenly distributed with a variance of 0.68%, along with expressiveness levels and natural language descriptions with word emphasis labels.
arXiv Detail & Related papers (2023-05-22T15:32:32Z)
- Feature Selection Enhancement and Feature Space Visualization for Speech-Based Emotion Recognition
We present a speech feature enhancement strategy that improves speech emotion recognition.
The strategy is compared with the state-of-the-art methods used in the literature.
Our method achieved an average recognition gain of 11.5% for six out of seven emotions for the EMO-DB dataset, and 13.8% for seven out of eight emotions for the RAVDESS dataset.
arXiv Detail & Related papers (2022-08-19T11:29:03Z)
- CPED: A Large-Scale Chinese Personalized and Emotional Dialogue Dataset for Conversational AI
Most existing datasets for conversational AI ignore human personalities and emotions.
We propose CPED, a large-scale Chinese personalized and emotional dialogue dataset.
CPED contains more than 12K dialogues of 392 speakers from 40 TV shows.
arXiv Detail & Related papers (2022-05-29T17:45:12Z)
- Textless Speech Emotion Conversion using Decomposed and Discrete Representations
We decompose speech into discrete and disentangled learned representations, consisting of content units, F0, speaker, and emotion.
First, we modify the speech content by translating the content units to a target emotion, and then predict the prosodic features based on these units.
Finally, the speech waveform is generated by feeding the predicted representations into a neural vocoder.
arXiv Detail & Related papers (2021-11-14T18:16:42Z)
- EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model
We introduce and publicly release a Mandarin emotion speech dataset including 9,724 samples with audio files and their human-labeled emotion annotations.
Unlike models that need additional reference audio as input, our model can predict emotion labels just from the input text and generate more expressive speech conditioned on the emotion embedding.
In the experiments, we first validate the effectiveness of our dataset on an emotion classification task, then train our model on the proposed dataset and conduct a series of subjective evaluations.
arXiv Detail & Related papers (2021-06-17T08:34:21Z)
- Emotional Voice Conversion: Theory, Databases and ESD
We motivate the development of a novel emotional speech database (ESD).
The ESD database consists of 350 parallel utterances spoken by 10 native English and 10 native Chinese speakers.
The database is suitable for multi-speaker and cross-lingual emotional voice conversion studies.
arXiv Detail & Related papers (2021-05-31T07:48:56Z)