nEMO: Dataset of Emotional Speech in Polish
- URL: http://arxiv.org/abs/2404.06292v1
- Date: Tue, 9 Apr 2024 13:18:52 GMT
- Title: nEMO: Dataset of Emotional Speech in Polish
- Authors: Iwona Christop
- Abstract summary: nEMO is a novel corpus of emotional speech in Polish.
The dataset comprises over 3 hours of samples recorded with the participation of nine actors portraying six emotional states.
The text material used was carefully selected to represent the phonetics of the Polish language adequately.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Speech emotion recognition has become increasingly important in recent years due to its potential applications in healthcare, customer service, and personalization of dialogue systems. However, a major issue in this field is the lack of datasets that adequately represent basic emotional states across various language families. As datasets covering Slavic languages are rare, there is a need to address this research gap. This paper presents the development of nEMO, a novel corpus of emotional speech in Polish. The dataset comprises over 3 hours of samples recorded with the participation of nine actors portraying six emotional states: anger, fear, happiness, sadness, surprise, and a neutral state. The text material used was carefully selected to represent the phonetics of the Polish language adequately. The corpus is freely available under the terms of a Creative Commons license (CC BY-NC-SA 4.0).
Related papers
- MASIVE: Open-Ended Affective State Identification in English and Spanish [10.41502827362741]
In this work, we broaden our scope to a practically unbounded set of affective states, which includes any terms that humans use to describe their experiences of feeling.
We collect and publish MASIVE, a dataset of Reddit posts in English and Spanish containing over 1,000 unique affective states each.
On this task, we find that smaller finetuned multilingual models outperform much larger LLMs, even on region-specific Spanish affective states.
arXiv Detail & Related papers (2024-07-16T21:43:47Z)
- MELD-ST: An Emotion-aware Speech Translation Dataset [29.650945917540316]
We present the MELD-ST dataset for the emotion-aware speech translation task, comprising English-to-Japanese and English-to-German language pairs.
Each language pair includes about 10,000 utterances annotated with emotion labels from the MELD dataset.
Baseline experiments using the SeamlessM4T model on the dataset indicate that fine-tuning with emotion labels can enhance translation performance in some settings.
arXiv Detail & Related papers (2024-05-21T22:40:38Z)
- English Prompts are Better for NLI-based Zero-Shot Emotion Classification than Target-Language Prompts [17.099269597133265]
Our experiments with natural language inference-based language models show that it is consistently better to use English prompts even if the data is in a different language.
arXiv Detail & Related papers (2024-02-05T17:36:19Z)
- BANSpEmo: A Bangla Emotional Speech Recognition Dataset [0.0]
This corpus contains 792 audio recordings over a duration of more than 1 hour and 23 minutes.
The dataset consists of 12 Bangla sentences, each uttered in 6 emotions: Disgust, Happy, Sad, Surprised, Anger, and Fear.
BanSpEmo can be considered a useful resource to promote emotion and speech recognition research and related applications in the Bangla language.
arXiv Detail & Related papers (2023-12-21T16:52:41Z)
- SER_AMPEL: a multi-source dataset for speech emotion recognition of Italian older adults [58.49386651361823]
SER_AMPEL is a multi-source dataset for speech emotion recognition (SER).
It was collected to provide a reference for speech emotion recognition in Italian older adults.
The evidence of the need for such a dataset emerges from the analysis of the state of the art.
arXiv Detail & Related papers (2023-11-24T13:47:25Z)
- EMNS /Imz/ Corpus: An emotive single-speaker dataset for narrative storytelling in games, television and graphic novels [6.2375553155844266]
The Emotive Narrative Storytelling (EMNS) corpus is a unique speech dataset created to enhance conversations' emotive quality.
It consists of a 2.3-hour recording featuring a female speaker delivering labelled utterances.
It encompasses eight acted emotional states, evenly distributed with a variance of 0.68%, along with expressiveness levels and natural language descriptions with word emphasis labels.
arXiv Detail & Related papers (2023-05-22T15:32:32Z)
- Sentiment recognition of Italian elderly through domain adaptation on cross-corpus speech dataset [77.99182201815763]
The aim of this work is to define a speech emotion recognition (SER) model able to recognize positive, neutral and negative emotions in natural conversations of Italian elderly people.
arXiv Detail & Related papers (2022-11-14T12:39:41Z)
- CPED: A Large-Scale Chinese Personalized and Emotional Dialogue Dataset for Conversational AI [48.67259855309959]
Most existing datasets for conversational AI ignore human personalities and emotions.
We propose CPED, a large-scale Chinese personalized and emotional dialogue dataset.
CPED contains more than 12K dialogues of 392 speakers from 40 TV shows.
arXiv Detail & Related papers (2022-05-29T17:45:12Z)
- EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model [56.75775793011719]
We introduce and publicly release a Mandarin emotion speech dataset of 9,724 samples, each with an audio file and a human-labeled emotion annotation.
Unlike those models which need additional reference audio as input, our model could predict emotion labels just from the input text and generate more expressive speech conditioned on the emotion embedding.
In the experiment phase, we first validate the effectiveness of our dataset by an emotion classification task. Then we train our model on the proposed dataset and conduct a series of subjective evaluations.
arXiv Detail & Related papers (2021-06-17T08:34:21Z)
- Emotional Voice Conversion: Theory, Databases and ESD [84.62083515557886]
We motivate the development of a novel emotional speech database (ESD).
The ESD database consists of 350 parallel utterances spoken by 10 native English and 10 native Chinese speakers.
The database is suitable for multi-speaker and cross-lingual emotional voice conversion studies.
arXiv Detail & Related papers (2021-05-31T07:48:56Z)
- Limited Data Emotional Voice Conversion Leveraging Text-to-Speech: Two-stage Sequence-to-Sequence Training [91.95855310211176]
Emotional voice conversion aims to change the emotional state of an utterance while preserving the linguistic content and speaker identity.
We propose a novel 2-stage training strategy for sequence-to-sequence emotional voice conversion with a limited amount of emotional speech data.
The proposed framework can perform both spectrum and prosody conversion and achieves significant improvement over the state-of-the-art baselines in both objective and subjective evaluation.
arXiv Detail & Related papers (2021-03-31T04:56:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.