EmoTale: An Enacted Speech-emotion Dataset in Danish
- URL: http://arxiv.org/abs/2508.14548v1
- Date: Wed, 20 Aug 2025 09:01:54 GMT
- Title: EmoTale: An Enacted Speech-emotion Dataset in Danish
- Authors: Maja J. Hjuler, Harald V. Skat-Rørdam, Line H. Clemmensen, Sneha Das
- Abstract summary: EmoTale is a corpus of Danish and English speech recordings with their associated enacted emotion annotations. We develop SER models for EmoTale and the reference datasets using self-supervised speech model embeddings and the openSMILE feature extractor. The best model achieves an unweighted average recall (UAR) of 64.1% on the EmoTale corpus.
- Score: 4.228593407506635
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While multiple emotional speech corpora exist for commonly spoken languages, there is a lack of functional datasets for smaller (spoken) languages, such as Danish. To our knowledge, Danish Emotional Speech (DES), published in 1997, is the only other database of Danish emotional speech. We present EmoTale; a corpus comprising Danish and English speech recordings with their associated enacted emotion annotations. We demonstrate the validity of the dataset by investigating and presenting its predictive power using speech emotion recognition (SER) models. We develop SER models for EmoTale and the reference datasets using self-supervised speech model (SSLM) embeddings and the openSMILE feature extractor. We find the embeddings superior to the hand-crafted features. The best model achieves an unweighted average recall (UAR) of 64.1% on the EmoTale corpus using leave-one-speaker-out cross-validation, comparable to the performance on DES.
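The evaluation protocol described in the abstract (leave-one-speaker-out cross-validation scored with unweighted average recall) can be sketched as follows. This is a minimal illustration, not the authors' code: the random features stand in for SSLM embeddings or openSMILE functionals, and the logistic-regression classifier, class count, and speaker count are assumptions for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(0)

# Synthetic stand-ins: 200 utterances, 64-dim "embeddings",
# 5 emotion classes, 10 speakers.
X = rng.normal(size=(200, 64))
y = rng.integers(0, 5, size=200)
speakers = rng.integers(0, 10, size=200)

# Leave-one-speaker-out: each fold holds out every utterance
# from a single speaker, so the model is always tested on an
# unseen voice.
logo = LeaveOneGroupOut()
y_true, y_pred = [], []
for train_idx, test_idx in logo.split(X, y, groups=speakers):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    y_true.extend(y[test_idx])
    y_pred.extend(clf.predict(X[test_idx]))

# UAR is macro-averaged recall: the mean of per-class recalls,
# so majority classes do not dominate the score.
uar = recall_score(y_true, y_pred, average="macro")
print(f"UAR: {uar:.3f}")
```

On random features like these, UAR hovers around chance level (0.2 for five classes); the paper's 64.1% reflects informative embeddings rather than the protocol itself.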
Related papers
- EmoBench-UA: A Benchmark Dataset for Emotion Detection in Ukrainian [55.08460390792863]
EmoBench-UA is the first annotated dataset for emotion detection in Ukrainian texts. Our findings highlight the challenges of emotion classification in non-mainstream languages like Ukrainian.
arXiv Detail & Related papers (2025-05-29T09:49:57Z)
- Emo Pillars: Knowledge Distillation to Support Fine-Grained Context-Aware and Context-Less Emotion Classification [56.974545305472304]
Most datasets for sentiment analysis lack the context in which an opinion was expressed, which is often crucial for emotion understanding, and are mainly limited to a few emotion categories. We design an LLM-based data synthesis pipeline and leverage a large model, Mistral-7b, for the generation of training examples for more accessible, lightweight BERT-type encoder models. We show that Emo Pillars models are highly adaptive to new domains when tuned to specific tasks such as GoEmotions, ISEAR, IEMOCAP, and EmoContext, reaching SOTA performance on the first three.
arXiv Detail & Related papers (2025-04-23T16:23:17Z)
- EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting [48.56693150755667]
We propose EmoVoice, a novel emotion-controllable TTS model. EmoVoice exploits large language models (LLMs) to enable fine-grained freestyle natural language emotion control. EmoVoice achieves state-of-the-art performance on the English EmoVoice-DB test set.
arXiv Detail & Related papers (2025-04-17T11:50:04Z)
- Leveraging Cross-Attention Transformer and Multi-Feature Fusion for Cross-Linguistic Speech Emotion Recognition [60.58049741496505]
Speech Emotion Recognition (SER) plays a crucial role in enhancing human-computer interaction. We propose a novel approach, HuMP-CAT, which combines HuBERT, MFCC, and prosodic characteristics. We show that, by fine-tuning the source model with a small portion of speech from the target datasets, HuMP-CAT achieves an average accuracy of 78.75%.
arXiv Detail & Related papers (2025-01-06T14:31:25Z)
- MELD-ST: An Emotion-aware Speech Translation Dataset [29.650945917540316]
We present the MELD-ST dataset for the emotion-aware speech translation task, comprising English-to-Japanese and English-to-German language pairs.
Each language pair includes about 10,000 utterances annotated with emotion labels from the MELD dataset.
Baseline experiments using the SeamlessM4T model on the dataset indicate that fine-tuning with emotion labels can enhance translation performance in some settings.
arXiv Detail & Related papers (2024-05-21T22:40:38Z)
- nEMO: Dataset of Emotional Speech in Polish [0.0]
nEMO is a novel corpus of emotional speech in Polish.
The dataset comprises over 3 hours of samples recorded with the participation of nine actors portraying six emotional states.
The text material used was carefully selected to represent the phonetics of the Polish language adequately.
arXiv Detail & Related papers (2024-04-09T13:18:52Z)
- EMNS /Imz/ Corpus: An emotive single-speaker dataset for narrative storytelling in games, television and graphic novels [6.2375553155844266]
The Emotive Narrative Storytelling (EMNS) corpus is a unique speech dataset created to enhance conversations' emotive quality.
It consists of a 2.3-hour recording featuring a female speaker delivering labelled utterances.
It encompasses eight acted emotional states, evenly distributed with a variance of 0.68%, along with expressiveness levels and natural language descriptions with word emphasis labels.
arXiv Detail & Related papers (2023-05-22T15:32:32Z)
- Feature Selection Enhancement and Feature Space Visualization for Speech-Based Emotion Recognition [2.223733768286313]
We present a speech feature enhancement strategy that improves speech emotion recognition.
The strategy is compared with the state-of-the-art methods used in the literature.
Our method achieved an average recognition gain of 11.5% for six out of seven emotions for the EMO-DB dataset, and 13.8% for seven out of eight emotions for the RAVDESS dataset.
arXiv Detail & Related papers (2022-08-19T11:29:03Z)
- EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model [56.75775793011719]
We introduce and publicly release a Mandarin emotion speech dataset including 9,724 samples with audio files and their human-labeled emotion annotations.
Unlike those models which need additional reference audio as input, our model could predict emotion labels just from the input text and generate more expressive speech conditioned on the emotion embedding.
In the experiment phase, we first validate the effectiveness of our dataset by an emotion classification task. Then we train our model on the proposed dataset and conduct a series of subjective evaluations.
arXiv Detail & Related papers (2021-06-17T08:34:21Z)
- Seen and Unseen emotional style transfer for voice conversion with a new emotional speech dataset [84.53659233967225]
Emotional voice conversion aims to transform emotional prosody in speech while preserving the linguistic content and speaker identity.
We propose a novel framework based on a variational auto-encoding Wasserstein generative adversarial network (VAW-GAN).
We show that the proposed framework achieves remarkable performance by consistently outperforming the baseline framework.
arXiv Detail & Related papers (2020-10-28T07:16:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.