Related papers: CAMEO: Collection of Multilingual Emotional Speech Corpora

URL: http://arxiv.org/abs/2505.11051v1
Date: Fri, 16 May 2025 09:52:00 GMT
Title: CAMEO: Collection of Multilingual Emotional Speech Corpora
Authors: Iwona Christop, Maciej Czajka
Abstract summary: This paper presents a collection of multilingual emotional speech datasets designed to facilitate research in emotion recognition and other speech-related tasks. The main objectives were to ensure easy access to the data, to allow reproducibility of the results, and to provide a standardized benchmark for evaluating speech emotion recognition systems. The collection, along with metadata and a leaderboard, is publicly available via the Hugging Face platform.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper presents CAMEO -- a curated collection of multilingual emotional speech datasets designed to facilitate research in emotion recognition and other speech-related tasks. The main objectives were to ensure easy access to the data, to allow reproducibility of the results, and to provide a standardized benchmark for evaluating speech emotion recognition (SER) systems across different emotional states and languages. The paper describes the dataset selection criteria, the curation and normalization process, and provides performance results for several models. The collection, along with metadata, and a leaderboard, is publicly available via the Hugging Face platform.

Related papers

BRIGHTER: BRIdging the Gap in Human-Annotated Textual Emotion Recognition Datasets for 28 Languages [93.92804151830744]
We present BRIGHTER -- a collection of multi-labeled datasets in 28 different languages. We describe the data collection and annotation processes and the challenges of building these datasets. We show that BRIGHTER datasets are a step towards bridging the gap in text-based emotion recognition.
arXiv Detail & Related papers (2025-02-17T15:39:50Z)
Leveraging Cross-Attention Transformer and Multi-Feature Fusion for Cross-Linguistic Speech Emotion Recognition [60.58049741496505]
Speech Emotion Recognition (SER) plays a crucial role in enhancing human-computer interaction. We propose a novel approach HuMP-CAT, which combines HuBERT, MFCC, and prosodic characteristics. We show that, by fine-tuning the source model with a small portion of speech from the target datasets, HuMP-CAT achieves an average accuracy of 78.75%.
arXiv Detail & Related papers (2025-01-06T14:31:25Z)
LIMIS: Towards Language-based Interactive Medical Image Segmentation [58.553786162527686]
LIMIS is the first purely language-based interactive medical image segmentation model. We adapt Grounded SAM to the medical domain and design a language-based model interaction strategy. We evaluate LIMIS on three publicly available medical datasets in terms of performance and usability.
arXiv Detail & Related papers (2024-10-22T12:13:47Z)
Fusion approaches for emotion recognition from speech using acoustic and text-based features [15.186937600119897]
We study different approaches for classifying emotions from speech using acoustic and text-based features. We compare strategies to combine the audio and text modalities, evaluating them on IEMOCAP and MSP-PODCAST datasets. For IEMOCAP, we show the large effect that the criteria used to define the cross-validation folds have on results.
arXiv Detail & Related papers (2024-03-27T14:40:25Z)
SER_AMPEL: a multi-source dataset for speech emotion recognition of Italian older adults [58.49386651361823]
SER_AMPEL is a multi-source dataset for speech emotion recognition (SER). It was collected with the aim of providing a reference for speech emotion recognition in Italian older adults. The need for such a dataset emerges from an analysis of the state of the art.
arXiv Detail & Related papers (2023-11-24T13:47:25Z)
CLARA: Multilingual Contrastive Learning for Audio Representation Acquisition [5.520654376217889]
CLARA minimizes reliance on labelled data, enhancing generalization across languages. Our approach adeptly captures emotional nuances in speech, overcoming subjective assessment issues. It adapts to low-resource languages, marking progress in multilingual speech representation learning.
arXiv Detail & Related papers (2023-10-18T09:31:56Z)
EMNS /Imz/ Corpus: An emotive single-speaker dataset for narrative storytelling in games, television and graphic novels [6.2375553155844266]
The Emotive Narrative Storytelling (EMNS) corpus is a unique speech dataset created to enhance conversations' emotive quality. It consists of a 2.3-hour recording featuring a female speaker delivering labelled utterances. It encompasses eight acted emotional states, evenly distributed with a variance of 0.68%, along with expressiveness levels and natural language descriptions with word emphasis labels.
arXiv Detail & Related papers (2023-05-22T15:32:32Z)
Feature Selection Enhancement and Feature Space Visualization for Speech-Based Emotion Recognition [2.223733768286313]
We present a speech feature enhancement strategy that improves speech emotion recognition. The strategy is compared with the state-of-the-art methods used in the literature. Our method achieved an average recognition gain of 11.5% for six out of seven emotions for the EMO-DB dataset, and 13.8% for seven out of eight emotions for the RAVDESS dataset.
arXiv Detail & Related papers (2022-08-19T11:29:03Z)
Self-Supervised Speech Representation Learning: A Review [105.1545308184483]
Self-supervised representation learning methods promise a single universal model that would benefit a wide variety of tasks and domains. Speech representation learning is experiencing similar progress in three main categories: generative, contrastive, and predictive methods. This review presents approaches for self-supervised speech representation learning and their connection to other research areas.
arXiv Detail & Related papers (2022-05-21T16:52:57Z)
XTREME-S: Evaluating Cross-lingual Speech Representations [88.78720838743772]
XTREME-S is a new benchmark to evaluate universal cross-lingual speech representations in many languages. This paper describes the new benchmark and establishes the first speech-only and speech-text baselines.
arXiv Detail & Related papers (2022-03-21T06:50:21Z)
POSSCORE: A Simple Yet Effective Evaluation of Conversational Search with Part of Speech Labelling [25.477834359694473]
Conversational search systems, such as Google Assistant and Microsoft Cortana, provide a new search paradigm where users are allowed, via natural language dialogues, to communicate with search systems. We propose POSSCORE, a simple yet effective automatic evaluation method for conversational search. We show that our metrics can correlate with human preference, achieving significant improvements over state-of-the-art baseline metrics.
arXiv Detail & Related papers (2021-09-07T12:31:29Z)

This list is automatically generated from the titles and abstracts of the papers on this site.