Designing and Evaluating Speech Emotion Recognition Systems: A reality
check case study with IEMOCAP
- URL: http://arxiv.org/abs/2304.00860v1
- Date: Mon, 3 Apr 2023 10:16:24 GMT
- Title: Designing and Evaluating Speech Emotion Recognition Systems: A reality
check case study with IEMOCAP
- Authors: Nikolaos Antoniou and Athanasios Katsamanis and Theodoros
Giannakopoulos and Shrikanth Narayanan
- Abstract summary: There is an imminent need for guidelines and standard test sets to allow direct and fair comparisons of speech emotion recognition (SER). Resources such as the Interactive Emotional Dyadic Motion Capture (IEMOCAP) database have emerged as widely-adopted reference corpora for researchers to develop and test models for SER.
- Score: 33.199425144083925
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There is an imminent need for guidelines and standard test sets to allow
direct and fair comparisons of speech emotion recognition (SER). While
resources, such as the Interactive Emotional Dyadic Motion Capture (IEMOCAP)
database, have emerged as widely-adopted reference corpora for researchers to
develop and test models for SER, published work reveals a wide range of
assumptions and variety in its use that challenge reproducibility and
generalization. Based on a critical review of the latest advances in SER using
IEMOCAP as the use case, our work aims at two contributions: First, using an
analysis of the recent literature, including assumptions made and metrics used
therein, we provide a set of SER evaluation guidelines. Second, using recent
publications with open-sourced implementations, we focus on reproducibility
assessment in SER.
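The evaluation concerns raised above center on how IEMOCAP is split and which metric is reported. A common protocol in the SER literature is leave-one-session-out cross-validation (IEMOCAP has five sessions, each with two speakers, so holding out a whole session keeps test speakers unseen during training), reporting unweighted accuracy (the mean of per-class recalls). The sketch below illustrates that protocol under stated assumptions; the dataset, classifier, and function names are hypothetical placeholders, not the paper's implementation.

```python
import random
from collections import defaultdict

# Hypothetical sketch of leave-one-session-out (LOSO) evaluation on
# IEMOCAP-style data: 5 sessions, each held out once as the test fold.

EMOTIONS = ["angry", "happy", "neutral", "sad"]  # a common 4-class setup

def unweighted_accuracy(y_true, y_pred):
    """Unweighted accuracy (UA): mean of per-class recalls.

    Unlike plain accuracy, UA is not dominated by majority classes,
    which matters because IEMOCAP's emotion classes are imbalanced.
    """
    per_class = defaultdict(lambda: [0, 0])  # label -> [correct, total]
    for t, p in zip(y_true, y_pred):
        per_class[t][1] += 1
        if t == p:
            per_class[t][0] += 1
    recalls = [correct / total for correct, total in per_class.values()]
    return sum(recalls) / len(recalls)

def loso_evaluate(dataset, train_fn, predict_fn):
    """dataset: {session_id: [(features, label), ...]} over 5 sessions."""
    fold_scores = []
    for held_out in sorted(dataset):
        train = [ex for s, exs in dataset.items() if s != held_out
                 for ex in exs]
        test = dataset[held_out]
        model = train_fn(train)
        preds = [predict_fn(model, x) for x, _ in test]
        fold_scores.append(
            unweighted_accuracy([y for _, y in test], preds))
    # Report the mean UA over the 5 folds (some papers also report std).
    return sum(fold_scores) / len(fold_scores)

# Toy demonstration with random labels and a majority-class "model".
random.seed(0)
data = {s: [(None, random.choice(EMOTIONS)) for _ in range(40)]
        for s in range(1, 6)}
majority = lambda train: max(
    EMOTIONS, key=lambda e: sum(y == e for _, y in train))
mean_ua = loso_evaluate(data, majority, lambda model, x: model)
print(f"mean UA over 5 LOSO folds: {mean_ua:.3f}")
```

Pinning down choices like this (speaker-independent folds, UA vs. weighted accuracy, which emotion classes are merged) is exactly the kind of assumption the paper argues must be stated explicitly for results on IEMOCAP to be comparable.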
Related papers
- LibEER: A Comprehensive Benchmark and Algorithm Library for EEG-based Emotion Recognition [31.383215932044408]
EEG-based emotion recognition (EER) has gained significant attention due to its potential for understanding and analyzing human emotions.
The field lacks a convincing benchmark and comprehensive open-source libraries.
We introduce LibEER, a comprehensive benchmark and algorithm library designed to facilitate fair comparisons in EER.
arXiv Detail & Related papers (2024-10-13T07:51:39Z) - SER Evals: In-domain and Out-of-domain Benchmarking for Speech Emotion Recognition [3.4355593397388597]
Speech emotion recognition (SER) has made significant strides with the advent of powerful self-supervised learning (SSL) models.
We propose a large-scale benchmark to evaluate the robustness and adaptability of state-of-the-art SER models.
We find that the Whisper model, primarily designed for automatic speech recognition, outperforms dedicated SSL models in cross-lingual SER.
arXiv Detail & Related papers (2024-08-14T23:33:10Z) - Layer-Wise Analysis of Self-Supervised Acoustic Word Embeddings: A Study
on Speech Emotion Recognition [54.952250732643115]
We study Acoustic Word Embeddings (AWEs), a fixed-length feature derived from continuous representations, to explore their advantages in specific tasks.
AWEs have previously shown utility in capturing acoustic discriminability.
Our findings underscore the acoustic context conveyed by AWEs and demonstrate highly competitive speech emotion recognition accuracy.
arXiv Detail & Related papers (2024-02-04T21:24:54Z) - PROXYQA: An Alternative Framework for Evaluating Long-Form Text Generation with Large Language Models [72.57329554067195]
ProxyQA is an innovative framework dedicated to assessing long-form text generation.
It comprises in-depth human-curated meta-questions spanning various domains, each accompanied by specific proxy-questions with pre-annotated answers.
It assesses the generated content's quality through the evaluator's accuracy in addressing the proxy-questions.
arXiv Detail & Related papers (2024-01-26T18:12:25Z) - EvalLM: Interactive Evaluation of Large Language Model Prompts on
User-Defined Criteria [43.944632774725484]
We present EvalLM, an interactive system for iteratively refining prompts by evaluating multiple outputs on user-defined criteria.
By describing criteria in natural language, users can employ the system's LLM-based evaluator to get an overview of where prompts excel or fail.
A comparative study showed that EvalLM, when compared to manual evaluation, helped participants compose more diverse criteria, examine twice as many outputs, and reach satisfactory prompts with 59% fewer revisions.
arXiv Detail & Related papers (2023-09-24T13:19:38Z) - Emotion-Cause Pair Extraction in Customer Reviews [3.561118125328526]
We present our work on emotion-cause pair extraction (ECPE) in the domain of online reviews.
With a manually annotated dataset, we explore an algorithm to extract emotion cause pairs using a neural network.
We propose a model that builds on prior work, combining emotion-cause pair extraction with research on emotion-aware word embeddings.
arXiv Detail & Related papers (2021-12-07T20:56:20Z) - An Exploration of Self-Supervised Pretrained Representations for
End-to-End Speech Recognition [98.70304981174748]
We focus on the general applications of pretrained speech representations, on advanced end-to-end automatic speech recognition (E2E-ASR) models.
We select several pretrained speech representations and present the experimental results on various open-source and publicly available corpora for E2E-ASR.
arXiv Detail & Related papers (2021-10-09T15:06:09Z) - RADDLE: An Evaluation Benchmark and Analysis Platform for Robust
Task-oriented Dialog Systems [75.87418236410296]
We introduce the RADDLE benchmark, a collection of corpora and tools for evaluating the performance of models across a diverse set of domains.
RADDLE is designed to favor and encourage models with a strong generalization ability.
We evaluate recent state-of-the-art systems based on pre-training and fine-tuning, and find that grounded pre-training on heterogeneous dialog corpora performs better than training a separate model per domain.
arXiv Detail & Related papers (2020-12-29T08:58:49Z) - Mining Implicit Relevance Feedback from User Behavior for Web Question
Answering [92.45607094299181]
We make the first study to explore the correlation between user behavior and passage relevance.
Our approach significantly improves the accuracy of passage ranking without extra human labeled data.
In practice, this work has proved effective to substantially reduce the human labeling cost for the QA service in a global commercial search engine.
arXiv Detail & Related papers (2020-06-13T07:02:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.