INTERSPEECH 2009 Emotion Challenge Revisited: Benchmarking 15 Years of Progress in Speech Emotion Recognition
- URL: http://arxiv.org/abs/2406.06401v1
- Date: Mon, 10 Jun 2024 15:55:06 GMT
- Title: INTERSPEECH 2009 Emotion Challenge Revisited: Benchmarking 15 Years of Progress in Speech Emotion Recognition
- Authors: Andreas Triantafyllopoulos, Anton Batliner, Simon Rampp, Manuel Milling, Björn Schuller,
- Abstract summary: We revisit the INTERSPEECH 2009 Emotion Challenge -- the first ever speech emotion recognition (SER) challenge.
We evaluate a series of deep learning models that are representative of the major advances in SER research.
- Score: 5.303788012608604
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We revisit the INTERSPEECH 2009 Emotion Challenge -- the first ever speech emotion recognition (SER) challenge -- and evaluate a series of deep learning models that are representative of the major advances in SER research in the time since then. We start by training each model using a fixed set of hyperparameters, and further fine-tune the best-performing models of that initial setup with a grid search. Results are always reported on the official test set with a separate validation set only used for early stopping. Most models score below or close to the official baseline, while they marginally outperform the original challenge winners after hyperparameter tuning. Our work illustrates that, despite recent progress, FAU-AIBO remains a very challenging benchmark. An interesting corollary is that newer methods do not consistently outperform older ones, showing that progress towards `solving' SER is not necessarily monotonic.
Related papers
- Beyond human subjectivity and error: a novel AI grading system [67.410870290301]
The grading of open-ended questions is a high-effort, high-impact task in education.
Recent breakthroughs in AI technology might facilitate such automation, but this has not been demonstrated at scale.
We introduce a novel automatic short answer grading (ASAG) system.
arXiv Detail & Related papers (2024-05-07T13:49:59Z) - Leveraging TCN and Transformer for effective visual-audio fusion in
continuous emotion recognition [0.5370906227996627]
We present our approach to the Valence-Arousal (VA) Estimation Challenge, Expression (Expr) Classification Challenge, and Action Unit (AU) Detection Challenge.
We propose a novel multi-modal fusion model that leverages Temporal Convolutional Networks (TCN) and Transformer to enhance the performance of continuous emotion recognition.
arXiv Detail & Related papers (2023-03-15T04:15:57Z) - Don't Copy the Teacher: Data and Model Challenges in Embodied Dialogue [92.01165203498299]
Embodied dialogue instruction following requires an agent to complete a complex sequence of tasks from a natural language exchange.
This paper argues that imitation learning (IL) and related low-level metrics are actually misleading and do not align with the goals of embodied dialogue research.
arXiv Detail & Related papers (2022-10-10T05:51:40Z) - An Overview & Analysis of Sequence-to-Sequence Emotional Voice
Conversion [8.94336505787464]
Sequence-to-sequence modelling is emerging as a competitive paradigm for models to overcome EVC challenges.
Recent sequence-to-sequence EVC papers were investigated and reviewed from six perspectives.
This information is organised to provide the research community with an easily digestible overview of the current state-of-the-art.
arXiv Detail & Related papers (2022-03-29T19:41:34Z) - SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark
for Semantic and Generative Capabilities [76.97949110580703]
We introduce SUPERB-SG, a new benchmark to evaluate pre-trained models across various speech tasks.
We use a lightweight methodology to test the robustness of representations learned by pre-trained models under shifts in data domain.
We also show that the task diversity of SUPERB-SG coupled with limited task supervision is an effective recipe for evaluating the generalizability of model representation.
arXiv Detail & Related papers (2022-03-14T04:26:40Z) - When Liebig's Barrel Meets Facial Landmark Detection: A Practical Model [87.25037167380522]
We propose a model that is accurate, robust, efficient, generalizable, and end-to-end trainable.
In order to achieve a better accuracy, we propose two lightweight modules.
DQInit dynamically initializes the queries of decoder from the inputs, enabling the model to achieve as good accuracy as the ones with multiple decoder layers.
QAMem is designed to enhance the discriminative ability of queries on low-resolution feature maps by assigning separate memory values to each query rather than a shared one.
arXiv Detail & Related papers (2021-05-27T13:51:42Z) - Utilizing Self-supervised Representations for MOS Prediction [51.09985767946843]
Existing evaluations usually require clean references or parallel ground truth data.
Subjective tests, on the other hand, do not need any additional clean or parallel data and correlates better to human perception.
We develop an automatic evaluation approach that correlates well with human perception while not requiring ground truth data.
arXiv Detail & Related papers (2021-04-07T09:44:36Z) - A Brief Survey and Comparative Study of Recent Development of Pronoun
Coreference Resolution [55.39835612617972]
Pronoun Coreference Resolution (PCR) is the task of resolving pronominal expressions to all mentions they refer to.
As one important natural language understanding (NLU) component, pronoun resolution is crucial for many downstream tasks and still challenging for existing models.
We conduct extensive experiments to show that even though current models are achieving good performance on the standard evaluation set, they are still not ready to be used in real applications.
arXiv Detail & Related papers (2020-09-27T01:40:01Z) - TAL EmotioNet Challenge 2020 Rethinking the Model Chosen Problem in
Multi-Task Learning [24.365090805937083]
We pose the AU recognition problem as a multi-task learning problem.
The co-occurrence of the expression features and the head pose features are explored.
By choosing the optimal checkpoint for each AU, the recognition results are improved.
arXiv Detail & Related papers (2020-04-21T09:39:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.