Feature Selection Enhancement and Feature Space Visualization for Speech-Based Emotion Recognition
- URL: http://arxiv.org/abs/2208.09269v1
- Date: Fri, 19 Aug 2022 11:29:03 GMT
- Title: Feature Selection Enhancement and Feature Space Visualization for Speech-Based Emotion Recognition
- Authors: Sofia Kanwal, Sohail Asghar, Hazrat Ali
- Abstract summary: We present a speech feature enhancement strategy that improves speech emotion recognition.
The strategy is compared with the state-of-the-art methods used in the literature.
Our method achieved an average recognition gain of 11.5% for six out of seven emotions for the EMO-DB dataset, and 13.8% for seven out of eight emotions for the RAVDESS dataset.
- Score: 2.223733768286313
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Robust speech emotion recognition relies on the quality of the speech
features. We present a speech feature enhancement strategy that improves speech
emotion recognition. We used the INTERSPEECH 2010 challenge feature set. We
identified subsets from the feature set and applied Principal Component
Analysis to the subsets. Finally, the features are fused horizontally. The
resulting feature set is analyzed using t-distributed stochastic neighbour
embedding (t-SNE) before the features are applied to emotion recognition. The method
is compared with the state-of-the-art methods used in the literature. The
empirical evidence is drawn using two well-known datasets: Emotional Speech
Dataset (EMO-DB) and Ryerson Audio-Visual Database of Emotional Speech and Song
(RAVDESS) for two languages, German and English, respectively. Our method
achieved an average recognition gain of 11.5% for six out of seven emotions
for the EMO-DB dataset, and 13.8% for seven out of eight emotions for the
RAVDESS dataset as compared to the baseline study.
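To make the pipeline concrete, here is a minimal sketch in Python of the steps the abstract describes: split the feature vector into subsets, apply PCA per subset, fuse the projections horizontally, and embed the fused features with t-SNE for inspection. The subset boundaries and component counts are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of the described pipeline (illustrative, not the paper's
# exact configuration): per-subset PCA over the IS10 feature vector,
# horizontal fusion of the projections, then t-SNE for visual inspection.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(535, 1582))   # placeholder: 535 utterances x 1582 IS10 features
y = rng.integers(0, 7, size=535)   # placeholder labels for seven emotions

# Hypothetical subset boundaries over the 1582-dimensional feature vector.
subsets = [(0, 500), (500, 1000), (1000, 1582)]

blocks = []
for lo, hi in subsets:
    Xs = StandardScaler().fit_transform(X[:, lo:hi])        # standardize each subset
    blocks.append(PCA(n_components=30).fit_transform(Xs))   # 30 components per subset (assumed)

X_fused = np.hstack(blocks)        # horizontal fusion of the per-subset projections

# 2-D t-SNE embedding of the fused features, e.g. for a scatter plot colored by y.
X_2d = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X_fused)
print(X_fused.shape, X_2d.shape)   # (535, 90) (535, 2)
```

Note that the t-SNE step is for visual inspection only; its low-dimensional distances are not used as classifier input.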
Related papers
- Improving Speech-based Emotion Recognition with Contextual Utterance Analysis and LLMs [2.8728982844941178]
Speech Emotion Recognition (SER) focuses on identifying emotional states from spoken language.
We propose a novel approach that first refines all available transcriptions to ensure data reliability.
We then segment each complete conversation into smaller dialogues and use these dialogues as context to predict the emotion of the target utterance within the dialogue.
arXiv Detail & Related papers (2024-10-27T04:23:34Z)
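As a rough illustration of the segmentation idea summarized above, the sketch below pairs each target utterance with a fixed-size window of surrounding utterances as its dialogue context; the window size and data layout are assumptions, not the paper's actual setup.

```python
# Illustrative sketch only: segment a conversation into small dialogue windows
# and pair each target utterance with its surrounding context. Window size and
# data layout are assumptions, not the paper's actual configuration.
def context_windows(transcripts, window=2):
    """Yield (context, target) pairs: each target utterance with up to
    `window` utterances on either side as its dialogue context."""
    for i, target in enumerate(transcripts):
        lo, hi = max(0, i - window), min(len(transcripts), i + window + 1)
        context = transcripts[lo:i] + transcripts[i + 1:hi]
        yield context, target

conversation = ["Hi there.", "You sound upset.", "I lost my keys!", "Oh no.", "It's fine."]
for ctx, tgt in context_windows(conversation):
    print(tgt, "<-", ctx)
```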
- Layer-Wise Analysis of Self-Supervised Acoustic Word Embeddings: A Study on Speech Emotion Recognition [54.952250732643115]
We study Acoustic Word Embeddings (AWEs), fixed-length features derived from continuous representations, to explore their advantages in specific tasks.
AWEs have previously shown utility in capturing acoustic discriminability.
Our findings underscore the acoustic context conveyed by AWEs and showcase highly competitive speech emotion recognition accuracies.
arXiv Detail & Related papers (2024-02-04T21:24:54Z)
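For readers unfamiliar with AWEs: one common way to obtain a fixed-length embedding from a variable-length sequence of frame-level features is simple mean pooling, sketched below. The pooling choice and feature dimension are illustrative assumptions, not necessarily the paper's method.

```python
# Minimal sketch: derive a fixed-length embedding from a variable-length
# sequence of frame-level features by mean pooling. Mean pooling is one common
# choice; it is an assumption here, not necessarily the paper's method.
import numpy as np

def acoustic_word_embedding(frames: np.ndarray) -> np.ndarray:
    """frames: (num_frames, feature_dim) -> fixed-length (feature_dim,) vector."""
    return frames.mean(axis=0)

word_a = np.random.randn(37, 768)   # 37 frames of a 768-dim representation (assumed dim)
word_b = np.random.randn(52, 768)   # a longer word, same feature dimension
print(acoustic_word_embedding(word_a).shape, acoustic_word_embedding(word_b).shape)
# (768,) (768,) -- both words map into the same fixed-length space
```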
- Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling [50.99252242917458]
Conversational Speech Synthesis (CSS) aims to accurately express an utterance with the appropriate prosody and emotional inflection within a conversational setting.
To address the issue of data scarcity, we meticulously create emotional labels in terms of category and intensity.
Our model outperforms the baseline models in understanding and rendering emotions.
arXiv Detail & Related papers (2023-12-19T08:47:50Z)
- Speech and Text-Based Emotion Recognizer [0.9168634432094885]
We build a balanced corpus from publicly available datasets for speech emotion recognition.
Our best system, a multi-modal speech- and text-based model, achieves a UA (unweighted accuracy) + WA (weighted accuracy) score of 157.57, compared to 119.66 for the baseline algorithm.
arXiv Detail & Related papers (2023-12-10T05:17:39Z)
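The UA + WA score quoted above combines two standard SER metrics: UA (unweighted accuracy) is the mean of per-class recalls, and WA (weighted accuracy) is plain overall accuracy; the reported figure is their sum in percentage points. A minimal sketch:

```python
# Minimal sketch of the UA + WA score referenced above: UA (unweighted
# accuracy) is the mean of per-class recalls, WA (weighted accuracy) is plain
# overall accuracy; the reported figure sums the two percentages.
import numpy as np

def ua_wa(y_true: np.ndarray, y_pred: np.ndarray) -> tuple[float, float]:
    wa = np.mean(y_true == y_pred) * 100                    # weighted accuracy (%)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    ua = np.mean(recalls) * 100                             # unweighted accuracy (%)
    return ua, wa

y_true = np.array([0, 0, 0, 1, 1, 2])
y_pred = np.array([0, 0, 1, 1, 1, 2])
ua, wa = ua_wa(y_true, y_pred)
print(f"UA={ua:.1f} WA={wa:.1f} UA+WA={ua + wa:.1f}")       # UA=88.9 WA=83.3 UA+WA=172.2
```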
- Exploring Emotion Expression Recognition in Older Adults Interacting with a Virtual Coach [22.00225071959289]
The EMPATHIC project aimed to design an emotionally expressive virtual coach capable of engaging healthy seniors to improve well-being and promote independent aging.
This paper outlines the development of the emotion expression recognition module of the virtual coach, encompassing data collection, annotation design, and a first methodological approach.
arXiv Detail & Related papers (2023-11-09T18:22:32Z)
- FAF: A novel multimodal emotion recognition approach integrating face, body and text [13.485538135494153]
We develop a large multimodal emotion dataset, named the "HED" dataset, to facilitate the emotion recognition task.
To improve recognition accuracy, the "Feature After Feature" framework was used to explore crucial emotional information.
We employ various benchmarks to evaluate the "HED" dataset and compare the performance with our method.
arXiv Detail & Related papers (2022-11-20T14:43:36Z)
- Multimodal Emotion Recognition using Transfer Learning from Speaker Recognition and BERT-based models [53.31917090073727]
We propose a neural network-based emotion recognition framework that uses a late fusion of transfer-learned and fine-tuned models from speech and text modalities.
We evaluate the effectiveness of our proposed multimodal approach on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) dataset.
arXiv Detail & Related papers (2022-02-16T00:23:42Z)
- An Attribute-Aligned Strategy for Learning Speech Representation [57.891727280493015]
We propose an attribute-aligned learning strategy to derive speech representations that can be flexibly controlled through an attribute-selection mechanism.
Specifically, we propose a layered-representation variational autoencoder (LR-VAE), which factorizes speech representation into attribute-sensitive nodes.
Our proposed method achieves competitive performance on identity-free SER and better performance on emotionless SV.
arXiv Detail & Related papers (2021-06-05T06:19:14Z)
- Reinforcement Learning for Emotional Text-to-Speech Synthesis with Improved Emotion Discriminability [82.39099867188547]
Emotional text-to-speech synthesis (ETTS) has seen much progress in recent years.
We propose a new interactive training paradigm for ETTS, denoted as i-ETTS.
We formulate an iterative training strategy with reinforcement learning to ensure the quality of i-ETTS optimization.
arXiv Detail & Related papers (2021-04-03T13:52:47Z)
- Exploring Emotion Features and Fusion Strategies for Audio-Video Emotion Recognition [62.48806555665122]
We describe our approaches in EmotiW 2019, which mainly explore emotion features and feature fusion strategies for the audio and visual modalities.
With careful evaluation, we obtain 65.5% accuracy on the AFEW validation set and 62.48% on the test set, ranking third in the challenge.
arXiv Detail & Related papers (2020-12-27T10:50:24Z)
- Embedded Emotions -- A Data Driven Approach to Learn Transferable Feature Representations from Raw Speech Input for Emotion Recognition [1.4556324908347602]
We investigate the applicability of transferring knowledge learned from large text and audio corpora to the task of automatic emotion recognition.
Our results show that the learned feature representations can be effectively applied for classifying emotions from spoken language.
arXiv Detail & Related papers (2020-09-30T09:18:31Z)