Emotional Expression Detection in Spoken Language Employing Machine
Learning Algorithms
- URL: http://arxiv.org/abs/2304.11040v1
- Date: Thu, 20 Apr 2023 17:57:08 GMT
- Title: Emotional Expression Detection in Spoken Language Employing Machine
Learning Algorithms
- Authors: Mehrab Hosain, Most. Yeasmin Arafat, Gazi Zahirul Islam, Jia Uddin,
Md. Mobarak Hossain, Fatema Alam
- Abstract summary: The human voice carries a variety of features that can be classified as pitch, timbre, loudness, and vocal tone.
It is widely observed that humans express their feelings through different vocal qualities when they speak.
The primary objective of this research is to recognize different human emotions using several functions, namely spectral descriptors, periodicity, and harmonicity.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The human voice carries a variety of features that can be classified
as pitch, timbre, loudness, and vocal tone. It is widely observed that humans
express their feelings through different vocal qualities when they speak. The
primary objective of this research is to recognize different human emotions,
such as anger, sadness, fear, neutrality, disgust, pleasant surprise, and
happiness, using several MATLAB functions, namely spectral descriptors,
periodicity, and harmonicity. To accomplish this, we analyze the CREMA-D
(Crowd-sourced Emotional Multimodal Actors Dataset) and TESS (Toronto Emotional
Speech Set) datasets of human speech. The audio files exhibit varied
characteristics (e.g., noisy, fast, slow), so the efficiency of the ML (Machine
Learning) models increases significantly. EMD (Empirical Mode Decomposition) is
used to decompose the signals. Features are then extracted with several
techniques, including MFCC, GTCC, spectral centroid, roll-off point, entropy,
spread, flux, harmonic ratio, energy, skewness, flatness, and audio delta. The
data is used to train several well-known ML models, namely Support Vector
Machine, Neural Network, Ensemble, and KNN, which achieve accuracies of 67.7%,
63.3%, 61.6%, and 59.0%, respectively, on the test data and 77.7%, 76.1%,
99.1%, and 61.2% on the training data. We conducted the experiments in MATLAB,
and the results show that our model is more robust and flexible than existing
similar works.
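
The authors implement this pipeline with MATLAB toolbox functions; purely as an
illustration, the sketch below reproduces the same stages (EMD decomposition
followed by frame-level spectral descriptors) in Python. It assumes the PyEMD
(pip package EMD-signal), librosa, NumPy, and SciPy libraries, covers only a
representative subset of the listed descriptors (MFCC, centroid, roll-off,
spread, flatness, energy, skewness), and uses placeholder parameters rather
than the authors' settings.

    # Hedged sketch: EMD decomposition + frame-level feature extraction for one clip.
    # Assumes: pip install EMD-signal librosa scipy numpy
    import numpy as np
    import librosa
    from PyEMD import EMD
    from scipy.stats import skew

    def decompose_emd(y, max_imfs=5):
        """Split the waveform into its leading intrinsic mode functions (EMD)."""
        return EMD().emd(y, max_imf=max_imfs)   # shape: (n_imfs, n_samples)

    def extract_features(y, sr):
        """Return a fixed-length vector of summarized spectral descriptors."""
        mfcc     = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)       # MFCCs
        centroid = librosa.feature.spectral_centroid(y=y, sr=sr)     # spectral centroid
        rolloff  = librosa.feature.spectral_rolloff(y=y, sr=sr)      # roll-off point
        spread   = librosa.feature.spectral_bandwidth(y=y, sr=sr)    # spectral spread
        flatness = librosa.feature.spectral_flatness(y=y)            # spectral flatness
        energy   = librosa.feature.rms(y=y)                          # short-time energy
        frames = np.vstack([mfcc, centroid, rolloff, spread, flatness, energy])
        # Summarize each frame-level trajectory by its mean and skewness.
        return np.concatenate([frames.mean(axis=1), skew(frames, axis=1)])

    def features_for_file(path):
        """Feature vector for one audio file: raw signal plus its leading IMFs."""
        y, sr = librosa.load(path, sr=None, mono=True)
        parts = [extract_features(y, sr)]
        parts += [extract_features(imf, sr) for imf in decompose_emd(y)]
        return np.concatenate(parts)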
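
Continuing the sketch, the four model families named above (Support Vector
Machine, Neural Network, Ensemble, and KNN) have close scikit-learn analogues,
so the train/test comparison reported in the abstract can be reproduced in
shape, though not in numbers. The feature matrix X and label vector y are
assumed to come from applying the extraction step to the CREMA-D and TESS
clips; this is not the authors' MATLAB configuration.

    # Hedged sketch: compare SVM, neural-network, ensemble, and KNN classifiers
    # on pre-extracted features X (n_clips x n_features) and emotion labels y.
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import SVC
    from sklearn.neural_network import MLPClassifier
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.neighbors import KNeighborsClassifier

    def compare_models(X, y, seed=0):
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                                  stratify=y, random_state=seed)
        models = {
            "SVM":            make_pipeline(StandardScaler(), SVC(kernel="rbf")),
            "Neural Network": make_pipeline(StandardScaler(), MLPClassifier(max_iter=500)),
            "Ensemble":       RandomForestClassifier(n_estimators=200, random_state=seed),
            "KNN":            make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
        }
        for name, model in models.items():
            model.fit(X_tr, y_tr)
            # Report training and test accuracy, mirroring the two columns quoted above.
            print(f"{name}: train={model.score(X_tr, y_tr):.3f}  test={model.score(X_te, y_te):.3f}")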
Related papers
- Speech Emotion Detection Based on MFCC and CNN-LSTM Architecture [0.0]
This paper processes the initial audio input into waveplot and spectrum representations for analysis and concentrates on multiple features, including MFCC, as targets for feature extraction.
The architecture achieves an overall accuracy of 61.07% on the test set, with anger and neutral detected at 75.31% and 71.70%, respectively.
arXiv Detail & Related papers (2025-01-18T06:15:54Z)
- Leveraging Cross-Attention Transformer and Multi-Feature Fusion for Cross-Linguistic Speech Emotion Recognition [60.58049741496505]
Speech Emotion Recognition (SER) plays a crucial role in enhancing human-computer interaction.
We propose a novel approach, HuMP-CAT, which combines HuBERT, MFCC, and prosodic characteristics.
We show that, by fine-tuning the source model with a small portion of speech from the target datasets, HuMP-CAT achieves an average accuracy of 78.75%.
arXiv Detail & Related papers (2025-01-06T14:31:25Z)
- Speaker Emotion Recognition: Leveraging Self-Supervised Models for Feature Extraction Using Wav2Vec2 and HuBERT [0.0]
We study the use of self-supervised transformer-based models, Wav2Vec2 and HuBERT, to determine the emotion of speakers from their voice.
The proposed solution is evaluated on reputable datasets, including RAVDESS, SHEMO, SAVEE, AESDD, and Emo-DB.
arXiv Detail & Related papers (2024-11-05T10:06:40Z)
- Speech and Text-Based Emotion Recognizer [0.9168634432094885]
We build a balanced corpus from publicly available datasets for speech emotion recognition.
Our best system, a multimodal speech- and text-based model, achieves a combined UA (Unweighted Accuracy) + WA (Weighted Accuracy) score of 157.57, compared to 119.66 for the baseline algorithm; a minimal sketch of these two metrics follows this list.
arXiv Detail & Related papers (2023-12-10T05:17:39Z)
- EmoDiarize: Speaker Diarization and Emotion Identification from Speech Signals using Convolutional Neural Networks [0.0]
This research explores the integration of deep learning techniques in speech emotion recognition.
It introduces a framework that combines a pre-existing speaker diarization pipeline with an emotion identification model built on a Convolutional Neural Network (CNN).
The proposed model yields an unweighted accuracy of 63%, demonstrating remarkable efficiency in accurately identifying emotional states within speech signals.
arXiv Detail & Related papers (2023-10-19T16:02:53Z)
- Multimodal Emotion Recognition using Transfer Learning from Speaker Recognition and BERT-based models [53.31917090073727]
We propose a neural network-based emotion recognition framework that uses a late fusion of transfer-learned and fine-tuned models from speech and text modalities.
We evaluate the effectiveness of our proposed multimodal approach on the interactive emotional dyadic motion capture dataset.
arXiv Detail & Related papers (2022-02-16T00:23:42Z)
- Towards Language Modelling in the Speech Domain Using Sub-word Linguistic Units [56.52704348773307]
We propose a novel LSTM-based generative speech LM based on linguistic units including syllables and phonemes.
With a limited dataset, orders of magnitude smaller than that required by contemporary generative models, our model closely approximates babbling speech.
We show the effect of training with auxiliary text LMs, multitask learning objectives, and auxiliary articulatory features.
arXiv Detail & Related papers (2021-10-31T22:48:30Z)
- EEGminer: Discovering Interpretable Features of Brain Activity with Learnable Filters [72.19032452642728]
We propose a novel differentiable EEG decoding pipeline consisting of learnable filters and a pre-determined feature extraction module.
We demonstrate the utility of our model towards emotion recognition from EEG signals on the SEED dataset and on a new EEG dataset of unprecedented size.
The discovered features align with previous neuroscience studies and offer new insights, such as marked differences in the functional connectivity profile between left and right temporal areas during music listening.
arXiv Detail & Related papers (2021-10-19T14:22:04Z)
- Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation [56.264157127549446]
Speech emotion recognition (SER) is a challenging task that plays a crucial role in natural human-computer interaction.
One of the main challenges in SER is data scarcity.
We propose a transfer learning strategy combined with spectrogram augmentation.
arXiv Detail & Related papers (2021-08-05T10:39:39Z)
- EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model [56.75775793011719]
We introduce and publicly release a Mandarin emotion speech dataset comprising 9,724 samples with audio files and their human-labeled emotion annotations.
Unlike models that need additional reference audio as input, our model can predict emotion labels just from the input text and generate more expressive speech conditioned on the emotion embedding.
In the experiment phase, we first validate the effectiveness of our dataset by an emotion classification task. Then we train our model on the proposed dataset and conduct a series of subjective evaluations.
arXiv Detail & Related papers (2021-06-17T08:34:21Z)
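
For reference, the UA + WA figure quoted for the speech- and text-based
recognizer above is, as these terms are commonly used in speech emotion
recognition, the sum of two percentages: unweighted accuracy (the mean of
per-class recalls) and weighted accuracy (plain overall accuracy). A minimal
scikit-learn sketch with made-up labels:

    # Hedged sketch: UA (unweighted / balanced accuracy) and WA (weighted / overall accuracy).
    from sklearn.metrics import accuracy_score, balanced_accuracy_score

    def ua_plus_wa(y_true, y_pred):
        ua = balanced_accuracy_score(y_true, y_pred) * 100  # mean of per-class recalls
        wa = accuracy_score(y_true, y_pred) * 100            # overall accuracy
        return ua + wa

    # Example with made-up labels: always predicting "neutral" scores 60 on WA
    # because "neutral" dominates, but only ~33 on UA, exposing the imbalance.
    print(ua_plus_wa(["neutral", "neutral", "neutral", "angry", "sad"],
                     ["neutral", "neutral", "neutral", "neutral", "neutral"]))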