Emotional Expression Detection in Spoken Language Employing Machine
Learning Algorithms
- URL: http://arxiv.org/abs/2304.11040v1
- Date: Thu, 20 Apr 2023 17:57:08 GMT
- Title: Emotional Expression Detection in Spoken Language Employing Machine
Learning Algorithms
- Authors: Mehrab Hosain, Most. Yeasmin Arafat, Gazi Zahirul Islam, Jia Uddin,
Md. Mobarak Hossain, Fatema Alam
- Abstract summary: There are a variety of features of the human voice that can be classified as pitch, timbre, loudness, and vocal tone.
It is observed in numerous instances that humans express their feelings using different vocal qualities when they are speaking.
The primary objective of this research is to recognize different human emotions by using several functions, namely spectral descriptors, periodicity, and harmonicity.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There are a variety of features of the human voice that can be classified as
pitch, timbre, loudness, and vocal tone. It is observed in numerous instances
that humans express their feelings using different vocal qualities when they
are speaking. The primary objective of this research is to recognize different
human emotions such as anger, sadness, fear, neutrality, disgust,
pleasant surprise, and happiness by using several MATLAB functions, namely
spectral descriptors, periodicity, and harmonicity. To accomplish the work, we
analyze the CREMA-D (Crowd-sourced Emotional Multimodal Actors Dataset) and TESS
(Toronto Emotional Speech Set) datasets of human speech. The audio files
contain data with various characteristics (e.g., noisy, fast, slow),
which significantly increases the efficiency of the ML (Machine Learning)
models. EMD (Empirical Mode Decomposition) is used for signal
decomposition. Then, features are extracted using several techniques
such as MFCC, GTCC, spectral centroid, roll-off
point, entropy, spread, flux, harmonic ratio, energy, skewness, flatness, and
audio delta. Several well-known ML models are trained on the data, namely Support
Vector Machine, Neural Network, Ensemble, and KNN. The algorithms show an
accuracy of 67.7%, 63.3%, 61.6%, and 59.0%, respectively, on the test data and
77.7%, 76.1%, 99.1%, and 61.2% on the training data. We conducted the
experiments in MATLAB, and the results show that our model is more prominent
and flexible than existing similar works.
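As a rough illustration of the spectral descriptors named in the abstract (centroid, spread, roll-off point, flatness), the sketch below computes them for a single audio frame. It is a minimal NumPy approximation, not the authors' MATLAB pipeline: the function name `spectral_descriptors`, the Hann window, and the 0.95 roll-off threshold are assumptions chosen for the example.

```python
import numpy as np


def spectral_descriptors(frame, sample_rate, rolloff_fraction=0.95):
    """Return centroid, spread, roll-off point (Hz) and flatness of one frame.

    Hypothetical helper for illustration; window and threshold are assumptions.
    """
    # Power spectrum of the Hann-windowed frame (one-sided).
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    total = power.sum() + 1e-12  # guard against division by zero on silence

    # Centroid: power-weighted mean frequency.
    centroid = (freqs * power).sum() / total
    # Spread: power-weighted standard deviation around the centroid.
    spread = np.sqrt((((freqs - centroid) ** 2) * power).sum() / total)

    # Roll-off point: lowest frequency below which the chosen
    # fraction of the total spectral power lies.
    cumulative = np.cumsum(power)
    idx = np.searchsorted(cumulative, rolloff_fraction * cumulative[-1])
    rolloff = freqs[min(idx, len(freqs) - 1)]

    # Flatness: geometric mean over arithmetic mean of the power spectrum.
    flatness = np.exp(np.mean(np.log(power + 1e-12))) / (np.mean(power) + 1e-12)

    return {
        "centroid": float(centroid),
        "spread": float(spread),
        "rolloff": float(rolloff),
        "flatness": float(flatness),
    }


# Usage: a pure 1 kHz tone versus white noise, both at 16 kHz.
sample_rate = 16000
t = np.arange(1024) / sample_rate
tone = spectral_descriptors(np.sin(2 * np.pi * 1000 * t), sample_rate)
noise = spectral_descriptors(
    np.random.default_rng(0).standard_normal(1024), sample_rate
)
```

For the tone, the centroid and roll-off point sit near 1 kHz and the flatness is close to zero; for noise, the flatness is much higher. Descriptors like these, computed per frame, are the kind of feature vector a classifier such as an SVM or KNN would then be trained on.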
Related papers
- Leveraged Mel spectrograms using Harmonic and Percussive Components in
Speech Emotion Recognition [15.919990281329085]
This work explores the effects of the harmonic and percussive components of Mel spectrograms in Speech Emotion Recognition (SER).
We attempt to leverage the Mel spectrogram by decomposing distinguishable acoustic features for exploitation in our proposed architecture.
This study specifically focuses on effective data augmentation techniques for building an enriched hybrid-based feature map.
arXiv Detail & Related papers (2023-12-18T05:55:46Z)
- Speech and Text-Based Emotion Recognizer [0.9168634432094885]
We build a balanced corpus from publicly available datasets for speech emotion recognition.
Our best system, a multi-modal speech and text-based model, achieves a combined UA (Unweighted Accuracy) + WA (Weighted Accuracy) of 157.57, compared to 119.66 for the baseline algorithm.
arXiv Detail & Related papers (2023-12-10T05:17:39Z)
- EmoDiarize: Speaker Diarization and Emotion Identification from Speech
Signals using Convolutional Neural Networks [0.0]
This research explores the integration of deep learning techniques in speech emotion recognition.
It introduces a framework that combines a pre-existing speaker diarization pipeline and an emotion identification model built on a Convolutional Neural Network (CNN).
The proposed model yields an unweighted accuracy of 63%, demonstrating remarkable efficiency in accurately identifying emotional states within speech signals.
arXiv Detail & Related papers (2023-10-19T16:02:53Z)
- Multimodal Emotion Recognition with Modality-Pairwise Unsupervised
Contrastive Loss [80.79641247882012]
We focus on unsupervised feature learning for Multimodal Emotion Recognition (MER).
We consider discrete emotions and use text, audio, and vision as modalities.
Our method, based on a contrastive loss between pairwise modalities, is the first such attempt in the MER literature.
arXiv Detail & Related papers (2022-07-23T10:11:24Z)
- M2FNet: Multi-modal Fusion Network for Emotion Recognition in
Conversation [1.3864478040954673]
We propose a Multi-modal Fusion Network (M2FNet) that extracts emotion-relevant features from the visual, audio, and text modalities.
It employs a multi-head attention-based fusion mechanism to combine emotion-rich latent representations of the input data.
The proposed feature extractor is trained with a novel adaptive margin-based triplet loss function to learn emotion-relevant features from the audio and visual data.
arXiv Detail & Related papers (2022-06-05T14:18:58Z)
- BEAT: A Large-Scale Semantic and Emotional Multi-Modal Dataset for
Conversational Gestures Synthesis [9.95713767110021]
The Body-Expression-Audio-Text dataset has 76 hours of high-quality, multi-modal data captured from 30 speakers talking with eight different emotions and in four different languages.
BEAT is the largest motion capture dataset for investigating human gestures.
arXiv Detail & Related papers (2022-03-10T11:19:52Z)
- Multimodal Emotion Recognition using Transfer Learning from Speaker
Recognition and BERT-based models [53.31917090073727]
We propose a neural network-based emotion recognition framework that uses a late fusion of transfer-learned and fine-tuned models from speech and text modalities.
We evaluate the effectiveness of our proposed multimodal approach on the interactive emotional dyadic motion capture dataset.
arXiv Detail & Related papers (2022-02-16T00:23:42Z)
- Towards Language Modelling in the Speech Domain Using Sub-word
Linguistic Units [56.52704348773307]
We propose a novel LSTM-based generative speech LM based on linguistic units including syllables and phonemes.
With a limited dataset, orders of magnitude smaller than that required by contemporary generative models, our model closely approximates babbling speech.
We show the effect of training with auxiliary text LMs, multitask learning objectives, and auxiliary articulatory features.
arXiv Detail & Related papers (2021-10-31T22:48:30Z)
- EEGminer: Discovering Interpretable Features of Brain Activity with
Learnable Filters [72.19032452642728]
We propose a novel differentiable EEG decoding pipeline consisting of learnable filters and a pre-determined feature extraction module.
We demonstrate the utility of our model towards emotion recognition from EEG signals on the SEED dataset and on a new EEG dataset of unprecedented size.
The discovered features align with previous neuroscience studies and offer new insights, such as marked differences in the functional connectivity profile between left and right temporal areas during music listening.
arXiv Detail & Related papers (2021-10-19T14:22:04Z)
- Improved Speech Emotion Recognition using Transfer Learning and
Spectrogram Augmentation [56.264157127549446]
Speech emotion recognition (SER) is a challenging task that plays a crucial role in natural human-computer interaction.
One of the main challenges in SER is data scarcity.
We propose a transfer learning strategy combined with spectrogram augmentation.
arXiv Detail & Related papers (2021-08-05T10:39:39Z)
- EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional
Text-to-Speech Model [56.75775793011719]
We introduce and publicly release a Mandarin emotion speech dataset including 9,724 samples with audio files and their human-labeled emotion annotations.
Unlike models that need additional reference audio as input, our model can predict emotion labels from the input text alone and generate more expressive speech conditioned on the emotion embedding.
In the experiment phase, we first validate the effectiveness of our dataset by an emotion classification task. Then we train our model on the proposed dataset and conduct a series of subjective evaluations.
arXiv Detail & Related papers (2021-06-17T08:34:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.