Speech Emotion Recognition Using CNN and Its Use Case in Digital Healthcare
- URL: http://arxiv.org/abs/2406.10741v1
- Date: Sat, 15 Jun 2024 21:33:03 GMT
- Title: Speech Emotion Recognition Using CNN and Its Use Case in Digital Healthcare
- Authors: Nishargo Nigar,
- Abstract summary: The process of identifying human emotion and affective states from speech is known as speech emotion recognition (SER)
My research seeks to use the Convolutional Neural Network (CNN) to distinguish emotions from audio recordings and label them in accordance with the range of different emotions.
I have developed a machine learning model to identify emotions from supplied audio files with the aid of machine learning methods.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The process of identifying human emotion and affective states from speech is known as speech emotion recognition (SER). This is based on the observation that tone and pitch in the voice frequently convey underlying emotion. Speech recognition includes the ability to recognize emotions, which is becoming increasingly popular and in high demand. With the help of appropriate factors (such modalities, emotions, intensities, repetitions, etc.) found in the data, my research seeks to use the Convolutional Neural Network (CNN) to distinguish emotions from audio recordings and label them in accordance with the range of different emotions. I have developed a machine learning model to identify emotions from supplied audio files with the aid of machine learning methods. The evaluation is mostly focused on precision, recall, and F1 score, which are common machine learning metrics. To properly set up and train the machine learning framework, the main objective is to investigate the influence and cross-relation of all input and output parameters. To improve the ability to recognize intentions, a key condition for communication, I have evaluated emotions using my specialized machine learning algorithm via voice that would address the emotional state from voice with the help of digital healthcare, bridging the gap between human and artificial intelligence (AI).
Related papers
- Speaker Emotion Recognition: Leveraging Self-Supervised Models for Feature Extraction Using Wav2Vec2 and HuBERT [0.0]
We study the use of self-supervised transformer-based models, Wav2Vec2 and HuBERT, to determine the emotion of speakers from their voice.
The proposed solution is evaluated on reputable datasets, including RAVDESS, SHEMO, SAVEE, AESDD, and Emo-DB.
arXiv Detail & Related papers (2024-11-05T10:06:40Z) - Attention-based Interactive Disentangling Network for Instance-level
Emotional Voice Conversion [81.1492897350032]
Emotional Voice Conversion aims to manipulate a speech according to a given emotion while preserving non-emotion components.
We propose an Attention-based Interactive diseNtangling Network (AINN) that leverages instance-wise emotional knowledge for voice conversion.
arXiv Detail & Related papers (2023-12-29T08:06:45Z) - Accurate Emotion Strength Assessment for Seen and Unseen Speech Based on
Data-Driven Deep Learning [70.30713251031052]
We propose a data-driven deep learning model, i.e. StrengthNet, to improve the generalization of emotion strength assessment for seen and unseen speech.
Experiments show that the predicted emotion strength of the proposed StrengthNet is highly correlated with ground truth scores for both seen and unseen speech.
arXiv Detail & Related papers (2022-06-15T01:25:32Z) - AHD ConvNet for Speech Emotion Classification [0.0]
We propose a novel mel spectrogram learning approach in which our model uses the datapoints to learn emotions from the given wav form voice notes in the popular CREMA-D dataset.
It took less training time compared to other approaches used to address the problem of emotion speech recognition.
arXiv Detail & Related papers (2022-06-10T11:57:28Z) - Multimodal Emotion Recognition using Transfer Learning from Speaker
Recognition and BERT-based models [53.31917090073727]
We propose a neural network-based emotion recognition framework that uses a late fusion of transfer-learned and fine-tuned models from speech and text modalities.
We evaluate the effectiveness of our proposed multimodal approach on the interactive emotional dyadic motion capture dataset.
arXiv Detail & Related papers (2022-02-16T00:23:42Z) - Emotion Intensity and its Control for Emotional Voice Conversion [77.05097999561298]
Emotional voice conversion (EVC) seeks to convert the emotional state of an utterance while preserving the linguistic content and speaker identity.
In this paper, we aim to explicitly characterize and control the intensity of emotion.
We propose to disentangle the speaker style from linguistic content and encode the speaker style into a style embedding in a continuous space that forms the prototype of emotion embedding.
arXiv Detail & Related papers (2022-01-10T02:11:25Z) - Emotion Recognition in Audio and Video Using Deep Neural Networks [9.694548197876868]
With advancement of deep learning technology there has been significant improvement of speech recognition.
Recognizing emotion from speech is important aspect and with deep learning technology emotion recognition has improved in accuracy and latency.
In this work, we attempt to explore different neural networks to improve accuracy of emotion recognition.
arXiv Detail & Related papers (2020-06-15T04:50:18Z) - Emotion Recognition From Gait Analyses: Current Research and Future
Directions [48.93172413752614]
gait conveys information about the walker's emotion.
The mapping between various emotions and gait patterns provides a new source for automated emotion recognition.
gait is remotely observable, more difficult to imitate, and requires less cooperation from the subject.
arXiv Detail & Related papers (2020-03-13T08:22:33Z) - Emotion Recognition System from Speech and Visual Information based on
Convolutional Neural Networks [6.676572642463495]
We propose a system that is able to recognize emotions with a high accuracy rate and in real time.
In order to increase the accuracy of the recognition system, we analyze also the speech data and fuse the information coming from both sources.
arXiv Detail & Related papers (2020-02-29T22:09:46Z) - Continuous Emotion Recognition via Deep Convolutional Autoencoder and
Support Vector Regressor [70.2226417364135]
It is crucial that the machine should be able to recognize the emotional state of the user with high accuracy.
Deep neural networks have been used with great success in recognizing emotions.
We present a new model for continuous emotion recognition based on facial expression recognition.
arXiv Detail & Related papers (2020-01-31T17:47:16Z) - Detecting Emotion Primitives from Speech and their use in discerning
Categorical Emotions [16.886826928295203]
Emotion plays an essential role in human-to-human communication, enabling us to convey feelings such as happiness, frustration, and sincerity.
This work investigated how emotion primitives can be used to detect categorical emotions such as happiness, disgust, contempt, anger, and surprise from neutral speech.
Results indicated that arousal, followed by dominance was a better detector of such emotions.
arXiv Detail & Related papers (2020-01-31T03:11:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.