LDEB -- Label Digitization with Emotion Binarization and Machine
Learning for Emotion Recognition in Conversational Dialogues
- URL: http://arxiv.org/abs/2306.02193v1
- Date: Sat, 3 Jun 2023 20:37:46 GMT
- Title: LDEB -- Label Digitization with Emotion Binarization and Machine
Learning for Emotion Recognition in Conversational Dialogues
- Authors: Amitabha Dey, Shan Suthaharan
- Abstract summary: Emotion recognition in conversations (ERC) is vital to the advancements of conversational AI and its applications.
The conversational dialogues present a unique problem where each dialogue depicts nested emotions that entangle the association between the emotional feature descriptors and emotion type (or label).
We proposed a novel approach called Label Digitization with Emotion Binarization (LDEB) that disentangles the twists by utilizing the text normalization and 7-bit digital encoding techniques.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Emotion recognition in conversations (ERC) is vital to the advancements of
conversational AI and its applications. Therefore, the development of an
automated ERC model using the concepts of machine learning (ML) would be
beneficial. However, the conversational dialogues present a unique problem
where each dialogue depicts nested emotions that entangle the association
between the emotional feature descriptors and emotion type (or label). This
entanglement, which can be compounded by data paucity, is an
obstacle for an ML model. To overcome this problem, we proposed a novel approach
called Label Digitization with Emotion Binarization (LDEB) that disentangles
the twists by utilizing the text normalization and 7-bit digital encoding
techniques and constructs a meaningful feature space for a ML model to be
trained. We also utilized the publicly available dataset called the
FETA-DailyDialog dataset for feature learning and developed a hierarchical ERC
model using random forest (RF) and artificial neural network (ANN) classifiers.
Simulations showed that the ANN-based ERC model was able to predict emotion
with the best accuracy and precision scores of about 74% and 76%, respectively.
Simulations also showed that the ANN-model could reach a training accuracy
score of about 98% with 60 epochs. On the other hand, the RF-based ERC model
was able to predict emotions with the best accuracy and precision scores of
about 78% and 75%, respectively.
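
The abstract does not spell out the encoding, so the following is only a minimal sketch of one plausible reading of "7-bit digital encoding" and "emotion binarization". It assumes seven emotion labels (the names and their order below are assumptions modeled on DailyDialog-style annotation, not taken from the paper), assigns each label one bit of a 7-bit code, and collapses labels to emotion versus no emotion. The function names `digitize`, `binarize`, and `dialogue_code` are hypothetical.

```python
# Hypothetical sketch: label names, bit order, and helper names are assumptions,
# not the authors' implementation.
EMOTIONS = ["no_emotion", "anger", "disgust", "fear",
            "happiness", "sadness", "surprise"]


def digitize(label: str) -> str:
    """Map an emotion label to a 7-bit string with exactly one bit set."""
    index = EMOTIONS.index(label)
    return format(1 << (len(EMOTIONS) - 1 - index), "07b")


def binarize(label: str) -> int:
    """Collapse a label to 1 (some emotion present) or 0 (no emotion)."""
    return int(label != "no_emotion")


def dialogue_code(labels: list[str]) -> str:
    """OR together the codes of every utterance label in one dialogue."""
    mask = 0
    for label in labels:
        mask |= int(digitize(label), 2)
    return format(mask, "07b")


if __name__ == "__main__":
    turns = ["no_emotion", "happiness", "surprise"]
    print([digitize(t) for t in turns])
    print([binarize(t) for t in turns])
    print(dialogue_code(turns))
```

Under these assumptions, a hierarchical model could first predict the binarized target (emotion or not) and only then predict which bit is set, which is one way to read the two-stage RF and ANN setup described in the abstract.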
Related papers
- Emo Pillars: Knowledge Distillation to Support Fine-Grained Context-Aware and Context-Less Emotion Classification [56.974545305472304]
Most datasets for sentiment analysis lack context in which an opinion was expressed, often crucial for emotion understanding, and are mainly limited by a few emotion categories.
We design an LLM-based data synthesis pipeline and leverage a large model, Mistral-7b, for the generation of training examples for more accessible, lightweight BERT-type encoder models.
We show that Emo Pillars models are highly adaptive to new domains when tuned to specific tasks such as GoEmotions, ISEAR, IEMOCAP, and EmoContext, reaching the SOTA performance on the first three.
arXiv Detail & Related papers (2025-04-23T16:23:17Z)
- Leveraging Cross-Attention Transformer and Multi-Feature Fusion for Cross-Linguistic Speech Emotion Recognition [60.58049741496505]
Speech Emotion Recognition (SER) plays a crucial role in enhancing human-computer interaction.
We propose a novel approach HuMP-CAT, which combines HuBERT, MFCC, and prosodic characteristics.
We show that, by fine-tuning the source model with a small portion of speech from the target datasets, HuMP-CAT achieves an average accuracy of 78.75%.
arXiv Detail & Related papers (2025-01-06T14:31:25Z)
- MEMO-Bench: A Multiple Benchmark for Text-to-Image and Multimodal Large Language Models on Human Emotion Analysis [53.012111671763776]
This study introduces MEMO-Bench, a comprehensive benchmark consisting of 7,145 portraits, each depicting one of six different emotions.
Results demonstrate that existing T2I models are more effective at generating positive emotions than negative ones.
Although MLLMs show a certain degree of effectiveness in distinguishing and recognizing human emotions, they fall short of human-level accuracy.
arXiv Detail & Related papers (2024-11-18T02:09:48Z)
- Speech Emotion Recognition under Resource Constraints with Data Distillation [64.36799373890916]
Speech emotion recognition (SER) plays a crucial role in human-computer interaction.
The emergence of edge devices in the Internet of Things presents challenges in constructing intricate deep learning models.
We propose a data distillation framework to facilitate efficient development of SER models in IoT applications.
arXiv Detail & Related papers (2024-06-21T13:10:46Z)
- Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning [55.127202990679976]
We introduce the MERR dataset, containing 28,618 coarse-grained and 4,487 fine-grained annotated samples across diverse emotional categories.
This dataset enables models to learn from varied scenarios and generalize to real-world applications.
We propose Emotion-LLaMA, a model that seamlessly integrates audio, visual, and textual inputs through emotion-specific encoders.
arXiv Detail & Related papers (2024-06-17T03:01:22Z)
- Real-time EEG-based Emotion Recognition Model using Principal Component Analysis and Tree-based Models for Neurohumanities [0.0]
This project proposes a solution by incorporating emotional monitoring during the learning process of context inside an immersive space.
A real-time emotion detection EEG-based system was developed to interpret and classify specific emotions.
This system aims to integrate emotional data into the Neurohumanities Lab interactive platform, creating a comprehensive and immersive learning environment.
arXiv Detail & Related papers (2024-01-28T20:02:13Z)
- EmoDiarize: Speaker Diarization and Emotion Identification from Speech Signals using Convolutional Neural Networks [0.0]
This research explores the integration of deep learning techniques in speech emotion recognition.
It introduces a framework that combines a pre-existing speaker diarization pipeline and an emotion identification model built on a Convolutional Neural Network (CNN).
The proposed model yields an unweighted accuracy of 63%, demonstrating remarkable efficiency in accurately identifying emotional states within speech signals.
arXiv Detail & Related papers (2023-10-19T16:02:53Z)
- Evaluating raw waveforms with deep learning frameworks for speech emotion recognition [0.0]
We present a model that feeds raw audio files directly into deep neural networks without any feature extraction stage.
We use six different data sets, EMO-DB, RAVDESS, TESS, CREMA, SAVEE, and TESS+RAVDESS.
The proposed model achieves 90.34% accuracy on EMO-DB with the CNN model, 90.42% on RAVDESS, 99.48% on TESS with the LSTM model, 69.72% on CREMA with the CNN model, 85.76% on SAVEE with the CNN model in
arXiv Detail & Related papers (2023-07-06T07:27:59Z)
- Learning Speech Emotion Representations in the Quaternion Domain [16.596137913051212]
RH-emo is a novel semi-supervised architecture aimed at extracting quaternion embeddings from real-valued monaural spectrograms.
RH-emo is a hybrid real/quaternion autoencoder network that consists of a real-valued encoder in parallel to a real-valued emotion classifier and a quaternion-valued decoder.
We test our approach on speech emotion recognition tasks using four popular datasets: Iemocap, Ravdess, EmoDb and Tess.
arXiv Detail & Related papers (2022-04-05T17:45:09Z)
- Multimodal Emotion Recognition using Transfer Learning from Speaker Recognition and BERT-based models [53.31917090073727]
We propose a neural network-based emotion recognition framework that uses a late fusion of transfer-learned and fine-tuned models from speech and text modalities.
We evaluate the effectiveness of our proposed multimodal approach on the interactive emotional dyadic motion capture dataset.
arXiv Detail & Related papers (2022-02-16T00:23:42Z)
- Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation [56.264157127549446]
Speech emotion recognition (SER) is a challenging task that plays a crucial role in natural human-computer interaction.
One of the main challenges in SER is data scarcity.
We propose a transfer learning strategy combined with spectrogram augmentation.
arXiv Detail & Related papers (2021-08-05T10:39:39Z)
- Continuous Emotion Recognition with Spatiotemporal Convolutional Neural Networks [82.54695985117783]
We investigate the suitability of state-of-the-art deep learning architectures for continuous emotion recognition using long video sequences captured in-the-wild.
We have developed and evaluated convolutional recurrent neural networks combining 2D-CNNs and long short-term memory units, and inflated 3D-CNN models, which are built by inflating the weights of a pre-trained 2D-CNN model during fine-tuning.
arXiv Detail & Related papers (2020-11-18T13:42:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.