Language Models as Emotional Classifiers for Textual Conversations
- URL: http://arxiv.org/abs/2008.12360v1
- Date: Thu, 27 Aug 2020 20:04:30 GMT
- Title: Language Models as Emotional Classifiers for Textual Conversations
- Authors: Connor T. Heaton, David M. Schwartz
- Abstract summary: We present a novel methodology for classifying emotion in a conversation.
At the backbone of our proposed methodology is a pre-trained Language Model (LM).
We apply our proposed methodology on the IEMOCAP and Friends data sets.
- Score: 3.04585143845864
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Emotions play a critical role in our everyday lives by altering how we
perceive, process and respond to our environment. Affective computing aims to
instill in computers the ability to detect and act on the emotions of human
actors. A core aspect of any affective computing system is the classification
of a user's emotion. In this study we present a novel methodology for
classifying emotion in a conversation. At the backbone of our proposed
methodology is a pre-trained Language Model (LM), which is supplemented by a
Graph Convolutional Network (GCN) that propagates information over the
predicate-argument structure identified in an utterance. We apply our proposed
methodology on the IEMOCAP and Friends data sets, achieving state-of-the-art
performance on the former and a higher accuracy on certain emotional labels on
the latter. Furthermore, we examine the role context plays in our methodology
by altering how much of the preceding conversation the model has access to when
making a classification.
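The following is a minimal sketch, not the authors' code, of the architecture described above: a pre-trained LM encodes an utterance, a single GCN layer propagates information over predicate-argument edges, and a linear head predicts the emotion label. The model name, hidden size, pooling, and the placeholder adjacency matrix are illustrative assumptions; in the paper the graph would come from the predicate-argument structure of the utterance (e.g., a semantic role parse).

```python
# Hedged sketch of an LM + GCN emotion classifier; names and shapes are assumptions.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class GCNLayer(nn.Module):
    """One graph-convolution step: average neighbour features, then project."""
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, x, adj):
        # x: (num_tokens, dim); adj: (num_tokens, num_tokens) adjacency with self-loops
        deg = adj.sum(dim=-1, keepdim=True).clamp(min=1.0)
        return torch.relu(self.linear((adj @ x) / deg))

class LMGCNEmotionClassifier(nn.Module):
    def __init__(self, lm_name="bert-base-uncased", num_emotions=6):
        super().__init__()
        self.lm = AutoModel.from_pretrained(lm_name)
        dim = self.lm.config.hidden_size
        self.gcn = GCNLayer(dim)
        self.classifier = nn.Linear(dim, num_emotions)

    def forward(self, input_ids, attention_mask, adj):
        # Contextual token embeddings from the pre-trained LM.
        hidden = self.lm(input_ids=input_ids,
                         attention_mask=attention_mask).last_hidden_state[0]
        # Propagate over predicate-argument edges (here a placeholder graph).
        graph_aware = self.gcn(hidden, adj)
        # Mean-pool the graph-aware token states and classify the emotion.
        return self.classifier(graph_aware.mean(dim=0))

# Usage: concatenating only the last k utterances is one simple way to vary how
# much preceding conversation the model sees, mirroring the context experiment.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = LMGCNEmotionClassifier()
utterances = ["I can't believe you did that.", "I'm so sorry, I didn't mean to."]
k = 2  # number of utterances of context to include
enc = tokenizer(" ".join(utterances[-k:]), return_tensors="pt")
n = enc["input_ids"].shape[1]
adj = torch.eye(n)  # placeholder; real edges would come from an SRL parser
logits = model(enc["input_ids"], enc["attention_mask"], adj)
```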
Related papers
- Recognizing Emotion Regulation Strategies from Human Behavior with Large Language Models [44.015651538470856]
Human emotions are often not expressed directly, but regulated according to internal processes and social display rules.
No method exists to automatically classify different emotion regulation strategies in a cross-user scenario.
We make use of the recently introduced Deep corpus for modeling the social display of the emotion shame.
A fine-tuned Llama2-7B model is able to classify the utilized emotion regulation strategy with high accuracy.
arXiv Detail & Related papers (2024-08-08T12:47:10Z) - Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling [50.99252242917458]
Conversational Speech Synthesis (CSS) aims to accurately express an utterance with the appropriate prosody and emotional inflection within a conversational setting.
To address the issue of data scarcity, we meticulously create emotional labels in terms of category and intensity.
Our model outperforms the baseline models in understanding and rendering emotions.
arXiv Detail & Related papers (2023-12-19T08:47:50Z) - Dynamic Causal Disentanglement Model for Dialogue Emotion Detection [77.96255121683011]
We propose a Dynamic Causal Disentanglement Model based on hidden variable separation.
This model effectively decomposes the content of dialogues and investigates the temporal accumulation of emotions.
Specifically, we propose a dynamic temporal disentanglement model to infer the propagation of utterances and hidden variables.
arXiv Detail & Related papers (2023-09-13T12:58:09Z) - Deep learning of segment-level feature representation for speech emotion recognition in conversations [9.432208348863336]
We propose a conversational speech emotion recognition method that captures attentive contextual dependencies and speaker-sensitive interactions.
First, we use a pretrained VGGish model to extract segment-based audio representation in individual utterances.
Second, an attentive bi-directional gated recurrent unit (GRU) models context-sensitive information and jointly explores intra- and inter-speaker dependencies.
arXiv Detail & Related papers (2023-02-05T16:15:46Z) - Multimodal Emotion Recognition using Transfer Learning from Speaker Recognition and BERT-based models [53.31917090073727]
We propose a neural network-based emotion recognition framework that uses a late fusion of transfer-learned and fine-tuned models from speech and text modalities.
We evaluate the effectiveness of our proposed multimodal approach on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) dataset.
arXiv Detail & Related papers (2022-02-16T00:23:42Z) - Multimodal Emotion Recognition with High-level Speech and Text Features [8.141157362639182]
We propose a novel cross-representation speech model to perform emotion recognition on wav2vec 2.0 speech features.
We also train a CNN-based model to recognize emotions from text features extracted with Transformer-based models.
Our method is evaluated on the IEMOCAP dataset in a 4-class classification problem.
arXiv Detail & Related papers (2021-09-29T07:08:40Z) - An Attribute-Aligned Strategy for Learning Speech Representation [57.891727280493015]
We propose an attribute-aligned learning strategy to derive speech representations that can flexibly address these issues via an attribute-selection mechanism.
Specifically, we propose a layered-representation variational autoencoder (LR-VAE), which factorizes speech representation into attribute-sensitive nodes.
Our proposed method achieves competitive performance on identity-free SER and better performance on emotionless SV.
arXiv Detail & Related papers (2021-06-05T06:19:14Z) - Enhancing Cognitive Models of Emotions with Representation Learning [58.2386408470585]
We present a novel deep learning-based framework to generate embedding representations of fine-grained emotions.
Our framework integrates a contextualized embedding encoder with a multi-head probing model.
Our model is evaluated on the Empathetic Dialogue dataset and achieves state-of-the-art results for classifying 32 emotions.
arXiv Detail & Related papers (2021-04-20T16:55:15Z) - Acted vs. Improvised: Domain Adaptation for Elicitation Approaches in Audio-Visual Emotion Recognition [29.916609743097215]
Key challenges in developing generalized automatic emotion recognition systems include scarcity of labeled data and lack of gold-standard references.
In this work, we regard the emotion elicitation approach as domain knowledge, and explore domain transfer learning techniques on emotional utterances.
arXiv Detail & Related papers (2021-04-05T15:59:31Z) - Seen and Unseen emotional style transfer for voice conversion with a new emotional speech dataset [84.53659233967225]
Emotional voice conversion aims to transform emotional prosody in speech while preserving the linguistic content and speaker identity.
We propose a novel framework based on variational auto-encoding Wasserstein generative adversarial network (VAW-GAN)
We show that the proposed framework achieves remarkable performance by consistently outperforming the baseline framework.
arXiv Detail & Related papers (2020-10-28T07:16:18Z) - Temporal aggregation of audio-visual modalities for emotion recognition [0.5352699766206808]
We propose a multimodal fusion technique for emotion recognition based on combining audio-visual modalities from a temporal window with different temporal offsets for each modality.
Our proposed method outperforms other methods from the literature as well as human accuracy ratings.
arXiv Detail & Related papers (2020-07-08T18:44:15Z)