Emotion Embeddings — Learning Stable and Homogeneous
Abstractions from Heterogeneous Affective Datasets
- URL: http://arxiv.org/abs/2308.07871v1
- Date: Tue, 15 Aug 2023 16:39:10 GMT
- Title: Emotion Embeddings — Learning Stable and Homogeneous
Abstractions from Heterogeneous Affective Datasets
- Authors: Sven Buechel and Udo Hahn
- Abstract summary: We propose a training procedure that learns a shared latent representation for emotions.
Experiments on a wide range of heterogeneous affective datasets indicate that this approach yields the desired interoperability.
- Score: 4.720033725720261
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human emotion is expressed in many communication modalities and media formats
and so their computational study is equally diversified into natural language
processing, audio signal analysis, computer vision, etc. Similarly, the large
variety of representation formats used in previous research to describe
emotions (polarity scales, basic emotion categories, dimensional approaches,
appraisal theory, etc.) has led to an ever proliferating diversity of
datasets, predictive models, and software tools for emotion analysis. Because
of these two distinct types of heterogeneity, at the expressional and
representational level, there is a dire need to unify previous work on
increasingly diverging data and label types. This article presents such a
unifying computational model. We propose a training procedure that learns a
shared latent representation for emotions, so-called emotion embeddings,
independent of different natural languages, communication modalities, media or
representation label formats, and even disparate model architectures.
Experiments on a wide range of heterogeneous affective datasets indicate that
this approach yields the desired interoperability for the sake of reusability,
interpretability and flexibility, without penalizing prediction quality. Code
and data are archived under https://doi.org/10.5281/zenodo.7405327 .
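As a rough illustration of the idea in the abstract, the sketch below (PyTorch) maps features from one input source into a shared emotion embedding space and decodes that embedding into two different label formats, VAD dimensions and basic emotion categories. All module names, dimensions, and the head layout are assumptions made for this sketch, not the authors' released implementation (see the Zenodo archive above).

```python
import torch
import torch.nn as nn

# Hypothetical size of the shared latent emotion space.
EMB_DIM = 100

class TextEncoder(nn.Module):
    """Maps pre-computed text features into the shared emotion embedding space."""
    def __init__(self, feat_dim: int = 768, emb_dim: int = EMB_DIM):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(feat_dim, emb_dim), nn.Tanh())

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.proj(features)

class VADHead(nn.Module):
    """Decodes a shared embedding into valence-arousal-dominance scores."""
    def __init__(self, emb_dim: int = EMB_DIM):
        super().__init__()
        self.out = nn.Linear(emb_dim, 3)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.out(z)

class BasicEmotionHead(nn.Module):
    """Decodes the same embedding into basic emotion category scores."""
    def __init__(self, emb_dim: int = EMB_DIM, n_classes: int = 5):
        super().__init__()
        self.out = nn.Linear(emb_dim, n_classes)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.out(z)

# Different datasets supervise different heads, but every gradient passes
# through the same shared embedding, so representations learned from
# heterogeneous label formats stay mutually compatible.
encoder, vad_head, cat_head = TextEncoder(), VADHead(), BasicEmotionHead()
features = torch.randn(4, 768)       # stand-in for sentence-level text features
z = encoder(features)                # shared emotion embeddings, shape (4, 100)
vad_pred = vad_head(z)               # VAD predictions, shape (4, 3)
cat_pred = cat_head(z)               # basic emotion scores, shape (4, 5)
```

In this layout, swapping in an encoder for another language or modality only requires mapping its features into the same latent space; the existing decoder heads can be reused unchanged.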
Related papers
- Emotion Rendering for Conversational Speech Synthesis with Heterogeneous
Graph-Based Context Modeling [50.99252242917458]
Conversational Speech Synthesis (CSS) aims to accurately express an utterance with the appropriate prosody and emotional inflection within a conversational setting.
To address the issue of data scarcity, we meticulously create emotional labels in terms of category and intensity.
Our model outperforms the baseline models in understanding and rendering emotions.
arXiv Detail & Related papers (2023-12-19T08:47:50Z)
- Towards Generalizable SER: Soft Labeling and Data Augmentation for Modeling Temporal Emotion Shifts in Large-Scale Multilingual Speech [3.86122440373248]
We propose a soft labeling system to capture gradational emotional intensities.
Using the Whisper encoder and data augmentation methods inspired by contrastive learning, our method emphasizes the temporal dynamics of emotions.
We publish our open source model weights and initial promising results after fine-tuning on Hume-Prosody.
arXiv Detail & Related papers (2023-11-15T00:09:21Z)
- Context Unlocks Emotions: Text-based Emotion Classification Dataset Auditing with Large Language Models [23.670143829183104]
The lack of contextual information in text data can make the annotation process of text-based emotion classification datasets challenging.
We propose a formal definition of textual context to motivate a prompting strategy to enhance such contextual information.
Our method improves alignment between inputs and their human-annotated labels from both an empirical and human-evaluated standpoint.
arXiv Detail & Related papers (2023-11-06T21:34:49Z)
- Implicit Design Choices and Their Impact on Emotion Recognition Model Development and Evaluation [5.534160116442057]
The subjectivity of emotions poses significant challenges in developing accurate and robust computational models.
This thesis examines critical facets of emotion recognition, beginning with the collection of diverse datasets.
To handle the challenge of non-representative training data, this work collects the Multimodal Stressed Emotion dataset.
arXiv Detail & Related papers (2023-09-06T02:45:42Z)
- Improving the Generalizability of Text-Based Emotion Detection by Leveraging Transformers with Psycholinguistic Features [27.799032561722893]
We propose approaches for text-based emotion detection that leverage transformer models (BERT and RoBERTa) in combination with Bidirectional Long Short-Term Memory (BiLSTM) networks trained on a comprehensive set of psycholinguistic features.
We find that the proposed hybrid models improve the ability to generalize to out-of-distribution data compared to a standard transformer-based approach.
arXiv Detail & Related papers (2022-12-19T13:58:48Z)
- Vision+X: A Survey on Multimodal Learning in the Light of Data [64.03266872103835]
Multimodal machine learning that incorporates data from various sources has become an increasingly popular research area.
We analyze the commonalities and unique properties of each data format, mainly covering vision, audio, text, and motion.
We investigate the existing literature on multimodal learning from both the representation learning and downstream application levels.
arXiv Detail & Related papers (2022-10-05T13:14:57Z)
- Seeking Subjectivity in Visual Emotion Distribution Learning [93.96205258496697]
Visual Emotion Analysis (VEA) aims to predict people's emotions towards different visual stimuli.
Existing methods often predict visual emotion distribution in a unified network, neglecting the inherent subjectivity in its crowd voting process.
We propose a novel Subjectivity Appraise-and-Match Network (SAMNet) to investigate the subjectivity in visual emotion distribution.
arXiv Detail & Related papers (2022-07-25T02:20:03Z)
- Multimodal Emotion Recognition using Transfer Learning from Speaker Recognition and BERT-based models [53.31917090073727]
We propose a neural network-based emotion recognition framework that uses a late fusion of transfer-learned and fine-tuned models from speech and text modalities (a minimal late-fusion sketch is given after this list).
We evaluate the effectiveness of our proposed multimodal approach on the interactive emotional dyadic motion capture (IEMOCAP) dataset.
arXiv Detail & Related papers (2022-02-16T00:23:42Z)
- Enhancing Cognitive Models of Emotions with Representation Learning [58.2386408470585]
We present a novel deep learning-based framework to generate embedding representations of fine-grained emotions.
Our framework integrates a contextualized embedding encoder with a multi-head probing model.
Our model is evaluated on the Empathetic Dialogues dataset and achieves state-of-the-art results for classifying 32 emotions.
arXiv Detail & Related papers (2021-04-20T16:55:15Z)
- Towards a Unified Framework for Emotion Analysis [12.369106010767283]
EmoCoder is a modular encoder-decoder architecture that generalizes emotion analysis over different tasks.
EmoCoder learns an interpretable language-independent representation of emotions.
arXiv Detail & Related papers (2020-12-01T00:54:13Z)
- Modality-Transferable Emotion Embeddings for Low-Resource Multimodal Emotion Recognition [55.44502358463217]
We propose a modality-transferable model with emotion embeddings to tackle the aforementioned issues.
Our model achieves state-of-the-art performance on most of the emotion categories.
Our model also outperforms existing baselines in the zero-shot and few-shot scenarios for unseen emotions.
arXiv Detail & Related papers (2020-09-21T06:10:39Z)
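For the multimodal late-fusion entry above (Multimodal Emotion Recognition using Transfer Learning from Speaker Recognition and BERT-based models), decision-level fusion can be sketched as follows. The feature dimensions, head sizes, and fusion layer are illustrative assumptions for this sketch, not the cited paper's exact architecture.

```python
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    """Illustrative decision-level (late) fusion of speech and text predictions."""
    def __init__(self, speech_dim: int = 512, text_dim: int = 768, n_emotions: int = 4):
        super().__init__()
        # Modality-specific heads produce independent emotion logits...
        self.speech_head = nn.Linear(speech_dim, n_emotions)
        self.text_head = nn.Linear(text_dim, n_emotions)
        # ...which are combined only at the decision level (late fusion).
        self.fusion = nn.Linear(2 * n_emotions, n_emotions)

    def forward(self, speech_feats: torch.Tensor, text_feats: torch.Tensor) -> torch.Tensor:
        speech_logits = self.speech_head(speech_feats)
        text_logits = self.text_head(text_feats)
        return self.fusion(torch.cat([speech_logits, text_logits], dim=-1))

model = LateFusionClassifier()
# Stand-ins for speaker-recognition (speech) and BERT-style (text) features.
logits = model(torch.randn(2, 512), torch.randn(2, 768))  # shape (2, 4)
```

Because each modality is encoded and classified separately, either branch can be transfer-learned or fine-tuned independently before the fusion layer is trained.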