Learning Emotion Representations from Verbal and Nonverbal Communication
- URL: http://arxiv.org/abs/2305.13500v1
- Date: Mon, 22 May 2023 21:36:55 GMT
- Title: Learning Emotion Representations from Verbal and Nonverbal Communication
- Authors: Sitao Zhang, Yimu Pan, James Z. Wang
- Abstract summary: We present EmotionCLIP, the first pre-training paradigm to extract visual emotion representations from verbal and nonverbal communication.
We guide EmotionCLIP to attend to nonverbal emotion cues through subject-aware context encoding and verbal emotion cues using sentiment-guided contrastive learning.
We anticipate that EmotionCLIP will address the prevailing issue of data scarcity in emotion understanding, thereby fostering progress in related domains.
- Score: 7.747924294389427
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Emotion understanding is an essential but highly challenging component of
artificial general intelligence. The absence of extensively annotated datasets
has significantly impeded advancements in this field. We present EmotionCLIP,
the first pre-training paradigm to extract visual emotion representations from
verbal and nonverbal communication using only uncurated data. Unlike the
numerical labels or descriptions used in previous methods, communication
naturally contains emotion information. Furthermore, acquiring emotion
representations from communication is more congruent with the human learning
process. We guide EmotionCLIP to attend to nonverbal emotion cues through
subject-aware context encoding and verbal emotion cues using sentiment-guided
contrastive learning. Extensive experiments validate the effectiveness and
transferability of EmotionCLIP. Using merely the linear-probe evaluation protocol,
EmotionCLIP outperforms the state-of-the-art supervised visual emotion
recognition methods and rivals many multimodal approaches across various
benchmarks. We anticipate that the advent of EmotionCLIP will address the
prevailing issue of data scarcity in emotion understanding, thereby fostering
progress in related domains. The code and pre-trained models are available at
https://github.com/Xeaver/EmotionCLIP.
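The abstract names two mechanisms, subject-aware context encoding and sentiment-guided contrastive learning, without spelling out the objective. Below is a minimal sketch of how a sentiment-guided, CLIP-style contrastive loss could look, assuming soft InfoNCE targets derived from text-side sentiment; the `sentiment_scores` input, the smoothing scheme, and all hyperparameters are illustrative, not the authors' implementation (see the repository above for the real code).

```python
import torch
import torch.nn.functional as F

def sentiment_guided_contrastive_loss(video_emb, text_emb, sentiment_scores,
                                      temperature=0.07, smoothing=0.1):
    """CLIP-style InfoNCE with soft targets derived from sentiment similarity.

    video_emb, text_emb: (N, D) L2-normalized embeddings of paired clips/texts.
    sentiment_scores:    (N,) hypothetical per-text sentiment in [-1, 1].
    """
    logits = video_emb @ text_emb.t() / temperature          # (N, N) similarities

    # Soft targets: mostly the diagonal (true pair), plus a little probability
    # mass spread toward texts with similar sentiment (an assumption here).
    sent_sim = -torch.abs(sentiment_scores[:, None] - sentiment_scores[None, :])
    soft = F.softmax(sent_sim / 0.5, dim=-1)                 # sentiment-based prior
    hard = torch.eye(len(video_emb), device=video_emb.device)
    targets = (1 - smoothing) * hard + smoothing * soft

    loss_v2t = -(targets * F.log_softmax(logits, dim=-1)).sum(-1).mean()
    loss_t2v = -(targets * F.log_softmax(logits.t(), dim=-1)).sum(-1).mean()
    return 0.5 * (loss_v2t + loss_t2v)

# Toy usage with random features standing in for encoder outputs.
v = F.normalize(torch.randn(8, 512), dim=-1)
t = F.normalize(torch.randn(8, 512), dim=-1)
s = torch.rand(8) * 2 - 1
print(sentiment_guided_contrastive_loss(v, t, s))
```

The diagonal still dominates the targets, so matched video-text pairs are pulled together while negatives with similar sentiment are penalized less harshly.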
Related papers
- Attention-based Interactive Disentangling Network for Instance-level Emotional Voice Conversion [81.1492897350032]
Emotional Voice Conversion aims to manipulate a speech according to a given emotion while preserving non-emotion components.
We propose an Attention-based Interactive diseNtangling Network (AINN) that leverages instance-wise emotional knowledge for voice conversion.
arXiv Detail & Related papers (2023-12-29T08:06:45Z)
- emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation [42.29118614670941]
We propose emotion2vec, a universal speech emotion representation model.
emotion2vec is pre-trained on unlabeled emotion data through self-supervised online distillation.
It outperforms state-of-the-art pre-trained universal models and emotion specialist models.
arXiv Detail & Related papers (2023-12-23T07:46:55Z)
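For context on the entry above: self-supervised online distillation is commonly realized as a student network predicting the representations an exponential-moving-average teacher produces for the unmasked input. The sketch below shows that generic recipe in PyTorch; the encoder, masking, loss, and decay value are placeholders, not emotion2vec's actual design.

```python
import copy
import torch
import torch.nn.functional as F

class OnlineDistiller(torch.nn.Module):
    """Generic student/EMA-teacher setup for self-supervised distillation."""

    def __init__(self, encoder, ema_decay=0.999):
        super().__init__()
        self.student = encoder
        self.teacher = copy.deepcopy(encoder)     # EMA copy, not trained by SGD
        for p in self.teacher.parameters():
            p.requires_grad = False
        self.ema_decay = ema_decay

    @torch.no_grad()
    def update_teacher(self):
        for ps, pt in zip(self.student.parameters(), self.teacher.parameters()):
            pt.mul_(self.ema_decay).add_(ps, alpha=1 - self.ema_decay)

    def forward(self, speech, mask):
        # Student sees masked frames; teacher sees the clean utterance.
        target = self.teacher(speech).detach()
        pred = self.student(speech * mask)
        return F.mse_loss(pred, target)           # frame-level regression loss

# Toy run: a linear layer stands in for the speech encoder.
enc = torch.nn.Linear(40, 40)                     # 40 = placeholder feature dim
model = OnlineDistiller(enc)
x = torch.randn(4, 100, 40)                       # (batch, frames, features)
mask = (torch.rand(4, 100, 1) > 0.5).float()
loss = model(x, mask)
loss.backward()                                   # updates student only
model.update_teacher()                            # EMA step for the teacher
print(loss.item())
```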
- Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling [50.99252242917458]
Conversational Speech Synthesis (CSS) aims to accurately express an utterance with the appropriate prosody and emotional inflection within a conversational setting.
To address the issue of data scarcity, we meticulously create emotional labels in terms of category and intensity.
Our model outperforms the baseline models in understanding and rendering emotions.
arXiv Detail & Related papers (2023-12-19T08:47:50Z)
- Emotion Intensity and its Control for Emotional Voice Conversion [77.05097999561298]
Emotional voice conversion (EVC) seeks to convert the emotional state of an utterance while preserving the linguistic content and speaker identity.
In this paper, we aim to explicitly characterize and control the intensity of emotion.
We propose to disentangle the speaker style from linguistic content and encode the speaker style into a style embedding in a continuous space that forms the prototype of emotion embedding.
arXiv Detail & Related papers (2022-01-10T02:11:25Z)
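One plausible reading of the entry above is that intensity becomes a scalar that moves a style embedding along the line between a neutral prototype and an emotion prototype. The sketch below illustrates that reading; `EmotionStyleEncoder`, the GRU backbone, and the interpolation rule are assumptions for illustration, not the paper's model.

```python
import torch

class EmotionStyleEncoder(torch.nn.Module):
    """Maps reference audio features to a continuous style embedding (sketch)."""

    def __init__(self, feat_dim=80, style_dim=128):
        super().__init__()
        self.net = torch.nn.GRU(feat_dim, style_dim, batch_first=True)

    def forward(self, ref_audio):                   # (B, T, feat_dim)
        _, h = self.net(ref_audio)
        return h[-1]                                # (B, style_dim)

def controlled_style(neutral_proto, emotion_proto, intensity):
    """Interpolate between a neutral and an emotion prototype.

    intensity in [0, 1]: 0 = neutral, 1 = full emotion (illustrative control).
    """
    return neutral_proto + intensity * (emotion_proto - neutral_proto)

enc = EmotionStyleEncoder()
neutral = enc(torch.randn(1, 50, 80))               # prototype from neutral speech
happy = enc(torch.randn(1, 50, 80))                 # prototype from happy speech
half_happy = controlled_style(neutral, happy, intensity=0.5)
print(half_happy.shape)                             # torch.Size([1, 128])
```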
- Multi-Cue Adaptive Emotion Recognition Network [4.570705738465714]
We propose a new deep learning approach for emotion recognition based on adaptive multi-cues.
We compare the proposed approach with the state-of-the-art approaches on the CAER-S dataset.
arXiv Detail & Related papers (2021-11-03T15:08:55Z)
- Emotion Recognition from Multiple Modalities: Fundamentals and Methodologies [106.62835060095532]
We discuss several key aspects of multi-modal emotion recognition (MER).
We begin with a brief introduction on widely used emotion representation models and affective modalities.
We then summarize existing emotion annotation strategies and corresponding computational tasks.
Finally, we outline several real-world applications and discuss some future directions.
arXiv Detail & Related papers (2021-08-18T21:55:20Z)
- Using Knowledge-Embedded Attention to Augment Pre-trained Language Models for Fine-Grained Emotion Recognition [0.0]
We focus on improving fine-grained emotion recognition by introducing external knowledge into a pre-trained self-attention model.
Our model outperforms previous models on several datasets, as our results and error analyses show.
arXiv Detail & Related papers (2021-07-31T09:41:44Z)
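A common pattern for the approach in the entry above is to project per-token knowledge features (for example, emotion-lexicon vectors) into the model's hidden size and add them to the hidden states before self-attention. The sketch below shows that generic pattern; the gating and dimensions are illustrative assumptions, not the paper's exact mechanism.

```python
import torch

class KnowledgeAugmentedLayer(torch.nn.Module):
    """Fuse per-token knowledge vectors into hidden states before attention."""

    def __init__(self, hidden_dim=768, know_dim=64, num_heads=8):
        super().__init__()
        self.project = torch.nn.Linear(know_dim, hidden_dim)
        self.attn = torch.nn.MultiheadAttention(hidden_dim, num_heads,
                                                batch_first=True)
        self.gate = torch.nn.Parameter(torch.tensor(0.1))  # learned fusion weight

    def forward(self, hidden, knowledge):
        # hidden:    (B, T, hidden_dim) from a pre-trained LM
        # knowledge: (B, T, know_dim), e.g. emotion-lexicon features per token
        fused = hidden + self.gate * self.project(knowledge)
        out, _ = self.attn(fused, fused, fused)
        return out

layer = KnowledgeAugmentedLayer()
h = torch.randn(2, 16, 768)                          # LM hidden states
k = torch.randn(2, 16, 64)                           # external knowledge features
print(layer(h, k).shape)                             # torch.Size([2, 16, 768])
```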
- A Circular-Structured Representation for Visual Emotion Distribution Learning [82.89776298753661]
We propose a well-grounded circular-structured representation to utilize the prior knowledge for visual emotion distribution learning.
To be specific, we first construct an Emotion Circle to unify any emotional state within it.
On the proposed Emotion Circle, each emotion distribution is represented with an emotion vector, which is defined with three attributes.
arXiv Detail & Related papers (2021-06-23T14:53:27Z)
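The snippet above does not enumerate the three attributes; one natural reading is that an emotion's type is an angle on the circle, its intensity a radius, and its polarity the half of the circle it falls in. The toy function below encodes a state that way; the attribute definitions and the angle layout are assumptions, not the paper's exact formulation.

```python
import math

def emotion_vector(angle_deg, intensity):
    """Place an emotional state on a unit 'Emotion Circle' (sketch).

    angle_deg: emotion type as an angle on the circle (e.g., joy and sadness
               on opposite sides); intensity: radius in [0, 1].
    Returns (x, y) plus a polarity sign derived from the angle (assumption).
    """
    theta = math.radians(angle_deg)
    x, y = intensity * math.cos(theta), intensity * math.sin(theta)
    polarity = 1 if math.cos(theta) >= 0 else -1    # positive vs. negative half
    return x, y, polarity

# Hypothetical layout: positive emotions near 0 deg, negative near 180 deg.
print(emotion_vector(20, 0.9))    # strong positive emotion
print(emotion_vector(200, 0.4))   # mild negative emotion
```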
- Emotion-aware Chat Machine: Automatic Emotional Response Generation for Human-like Emotional Interaction [55.47134146639492]
This article proposes a unified end-to-end neural architecture, which is capable of simultaneously encoding the semantics and the emotions in a post.
Experiments on real-world data demonstrate that the proposed method outperforms the state-of-the-art methods in terms of both content coherence and emotion appropriateness.
arXiv Detail & Related papers (2021-06-06T06:26:15Z)
- Meta Transfer Learning for Emotion Recognition [42.61707533351803]
We propose a PathNet-based transfer learning method that is able to transfer emotional knowledge learned from one visual/audio emotion domain to another visual/audio emotion domain.
Our proposed system improves the performance of emotion recognition, substantially outperforming recently proposed transfer learning methods based on fine-tuning pre-trained models.
arXiv Detail & Related papers (2020-06-23T00:25:28Z)
- Detecting Emotion Primitives from Speech and their use in discerning Categorical Emotions [16.886826928295203]
Emotion plays an essential role in human-to-human communication, enabling us to convey feelings such as happiness, frustration, and sincerity.
This work investigated how emotion primitives can be used to detect categorical emotions such as happiness, disgust, contempt, anger, and surprise from neutral speech.
Results indicated that arousal, followed by dominance, was a better detector of such emotions.
arXiv Detail & Related papers (2020-01-31T03:11:24Z)
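The finding above, arousal and then dominance carrying most of the categorical signal, fits a two-stage pipeline: regress the arousal/valence/dominance primitives from speech features, then classify categories from those three numbers. The sketch below is a generic stand-in for such a pipeline, not the paper's models.

```python
import torch

class PrimitiveToCategory(torch.nn.Module):
    """Two-stage sketch: speech features -> A/V/D primitives -> emotion class."""

    def __init__(self, feat_dim=40, num_classes=5):
        super().__init__()
        self.primitives = torch.nn.Linear(feat_dim, 3)   # arousal, valence, dominance
        self.classifier = torch.nn.Linear(3, num_classes)

    def forward(self, feats):                            # (B, feat_dim)
        avd = torch.tanh(self.primitives(feats))         # primitives in [-1, 1]
        return self.classifier(avd), avd

model = PrimitiveToCategory()
logits, avd = model(torch.randn(8, 40))
print(logits.shape, avd.shape)      # torch.Size([8, 5]) torch.Size([8, 3])
```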