Generative Adversarial Networks in Human Emotion Synthesis: A Review
- URL: http://arxiv.org/abs/2010.15075v2
- Date: Sat, 7 Nov 2020 11:05:36 GMT
- Title: Generative Adversarial Networks in Human Emotion Synthesis: A Review
- Authors: Noushin Hajarolasvadi, Miguel Arjona Ramírez and Hasan Demirel
- Abstract summary: Deep generative models have become an emerging topic in various research areas such as computer vision and signal processing.
Affective computing has witnessed a rapid proliferation of generative models during the last two decades.
Facial expression synthesis, speech emotion synthesis, and audio-visual (cross-modal) emotion synthesis are reviewed extensively.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Synthesizing realistic data samples is of great value to both the
academic and industrial communities. Deep generative models have become an
emerging topic in various research areas such as computer vision and signal
processing. Affective computing, a topic of broad interest in the computer
vision community, has been no exception and has benefited from generative
models. In fact, affective computing has witnessed a rapid proliferation of
generative models during the last two decades. Applications of such models
include, but are not limited to, emotion recognition and classification,
unimodal emotion synthesis, and cross-modal emotion synthesis. To this end, we
review recent advances in human emotion synthesis by studying the available
databases and the advantages and disadvantages of the generative models, along
with the related training strategies, considering the two principal human
communication modalities, namely audio and video. In this context, facial
expression synthesis, speech emotion synthesis, and audio-visual (cross-modal)
emotion synthesis are reviewed extensively under different application
scenarios. Finally, we discuss open research problems to push the boundaries
of this research area for future work.
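As background for the adversarial training strategies surveyed in the review, the following is a minimal sketch of a standard GAN update step in PyTorch. The architectures, dimensions, and optimizer settings are illustrative assumptions, not the configuration of any specific model covered by the review.

```python
# Minimal sketch of the standard adversarial training loop surveyed in the
# review. All architecture choices (MLP networks, 64-dim latent, BCE loss)
# are illustrative assumptions, not the setup of any specific reviewed model.
import torch
import torch.nn as nn

latent_dim, data_dim = 64, 128  # assumed sizes for illustration

generator = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, data_dim), nn.Tanh(),
)
discriminator = nn.Sequential(
    nn.Linear(data_dim, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1),  # real/fake logit
)

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real_batch: torch.Tensor) -> None:
    n = real_batch.size(0)
    # 1) Discriminator step: score real samples high, generated samples low.
    fake = generator(torch.randn(n, latent_dim)).detach()
    loss_d = bce(discriminator(real_batch), torch.ones(n, 1)) \
           + bce(discriminator(fake), torch.zeros(n, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # 2) Generator step: update G so D scores its samples as real.
    loss_g = bce(discriminator(generator(torch.randn(n, latent_dim))),
                 torch.ones(n, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```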
Related papers
- A Review of Human Emotion Synthesis Based on Generative Technology [14.92674135999986]
Human emotion synthesis is a crucial aspect of affective computing.
It involves using computational methods to mimic and convey human emotions through various modalities.
Recent advancements in generative models have significantly contributed to the development of this field.
arXiv Detail & Related papers (2024-12-10T02:06:10Z)
- Generative Technology for Human Emotion Recognition: A Scope Review [11.578408396744237]
This survey aims to bridge the gaps in the existing literature by conducting a comprehensive analysis of over 320 research papers published up to June 2024.
It will introduce the mathematical principles of different generative models and the commonly used datasets.
It will provide an in-depth analysis of how generative techniques address emotion recognition based on different modalities.
arXiv Detail & Related papers (2024-07-04T05:22:55Z)
- Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling [50.99252242917458]
Conversational Speech Synthesis (CSS) aims to accurately express an utterance with the appropriate prosody and emotional inflection within a conversational setting.
To address the issue of data scarcity, we meticulously create emotional labels in terms of category and intensity.
Our model outperforms the baseline models in understanding and rendering emotions.
arXiv Detail & Related papers (2023-12-19T08:47:50Z)
- Do Stochastic Parrots have Feelings Too? Improving Neural Detection of Synthetic Text via Emotion Recognition [16.31088877974614]
Recent developments in generative AI have shone a spotlight on high-performance synthetic text generation technologies.
We draw inspiration from psychological studies which suggest that people can be driven by emotion and encode emotion in the text they compose.
arXiv Detail & Related papers (2023-10-24T15:07:35Z)
- ORES: Open-vocabulary Responsible Visual Synthesis [104.7572323359984]
We formalize a new task, Open-vocabulary Responsible Visual Synthesis (ORES), where the synthesis model is able to avoid forbidden visual concepts.
To address this problem, we present a Two-stage Intervention (TIN) framework.
By introducing 1) rewriting with a learnable instruction through a large-scale language model (LLM) and 2) synthesizing with prompt intervention on a diffusion model, it can effectively synthesize images that avoid forbidden concepts while following the user's query as closely as possible; a schematic sketch follows this entry.
arXiv Detail & Related papers (2023-08-26T06:47:34Z)
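A rough sketch of the two-stage intervention idea described in the ORES entry above. The `call_llm` and `run_diffusion` functions and the `FORBIDDEN` set are hypothetical placeholders; the paper's learnable instructions and diffusion-side prompt intervention are only approximated here.

```python
# Hypothetical sketch of the two-stage intervention (TIN) idea from the
# ORES abstract: (1) an LLM rewrites the user's query under an instruction
# so forbidden concepts are removed, then (2) a diffusion model synthesizes
# from the rewritten prompt. `call_llm`, `run_diffusion`, and the FORBIDDEN
# set are placeholders, not the paper's actual API.

FORBIDDEN = {"violence"}  # example administrator-defined concept

def call_llm(instruction: str, query: str) -> str:
    """Placeholder for a large language model call."""
    raise NotImplementedError

def run_diffusion(prompt: str):
    """Placeholder for a text-to-image diffusion sampler."""
    raise NotImplementedError

def responsible_synthesis(user_query: str):
    # Stage 1: rewrite the query so it avoids the forbidden concepts while
    # staying as close as possible to the user's intent.
    instruction = ("Rewrite the prompt to avoid these concepts entirely: "
                   + ", ".join(FORBIDDEN))
    safe_prompt = call_llm(instruction, user_query)
    # Stage 2: synthesize with the intervened (rewritten) prompt.
    return run_diffusion(safe_prompt)
```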
- An Overview of Affective Speech Synthesis and Conversion in the Deep Learning Era [39.91844543424965]
Affect, or expressivity, has the capacity to turn speech into a medium capable of conveying intimate thoughts, feelings, and emotions.
Following recent advances in text-to-speech synthesis, a paradigm shift is well under way in the fields of affective speech synthesis and conversion.
Deep learning, the technology which underlies most of the recent advances in artificial intelligence, is spearheading these efforts.
arXiv Detail & Related papers (2022-10-06T13:55:59Z)
- Data-driven emotional body language generation for social robotics [58.88028813371423]
In social robotics, endowing humanoid robots with the ability to generate bodily expressions of affect can improve human-robot interaction and collaboration.
We implement a deep learning data-driven framework that learns from a few hand-designed robotic bodily expressions.
The evaluation study found that the anthropomorphism and animacy of the generated expressions are not perceived differently from the hand-designed ones.
arXiv Detail & Related papers (2022-05-02T09:21:39Z)
- LaughNet: synthesizing laughter utterances from waveform silhouettes and a single laughter example [55.10864476206503]
We propose a model called LaughNet for synthesizing laughter by using waveform silhouettes as inputs.
The results show that LaughNet can synthesize laughter utterances with moderate quality and retain the characteristics of the training example.
arXiv Detail & Related papers (2021-10-11T00:45:07Z)
- EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model [56.75775793011719]
We introduce and publicly release a Mandarin emotion speech dataset comprising 9,724 samples with audio files and human-labeled emotion annotations.
Unlike those models which need additional reference audio as input, our model could predict emotion labels just from the input text and generate more expressive speech conditioned on the emotion embedding.
In the experiment phase, we first validate the effectiveness of our dataset through an emotion classification task. Then we train our model on the proposed dataset and conduct a series of subjective evaluations; a schematic of the text-only emotion conditioning follows this entry.
arXiv Detail & Related papers (2021-06-17T08:34:21Z)
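A schematic sketch of the text-only emotion conditioning described in the EMOVIE entry above. The module shapes, the toy GRU decoder, and the hard argmax over predicted labels are illustrative assumptions rather than the paper's architecture.

```python
# Schematic sketch of the text-only emotion conditioning described in the
# EMOVIE entry: an emotion head predicts a label from the input text alone,
# and the synthesizer is conditioned on the corresponding emotion embedding.
# All module shapes and the toy GRU decoder are illustrative assumptions.
import torch
import torch.nn as nn

class EmotionConditionedTTS(nn.Module):
    def __init__(self, vocab_size=256, hidden=128, n_emotions=5, n_mels=80):
        super().__init__()
        self.text_encoder = nn.Embedding(vocab_size, hidden)
        self.emotion_head = nn.Linear(hidden, n_emotions)   # label from text
        self.emotion_embed = nn.Embedding(n_emotions, hidden)
        self.decoder = nn.GRU(hidden, n_mels, batch_first=True)  # toy acoustic decoder

    def forward(self, token_ids: torch.Tensor):
        h = self.text_encoder(token_ids)                # (B, T, hidden)
        emo_logits = self.emotion_head(h.mean(dim=1))   # (B, n_emotions)
        # Hard argmax for illustration only; training would use the
        # ground-truth label (or a differentiable soft mixture) instead.
        emo = self.emotion_embed(emo_logits.argmax(dim=-1))  # (B, hidden)
        mel, _ = self.decoder(h + emo.unsqueeze(1))     # condition every step
        return mel, emo_logits
```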
- Enhancing Cognitive Models of Emotions with Representation Learning [58.2386408470585]
We present a novel deep learning-based framework to generate embedding representations of fine-grained emotions.
Our framework integrates a contextualized embedding encoder with a multi-head probing model.
Our model is evaluated on the Empathetic Dialogue dataset and achieves state-of-the-art results for classifying 32 emotions; a schematic sketch follows this entry.
arXiv Detail & Related papers (2021-04-20T16:55:15Z)
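A minimal sketch of the encoder-plus-probing pipeline described in the entry above, assuming a frozen contextual encoder whose pooled sentence embedding feeds several small probing heads. The multi-head design and all sizes are assumptions, not the paper's exact model.

```python
# Minimal sketch of the probing setup described in the entry above: pooled
# sentence embeddings from a (frozen) contextual encoder feed several small
# probing heads whose averaged logits classify 32 fine-grained emotions.
# The multi-head design and all sizes are assumptions for illustration.
import torch
import torch.nn as nn

class EmotionProbe(nn.Module):
    def __init__(self, enc_dim=768, n_emotions=32, n_heads=4):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Sequential(nn.Linear(enc_dim, 128), nn.ReLU(),
                          nn.Linear(128, n_emotions))
            for _ in range(n_heads)
        )

    def forward(self, sentence_embedding: torch.Tensor) -> torch.Tensor:
        # sentence_embedding: (B, enc_dim), e.g. a pooled transformer output.
        return torch.stack([h(sentence_embedding) for h in self.heads]).mean(0)
```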
- Modeling emotion for human-like behavior in future intelligent robots [0.913755431537592]
We show how neuroscience can help advance the current state of the art.
We argue that a stronger integration of emotion-related processes in robot models is critical for the design of human-like behavior.
arXiv Detail & Related papers (2020-09-30T17:32:30Z)