E-React: Towards Emotionally Controlled Synthesis of Human Reactions
- URL: http://arxiv.org/abs/2508.06093v1
- Date: Fri, 08 Aug 2025 07:36:32 GMT
- Title: E-React: Towards Emotionally Controlled Synthesis of Human Reactions
- Authors: Chen Zhu, Buzhen Huang, Zijing Wu, Binghui Zuo, Yangang Wang
- Abstract summary: Existing human motion generation frameworks do not consider the impact of emotions. We introduce a novel task: generating diverse reaction motions in response to different emotional cues.
- Score: 27.208537767510617
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Emotion serves as an essential component in daily human interactions. Existing human motion generation frameworks do not consider the impact of emotions, which reduces naturalness and limits their application in interactive tasks, such as human reaction synthesis. In this work, we introduce a novel task: generating diverse reaction motions in response to different emotional cues. However, learning emotion representation from limited motion data and incorporating it into a motion generation framework remains a challenging problem. To address the above obstacles, we introduce a semi-supervised emotion prior in an actor-reactor diffusion model to facilitate emotion-driven reaction synthesis. Specifically, based on the observation that motion clips within a short sequence tend to share the same emotion, we first devise a semi-supervised learning framework to train an emotion prior. With this prior, we further train an actor-reactor diffusion model to generate reactions by considering both spatial interaction and emotional response. Finally, given a motion sequence of an actor, our approach can generate realistic reactions under various emotional conditions. Experimental results demonstrate that our model outperforms existing reaction generation methods. The code and data will be made publicly available at https://ereact.github.io/
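The abstract describes two components: a semi-supervised emotion prior and an actor-reactor diffusion model that generates the reactor's motion conditioned on the actor's motion and an emotion code. The snippet below is a minimal illustrative sketch of how such an emotion-conditioned denoiser could be wired together; the module names, feature dimensions, and conditioning scheme are assumptions for illustration only and are not taken from the authors' released code.

```python
# Hypothetical sketch (not the authors' implementation): a denoiser that predicts
# the reactor's clean motion from a noisy reactor sequence, the actor's motion
# (spatial interaction cue), and an emotion embedding from a pretrained prior.
import torch
import torch.nn as nn

class ReactorDenoiser(nn.Module):
    def __init__(self, pose_dim=66, emb_dim=256, n_layers=4, n_heads=4):
        super().__init__()
        self.reactor_in = nn.Linear(pose_dim, emb_dim)
        self.actor_in = nn.Linear(pose_dim, emb_dim)   # actor motion as interaction cue
        self.emotion_in = nn.Linear(emb_dim, emb_dim)  # emotion code from the prior
        self.time_in = nn.Linear(1, emb_dim)           # diffusion timestep embedding
        layer = nn.TransformerEncoderLayer(emb_dim, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(emb_dim, pose_dim)

    def forward(self, noisy_reactor, actor, emotion, t):
        # noisy_reactor, actor: (B, T, pose_dim); emotion: (B, emb_dim); t: (B,)
        tokens = self.reactor_in(noisy_reactor) + self.actor_in(actor)
        cond = self.emotion_in(emotion) + self.time_in(t[:, None].float())
        tokens = tokens + cond[:, None, :]              # broadcast condition over time
        return self.head(self.backbone(tokens))         # predicted clean reactor motion

# Toy usage: one reverse-diffusion call under a placeholder emotion code.
B, T = 2, 60
model = ReactorDenoiser()
x_t = torch.randn(B, T, 66)        # noisy reactor motion
actor = torch.randn(B, T, 66)      # observed actor motion
emotion = torch.randn(B, 256)      # stand-in for the learned emotion prior embedding
x0_pred = model(x_t, actor, emotion, torch.tensor([10, 10]))
print(x0_pred.shape)               # torch.Size([2, 60, 66])
```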
Related papers
- ReactDiff: Fundamental Multiple Appropriate Facial Reaction Diffusion Model [0.9786690381850356]
We propose ReactDiff, a novel temporal diffusion framework for generating diverse facial reactions. Our key insight is that plausible human reactions demonstrate smoothness and coherence over time. Our approach achieves state-of-the-art reaction quality and excels in diversity and reaction appropriateness.
arXiv Detail & Related papers (2025-10-06T11:30:40Z)
- MoReact: Generating Reactive Motion from Textual Descriptions [57.642436102978245]
MoReact is a diffusion-based method designed to disentangle the generation of global trajectories and local motions sequentially. Our experiments, utilizing data adapted from a two-person motion dataset, demonstrate the efficacy of our approach.
arXiv Detail & Related papers (2025-09-28T14:31:41Z)
- EmoCAST: Emotional Talking Portrait via Emotive Text Description [56.42674612728354]
EmoCAST is a diffusion-based framework for precise text-driven emotional synthesis. In appearance modeling, emotional prompts are integrated through a text-guided decoupled emotive module. EmoCAST achieves state-of-the-art performance in generating realistic, emotionally expressive, and audio-synchronized talking-head videos.
arXiv Detail & Related papers (2025-08-28T10:02:06Z)
- Taming Transformer for Emotion-Controllable Talking Face Generation [61.835295250047196]
We propose a novel method to tackle the emotion-controllable talking face generation task discretely. Specifically, we employ two pre-training strategies to disentangle audio into independent components and quantize videos into combinations of visual tokens. We conduct experiments on the MEAD dataset, controlling the emotion of the generated videos conditioned on multiple emotional audio inputs.
arXiv Detail & Related papers (2025-08-20T02:16:52Z)
- HERO: Human Reaction Generation from Videos [54.602947113980655]
HERO is a framework for Human rEaction geneRation from videOs. HERO considers both global and frame-level local representations of the video to extract the interaction intention. Local visual representations are continuously injected into the model to maximize the exploitation of the dynamic properties inherent in videos.
arXiv Detail & Related papers (2025-03-11T10:39:32Z)
- ReGenNet: Towards Human Action-Reaction Synthesis [87.57721371471536]
We analyze the asymmetric, dynamic, synchronous, and detailed nature of human-human interactions.
We propose the first multi-setting human action-reaction benchmark to generate human reactions conditioned on given human actions.
arXiv Detail & Related papers (2024-03-18T15:33:06Z)
- ReMoS: 3D Motion-Conditioned Reaction Synthesis for Two-Person Interactions [66.87211993793807]
We present ReMoS, a denoising-diffusion-based model that synthesizes the full-body motion of a person in a two-person interaction scenario.
We demonstrate ReMoS across challenging two-person scenarios such as pair dancing, Ninjutsu, kickboxing, and acrobatics.
We also contribute the ReMoCap dataset for two-person interactions, containing full-body and finger motions.
arXiv Detail & Related papers (2023-11-28T18:59:52Z)
- Empathetic Response Generation via Emotion Cause Transition Graph [29.418144401849194]
Empathetic dialogue is a human-like behavior that requires the perception of both affective factors (e.g., emotional state) and cognitive factors (e.g., the cause of the emotion).
We propose an emotion cause transition graph to explicitly model the natural transition of emotion causes between two adjacent turns in empathetic dialogue.
With this graph, the concept words of the emotion causes in the next turn can be predicted and used by a specifically designed concept-aware decoder to generate the empathetic response.
arXiv Detail & Related papers (2023-02-23T05:51:17Z)
- Speech Synthesis with Mixed Emotions [77.05097999561298]
We propose a novel formulation that measures the relative difference between the speech samples of different emotions.
We then incorporate our formulation into a sequence-to-sequence emotional text-to-speech framework.
At run-time, we control the model to produce the desired emotion mixture by manually defining an emotion attribute vector (a minimal illustrative sketch of such mixing appears after this related-papers list).
arXiv Detail & Related papers (2022-08-11T15:45:58Z)
- EAMM: One-Shot Emotional Talking Face via Audio-Based Emotion-Aware Motion Model [32.19539143308341]
We propose the Emotion-Aware Motion Model (EAMM) to generate one-shot emotional talking faces.
By incorporating the results from both modules, our method can generate satisfactory talking face results on arbitrary subjects.
arXiv Detail & Related papers (2022-05-30T17:39:45Z)
- Emotion Recognition under Consideration of the Emotion Component Process Model [9.595357496779394]
We use the emotion component process model (CPM) by Scherer (2005) to explain emotion communication.
The CPM states that emotions are a coordinated process of various subcomponents in reaction to an event, namely the subjective feeling, the cognitive appraisal, the expression, a physiological bodily reaction, and a motivational action tendency.
We find that emotions on Twitter are predominantly expressed by event descriptions or subjective reports of the feeling, while in literature, authors prefer to describe what characters do, and leave the interpretation to the reader.
arXiv Detail & Related papers (2021-07-27T15:53:25Z)
- Emotion Eliciting Machine: Emotion Eliciting Conversation Generation based on Dual Generator [18.711852474600143]
We study the problem of positive emotion elicitation, which aims to generate responses that can elicit positive emotions from the user.
We propose a weakly supervised Emotion Eliciting Machine (EEM) to address this problem.
EEM outperforms the existing models in generating responses with positive emotion elicitation.
arXiv Detail & Related papers (2021-05-18T03:19:25Z)
- Modality-Transferable Emotion Embeddings for Low-Resource Multimodal Emotion Recognition [55.44502358463217]
We propose a modality-transferable model with emotion embeddings to tackle the aforementioned issues.
Our model achieves state-of-the-art performance on most of the emotion categories.
Our model also outperforms existing baselines in the zero-shot and few-shot scenarios for unseen emotions.
arXiv Detail & Related papers (2020-09-21T06:10:39Z)
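As a companion to the "Speech Synthesis with Mixed Emotions" entry above, the following is a minimal, hypothetical sketch of run-time emotion mixing via a manually defined attribute vector. The embedding bank, dimensions, and normalization are illustrative assumptions, not that paper's implementation.

```python
# Hypothetical sketch: blend per-emotion embeddings with a run-time attribute vector.
from typing import Dict
import torch

# Stand-in for learned per-emotion embeddings (randomly initialized here).
emotion_bank = {
    "neutral": torch.randn(128),
    "happy":   torch.randn(128),
    "sad":     torch.randn(128),
    "angry":   torch.randn(128),
}

def mix_emotions(weights: Dict[str, float]) -> torch.Tensor:
    """Blend emotion embeddings by a manually defined attribute vector (normalized)."""
    total = sum(weights.values())
    return sum((w / total) * emotion_bank[name] for name, w in weights.items())

# Usage: a style that is mostly happy with a neutral component.
style = mix_emotions({"happy": 0.7, "neutral": 0.3})
print(style.shape)  # torch.Size([128])
```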