ReactDiff: Fundamental Multiple Appropriate Facial Reaction Diffusion Model
- URL: http://arxiv.org/abs/2510.04712v1
- Date: Mon, 06 Oct 2025 11:30:40 GMT
- Title: ReactDiff: Fundamental Multiple Appropriate Facial Reaction Diffusion Model
- Authors: Luo Cheng, Song Siyang, Yan Siyuan, Yu Zhen, Ge Zongyuan
- Abstract summary: We propose ReactDiff, a novel temporal diffusion framework for generating diverse facial reactions. Our key insight is that plausible human reactions demonstrate smoothness and coherence over time. Our approach achieves state-of-the-art reaction quality and excels in diversity and reaction appropriateness.
- Score: 0.9786690381850356
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The automatic generation of diverse and human-like facial reactions in dyadic dialogue remains a critical challenge for human-computer interaction systems. Existing methods fail to model the stochasticity and dynamics inherent in real human reactions. To address this, we propose ReactDiff, a novel temporal diffusion framework for generating diverse facial reactions that are appropriate for responding to any given dialogue context. Our key insight is that plausible human reactions demonstrate smoothness and coherence over time and conform to constraints imposed by human facial anatomy. To achieve this, ReactDiff incorporates two vital priors (spatio-temporal facial kinematics) into the diffusion process: i) temporal facial behavioral kinematics and ii) facial action unit dependencies. These two constraints guide the model toward realistic human reaction manifolds, avoiding visually unrealistic jitters, unstable transitions, unnatural expressions, and other artifacts. Extensive experiments on the REACT2024 dataset demonstrate that our approach not only achieves state-of-the-art reaction quality but also excels in diversity and reaction appropriateness.
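The temporal kinematics prior described in the abstract could be sketched as a smoothness regularizer added to a standard diffusion noise-prediction objective. The loss form, weighting, and function names below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def kinematic_smoothness_loss(seq: np.ndarray) -> float:
    """Penalize frame-to-frame jitter via first- and second-order differences.

    seq: array of shape (time, features), e.g. predicted facial coefficients.
    Illustrative stand-in for a temporal facial kinematics prior; the actual
    ReactDiff constraint may differ.
    """
    velocity = np.diff(seq, axis=0)            # first-order differences
    acceleration = np.diff(velocity, axis=0)   # second-order differences
    return float((velocity ** 2).mean() + (acceleration ** 2).mean())

def diffusion_training_loss(pred_noise: np.ndarray,
                            true_noise: np.ndarray,
                            pred_clean_seq: np.ndarray,
                            lam: float = 0.1) -> float:
    """Noise-prediction MSE plus a weighted kinematic smoothness term
    on the model's estimate of the clean (denoised) reaction sequence."""
    mse = float(((pred_noise - true_noise) ** 2).mean())
    return mse + lam * kinematic_smoothness_loss(pred_clean_seq)

# A constant sequence incurs no smoothness penalty; a jittery one does.
smooth = np.ones((8, 4))
jittery = np.array([[(-1.0) ** t] * 4 for t in range(8)])
print(kinematic_smoothness_loss(smooth))   # 0.0
print(kinematic_smoothness_loss(jittery) > 0.0)  # True
```

The second-order (acceleration) term is what discourages the unstable transitions the abstract mentions; a velocity-only penalty would still allow abrupt direction changes.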
Related papers
- Interact2Ar: Full-Body Human-Human Interaction Generation via Autoregressive Diffusion Models [80.28579390566298]
We introduce Interact2Ar, a text-conditioned autoregressive diffusion model for generating full-body, human-human interactions. Hand kinematics are incorporated through dedicated parallel branches, enabling high-fidelity full-body generation. Our model enables a series of downstream applications, including temporal motion composition, real-time adaptation to disturbances, and extension beyond dyadic to multi-person scenarios.
arXiv Detail & Related papers (2025-12-22T18:59:50Z)
- MoReact: Generating Reactive Motion from Textual Descriptions [57.642436102978245]
MoReact is a diffusion-based method designed to disentangle the generation of global trajectories and local motions sequentially. Our experiments, utilizing data adapted from a two-person motion dataset, demonstrate the efficacy of our approach.
arXiv Detail & Related papers (2025-09-28T14:31:41Z)
- Latent Behavior Diffusion for Sequential Reaction Generation in Dyadic Setting [11.016004057765185]
The dyadic reaction generation task involves responsive facial reactions that align closely with the behaviors of a conversational partner. This paper introduces a novel approach, the Latent Behavior Diffusion Model, comprising a context-aware autoencoder and a diffusion-based conditional generator. Experimental results demonstrate the effectiveness of our approach in achieving superior performance in dyadic reaction synthesis tasks compared to existing methods.
arXiv Detail & Related papers (2025-05-12T09:22:27Z)
- Ready-to-React: Online Reaction Policy for Two-Character Interaction Generation [82.73098356401725]
We propose an online reaction policy, called Ready-to-React, to generate the next character pose based on past observed motions. Each character has its own reaction policy as its "brain", enabling them to interact like real humans in a streaming manner. Our approach can be controlled by sparse signals, making it well-suited for VR and other online interactive environments.
arXiv Detail & Related papers (2025-02-27T18:40:30Z)
- PhysReaction: Physically Plausible Real-Time Humanoid Reaction Synthesis via Forward Dynamics Guided 4D Imitation [19.507619255773125]
We propose a Forward Dynamics Guided 4D Imitation method to generate physically plausible human-like reactions.
The learned policy is capable of generating physically plausible and human-like reactions in real-time, significantly improving the speed (×33) and quality of reactions.
arXiv Detail & Related papers (2024-04-01T12:21:56Z)
- ReGenNet: Towards Human Action-Reaction Synthesis [87.57721371471536]
We analyze the asymmetric, dynamic, synchronous, and detailed nature of human-human interactions.
We propose the first multi-setting human action-reaction benchmark to generate human reactions conditioned on given human actions.
arXiv Detail & Related papers (2024-03-18T15:33:06Z)
- ReMoS: 3D Motion-Conditioned Reaction Synthesis for Two-Person Interactions [66.87211993793807]
We present ReMoS, a denoising diffusion based model that synthesizes full-body motion of a person in a two-person interaction scenario.
We demonstrate ReMoS across challenging two-person scenarios such as pair dancing, Ninjutsu, kickboxing, and acrobatics.
We also contribute the ReMoCap dataset for two-person interactions containing full-body and finger motions.
arXiv Detail & Related papers (2023-11-28T18:59:52Z)
- ReactFace: Online Multiple Appropriate Facial Reaction Generation in Dyadic Interactions [46.66378299720377]
In dyadic interaction, predicting the listener's facial reactions is challenging as different reactions could be appropriate in response to the same speaker's behaviour.
This paper reformulates the task as an extrapolation or prediction problem, and proposes a novel framework (called ReactFace) to generate multiple different but appropriate facial reactions.
arXiv Detail & Related papers (2023-05-25T05:55:53Z)
- Multiple Appropriate Facial Reaction Generation in Dyadic Interaction Settings: What, Why and How? [11.130984858239412]
This paper defines the Multiple Appropriate Reaction Generation task for the first time in the literature.
It then proposes a new set of objective evaluation metrics to evaluate the appropriateness of the generated reactions.
The paper subsequently introduces a framework to predict, generate, and evaluate multiple appropriate facial reactions.
arXiv Detail & Related papers (2023-02-13T16:49:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.