Latent Behavior Diffusion for Sequential Reaction Generation in Dyadic Setting
- URL: http://arxiv.org/abs/2505.07901v1
- Date: Mon, 12 May 2025 09:22:27 GMT
- Title: Latent Behavior Diffusion for Sequential Reaction Generation in Dyadic Setting
- Authors: Minh-Duc Nguyen, Hyung-Jeong Yang, Soo-Hyung Kim, Ji-Eun Shin, Seung-Won Kim
- Abstract summary: The dyadic reaction generation task involves synthesizing responsive facial reactions that align closely with the behaviors of a conversational partner. This paper introduces a novel approach, the Latent Behavior Diffusion Model, comprising a context-aware autoencoder and a diffusion-based conditional generator. Experimental results demonstrate the effectiveness of our approach in achieving superior performance in dyadic reaction synthesis tasks compared to existing methods.
- Score: 11.016004057765185
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The dyadic reaction generation task involves synthesizing responsive facial reactions that align closely with the behaviors of a conversational partner, enhancing the naturalness and effectiveness of human-like interaction simulations. This paper introduces a novel approach, the Latent Behavior Diffusion Model, comprising a context-aware autoencoder and a diffusion-based conditional generator that addresses the challenge of generating diverse and contextually relevant facial reactions from input speaker behaviors. The autoencoder compresses high-dimensional input features, capturing dynamic patterns in listener reactions while condensing complex input data into a concise latent representation, facilitating more expressive and contextually appropriate reaction synthesis. The diffusion-based conditional generator operates on the latent space generated by the autoencoder to predict realistic facial reactions in a non-autoregressive manner. This approach allows for generating diverse facial reactions that reflect subtle variations in conversational cues and emotional states. Experimental results demonstrate the effectiveness of our approach in achieving superior performance in dyadic reaction synthesis tasks compared to existing methods.
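To make the described pipeline concrete, below is a minimal, illustrative PyTorch sketch of a latent-space conditional diffusion setup of this kind. It is not the authors' implementation: the module names (ContextAwareAutoencoder, ConditionalDenoiser, sample_reaction), the MLP backbones, the feature/sequence/latent dimensions, and the DDPM noise schedule are all assumptions made for illustration.

```python
# Minimal sketch (not the paper's code) of latent-space conditional diffusion
# for listener-reaction generation. Dimensions and schedules are illustrative.
import torch
import torch.nn as nn

class ContextAwareAutoencoder(nn.Module):
    """Compresses a listener-reaction sequence (T x D features) into a latent code."""
    def __init__(self, feat_dim=58, seq_len=750, latent_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Flatten(),                          # (B, T, D) -> (B, T*D)
            nn.Linear(seq_len * feat_dim, 512), nn.GELU(),
            nn.Linear(512, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 512), nn.GELU(),
            nn.Linear(512, seq_len * feat_dim),
            nn.Unflatten(1, (seq_len, feat_dim)),  # back to (B, T, D)
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

class ConditionalDenoiser(nn.Module):
    """Predicts the noise added to a latent, conditioned on speaker behaviour features."""
    def __init__(self, latent_dim=128, cond_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + cond_dim + 1, 512), nn.GELU(),
            nn.Linear(512, 512), nn.GELU(),
            nn.Linear(512, latent_dim),
        )

    def forward(self, z_t, t, cond):
        t_emb = t.float().unsqueeze(-1) / 1000.0   # crude scalar timestep embedding
        return self.net(torch.cat([z_t, t_emb, cond], dim=-1))

@torch.no_grad()
def sample_reaction(denoiser, decoder, speaker_cond, latent_dim=128, steps=50):
    """Non-autoregressive DDPM-style sampling in latent space, then decoding."""
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    z = torch.randn(speaker_cond.size(0), latent_dim)
    for t in reversed(range(steps)):
        t_batch = torch.full((z.size(0),), t)
        eps = denoiser(z, t_batch, speaker_cond)
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        z = (z - coef * eps) / torch.sqrt(alphas[t])
        if t > 0:
            z = z + torch.sqrt(betas[t]) * torch.randn_like(z)
    return decoder(z)                              # (B, T, D) facial-reaction features

# Illustrative usage with random placeholder inputs
ae = ContextAwareAutoencoder()
denoiser = ConditionalDenoiser()
speaker_cond = torch.randn(2, 256)                 # stand-in speaker-behaviour embedding
reactions = sample_reaction(denoiser, ae.decoder, speaker_cond)
print(reactions.shape)                             # torch.Size([2, 750, 58])
```

The design point mirrored here is the one the abstract emphasizes: the denoiser acts once on a whole-sequence latent conditioned on speaker features, so the reaction is generated non-autoregressively and decoded back to frame-level features in a single pass.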
Related papers
- AsynFusion: Towards Asynchronous Latent Consistency Models for Decoupled Whole-Body Audio-Driven Avatars [65.53676584955686]
Whole-body audio-driven avatar pose and expression generation is a critical task for creating lifelike digital humans. We propose AsynFusion, a novel framework that leverages diffusion transformers to achieve cohesive expression and gesture synthesis. AsynFusion achieves state-of-the-art performance in generating real-time, synchronized whole-body animations.
arXiv Detail & Related papers (2025-05-21T03:28:53Z)
- ReactDiff: Latent Diffusion for Facial Reaction Generation [15.490774894749277]
Given the audio-visual clip of the speaker, facial reaction generation aims to predict the listener's facial reactions. We propose the Facial Reaction Diffusion (ReactDiff) framework that integrates a Multi-Modality Transformer with conditional diffusion. Experimental results demonstrate that ReactDiff significantly outperforms existing approaches, achieving a facial reaction correlation of 0.26 and diversity score of 0.094.
arXiv Detail & Related papers (2025-05-20T10:01:37Z)
- ARFlow: Human Action-Reaction Flow Matching with Physical Guidance [34.33083853308399]
Action-Reaction Flow Matching is a novel framework that establishes direct action-to-reaction mappings. Our approach introduces two key innovations: an x1-prediction method that directly outputs human motions instead of velocity fields, and a training-free, gradient-based physical guidance mechanism that effectively prevents body penetration artifacts during sampling.
arXiv Detail & Related papers (2025-03-21T09:41:24Z)
- Ready-to-React: Online Reaction Policy for Two-Character Interaction Generation [82.73098356401725]
We propose an online reaction policy, called Ready-to-React, to generate the next character pose based on past observed motions. Each character has its own reaction policy as its "brain", enabling them to interact like real humans in a streaming manner. Our approach can be controlled by sparse signals, making it well-suited for VR and other online interactive environments.
arXiv Detail & Related papers (2025-02-27T18:40:30Z)
- Learning Chemical Reaction Representation with Reactant-Product Alignment [50.28123475356234]
RAlign is a novel chemical reaction representation learning model for various organic reaction-related tasks. By integrating atomic correspondence between reactants and products, our model discerns the molecular transformations that occur during the reaction. We introduce a reaction-center-aware attention mechanism that enables the model to concentrate on key functional groups.
arXiv Detail & Related papers (2024-11-26T17:41:44Z)
- ReGenNet: Towards Human Action-Reaction Synthesis [87.57721371471536]
We analyze the asymmetric, dynamic, synchronous, and detailed nature of human-human interactions.
We propose the first multi-setting human action-reaction benchmark to generate human reactions conditioned on given human actions.
arXiv Detail & Related papers (2024-03-18T15:33:06Z)
- Dyadic Interaction Modeling for Social Behavior Generation [6.626277726145613]
We present an effective framework for creating 3D facial motions in dyadic interactions.
The heart of our framework is Dyadic Interaction Modeling (DIM), a pre-training approach.
Experiments demonstrate the superiority of our framework in generating listener motions.
arXiv Detail & Related papers (2024-03-14T03:21:33Z)
- ReMoS: 3D Motion-Conditioned Reaction Synthesis for Two-Person Interactions [66.87211993793807]
We present ReMoS, a denoising diffusion-based model that synthesizes the full-body motion of a person in two-person interaction scenarios.
We demonstrate ReMoS across challenging two-person scenarios such as pair dancing, Ninjutsu, kickboxing, and acrobatics.
We also contribute the ReMoCap dataset for two-person interactions containing full-body and finger motions.
arXiv Detail & Related papers (2023-11-28T18:59:52Z)
- Towards Spontaneous Style Modeling with Semi-supervised Pre-training for Conversational Text-to-Speech Synthesis [53.511443791260206]
We propose a semi-supervised pre-training method to increase the amount of spontaneous-style speech and spontaneous behavioral labels.
In the process of semi-supervised learning, both text and speech information are considered for detecting spontaneous behavior labels in speech.
arXiv Detail & Related papers (2023-08-31T09:50:33Z)
- ReactFace: Online Multiple Appropriate Facial Reaction Generation in Dyadic Interactions [46.66378299720377]
In dyadic interaction, predicting the listener's facial reactions is challenging as different reactions could be appropriate in response to the same speaker's behaviour.
This paper reformulates the task as an extrapolation or prediction problem, and proposes a novel framework (called ReactFace) to generate multiple different but appropriate facial reactions.
arXiv Detail & Related papers (2023-05-25T05:55:53Z)
- Emotion-aware Chat Machine: Automatic Emotional Response Generation for Human-like Emotional Interaction [55.47134146639492]
This article proposes a unified end-to-end neural architecture, which is capable of simultaneously encoding the semantics and the emotions in a post.
Experiments on real-world data demonstrate that the proposed method outperforms the state-of-the-art methods in terms of both content coherence and emotion appropriateness.
arXiv Detail & Related papers (2021-06-06T06:26:15Z)
- Inferring Temporal Compositions of Actions Using Probabilistic Automata [61.09176771931052]
We propose to express temporal compositions of actions as semantic regular expressions and derive an inference framework using probabilistic automata.
Our approach is different from existing works that either predict long-range complex activities as unordered sets of atomic actions, or retrieve videos using natural language sentences.
arXiv Detail & Related papers (2020-04-28T00:15:26Z)