MoReact: Generating Reactive Motion from Textual Descriptions
- URL: http://arxiv.org/abs/2509.23911v1
- Date: Sun, 28 Sep 2025 14:31:41 GMT
- Title: MoReact: Generating Reactive Motion from Textual Descriptions
- Authors: Xiyan Xu, Sirui Xu, Yu-Xiong Wang, Liang-Yan Gui
- Abstract summary: MoReact is a diffusion-based method designed to disentangle the generation of global trajectories and local motions sequentially. Our experiments, utilizing data adapted from a two-person motion dataset, demonstrate the efficacy of our approach.
- Score: 57.642436102978245
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modeling and generating human reactions poses a significant challenge with broad applications for computer vision and human-computer interaction. Existing methods either treat multiple individuals as a single entity, directly generating interactions, or rely solely on one person's motion to generate the other's reaction, failing to integrate the rich semantic information that underpins human interactions. As a result, these methods often fall short in adaptive responsiveness, i.e., the ability to accurately respond to diverse and dynamic interaction scenarios. Recognizing this gap, our work introduces an approach tailored to address the limitations of existing models by focusing on text-driven human reaction generation. Our model generates realistic motion sequences for an individual responding to the other's actions, based on a descriptive text of the interaction scenario. The goal is to produce motion sequences that not only complement the opponent's movements but also semantically fit the described interactions. To achieve this, we present MoReact, a diffusion-based method designed to disentangle the generation of global trajectories and local motions sequentially. This approach stems from the observation that generating global trajectories first is crucial for guiding local motion, ensuring better alignment with the given action and text. Furthermore, we introduce a novel interaction loss to enhance the realism of generated close interactions. Our experiments, utilizing data adapted from a two-person motion dataset, demonstrate the efficacy of our approach for this novel task: it produces realistic, diverse, and controllable reactions that not only closely match the movements of the counterpart but also adhere to the textual guidance. Please find our webpage at https://xiyan-xu.github.io/MoReactWebPage.
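As a rough sketch of the trajectory-first factorization and interaction loss the abstract describes (the module names, tensor shapes, and the simplified sampler below are illustrative assumptions, not MoReact's released code):

```python
import torch
import torch.nn as nn

class Denoiser(nn.Module):
    """Stand-in conditional denoiser; the real architecture is not specified here."""
    def __init__(self, dim, cond_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + cond_dim + 1, 256), nn.SiLU(), nn.Linear(256, dim)
        )

    def forward(self, x, t, cond):
        t_emb = t.float().view(-1, 1) / 1000.0          # crude timestep embedding
        return self.net(torch.cat([x, cond, t_emb], dim=-1))

@torch.no_grad()
def sample(denoiser, cond, dim, steps=50):
    """Toy reverse-diffusion loop; a real sampler would use a proper noise schedule."""
    x = torch.randn(cond.shape[0], dim)
    for t in reversed(range(steps)):
        t_batch = torch.full((cond.shape[0],), t)
        x = x - denoiser(x, t_batch, cond) / steps      # simplified update rule
    return x

def interaction_loss(joints_a, joints_b, contact_mask, margin=0.05):
    """Penalize cross-person joint distances beyond a margin on contact frames;
    the paper's actual interaction loss is not detailed in the abstract."""
    d = torch.cdist(joints_a, joints_b)                 # (T, J, J) pairwise distances
    return (torch.relu(d - margin) * contact_mask).mean()

# Stage 1: global trajectory from text + the other person's motion.
# Stage 2: local pose sequence, additionally conditioned on that trajectory.
TEXT_D, ACTOR_D, TRAJ_D, POSE_D = 32, 64, 16, 128
traj_model = Denoiser(TRAJ_D, TEXT_D + ACTOR_D)
pose_model = Denoiser(POSE_D, TEXT_D + ACTOR_D + TRAJ_D)

text_emb, actor_emb = torch.randn(1, TEXT_D), torch.randn(1, ACTOR_D)
traj = sample(traj_model, torch.cat([text_emb, actor_emb], -1), TRAJ_D)
pose = sample(pose_model, torch.cat([text_emb, actor_emb, traj], -1), POSE_D)
```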
Related papers
- Interact2Ar: Full-Body Human-Human Interaction Generation via Autoregressive Diffusion Models [80.28579390566298]
We introduce Interact2Ar, a text-conditioned autoregressive diffusion model for generating full-body, human-human interactions. Hand kinematics are incorporated through dedicated parallel branches, enabling high-fidelity full-body generation. Our model enables a series of downstream applications, including temporal motion composition, real-time adaptation to disturbances, and extension beyond dyadic to multi-person scenarios.
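A minimal sketch of what such dedicated parallel hand branches might look like (all dimensions and layer choices are assumptions, not Interact2Ar's architecture):

```python
import torch
import torch.nn as nn

class FullBodyBranches(nn.Module):
    """Shared backbone with dedicated parallel hand branches, echoing the
    layout the abstract describes; sizes here are illustrative only."""
    def __init__(self, body_dim=132, hand_dim=45, cond_dim=64, hidden=256):
        super().__init__()
        in_dim = body_dim + 2 * hand_dim + cond_dim
        self.backbone = nn.Sequential(nn.Linear(in_dim, hidden), nn.SiLU())
        self.body_head = nn.Linear(hidden, body_dim)
        self.left_hand = nn.Sequential(nn.Linear(hidden, hidden), nn.SiLU(),
                                       nn.Linear(hidden, hand_dim))
        self.right_hand = nn.Sequential(nn.Linear(hidden, hidden), nn.SiLU(),
                                        nn.Linear(hidden, hand_dim))

    def forward(self, x, cond):
        h = self.backbone(torch.cat([x, cond], dim=-1))
        return torch.cat([self.body_head(h),
                          self.left_hand(h), self.right_hand(h)], dim=-1)

model = FullBodyBranches()
out = model(torch.randn(1, 132 + 90), torch.randn(1, 64))  # (1, 222)
```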
arXiv Detail & Related papers (2025-12-22T18:59:50Z) - InteracTalker: Prompt-Based Human-Object Interaction with Co-Speech Gesture Generation [1.7523719472700858]
We introduce InteracTalker, a novel framework that seamlessly integrates prompt-based object-aware interactions with co-speech gesture generation. Our framework utilizes a Generalized Motion Adaptation Module that enables independent training, adapting to the corresponding motion condition. InteracTalker successfully unifies these previously separate tasks, outperforming prior methods in both co-speech gesture generation and object-interaction synthesis.
arXiv Detail & Related papers (2025-12-14T12:29:49Z) - Fine-grained text-driven dual-human motion generation via dynamic hierarchical interaction [31.055662466004254]
We propose a fine-grained dual-human motion generation method, namely FineDual, to model dynamic hierarchical interaction. The first stage, the Self-Learning Stage, divides the overall dual-human text into individual texts. The second stage, the Adaptive Adjustment Stage, predicts interaction distance with an interaction distance predictor. The last stage, the Teacher-Guided Refinement Stage, uses the overall text features as guidance to refine motion features at the overall level.
arXiv Detail & Related papers (2025-10-09T14:18:53Z) - Ready-to-React: Online Reaction Policy for Two-Character Interaction Generation [82.73098356401725]
We propose an online reaction policy, called Ready-to-React, to generate the next character pose based on past observed motions. Each character has its own reaction policy as its "brain", enabling them to interact like real humans in a streaming manner. Our approach can be controlled by sparse signals, making it well-suited for VR and other online interactive environments.
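A toy version of this streaming setup, where each character runs its own policy over a sliding window of past observations (the GRU and all sizes are assumptions, not the paper's model):

```python
import torch
import torch.nn as nn
from collections import deque

class ReactionPolicy(nn.Module):
    """Per-character policy: predicts the next pose from a short history
    of both characters' poses."""
    def __init__(self, pose_dim=63, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(2 * pose_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, pose_dim)

    def forward(self, history):          # history: (1, T, 2*pose_dim)
        h, _ = self.rnn(history)
        return self.head(h[:, -1])       # next pose: (1, pose_dim)

POSE_D = 63
policy_a, policy_b = ReactionPolicy(POSE_D), ReactionPolicy(POSE_D)
pose_a, pose_b = torch.zeros(1, POSE_D), torch.zeros(1, POSE_D)
buf = deque(maxlen=30)                   # sliding window of past observations

with torch.no_grad():
    for step in range(120):              # streaming: each agent reacts each frame
        buf.append(torch.cat([pose_a, pose_b], dim=-1))
        hist = torch.stack(list(buf), dim=1)   # (1, T, 2*POSE_D)
        pose_a = policy_a(hist)
        pose_b = policy_b(hist)
```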
arXiv Detail & Related papers (2025-02-27T18:40:30Z) - KinMo: Kinematic-aware Human Motion Understanding and Generation [6.962697597686156]
Current human motion synthesis frameworks rely on global action descriptions. A single coarse description, such as "run", fails to capture details such as variations in speed, limb positioning, and kinematic dynamics. We introduce KinMo, a unified framework built on a hierarchical describable motion representation.
arXiv Detail & Related papers (2024-11-23T06:50:11Z) - THOR: Text to Human-Object Interaction Diffusion via Relation Intervention [51.02435289160616]
We propose a novel Text-guided Human-Object Interaction diffusion model with Relation Intervention (THOR).
In each diffusion step, we initiate text-guided human and object motion and then leverage human-object relations to intervene in object motion.
We construct Text-BEHAVE, a Text2HOI dataset that seamlessly integrates textual descriptions with the currently largest publicly available 3D HOI dataset.
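One way such a per-step relation intervention could be sketched, assuming a differentiable relation score; none of the names below come from THOR itself:

```python
import torch

def relation_intervention_step(x_human, x_obj, denoise_h, denoise_o,
                               relation_fn, t, lam=0.1):
    """One illustrative reverse step: denoise both streams, then nudge the
    object motion toward consistency with human-object relations."""
    x_human = x_human - denoise_h(x_human, t)
    x_obj = x_obj - denoise_o(x_obj, t)
    x_obj = x_obj.detach().requires_grad_(True)
    score = relation_fn(x_human.detach(), x_obj)   # relation violation score
    grad, = torch.autograd.grad(score, x_obj)
    return x_human, (x_obj - lam * grad).detach()  # intervened object motion

# Toy usage with stand-in denoisers and a distance-based relation score
# (keep the object's position near a hypothetical hand joint).
h, o = torch.randn(1, 66), torch.randn(1, 9)
dh = lambda x, t: 0.1 * x
do = lambda x, t: 0.1 * x
rel = lambda xh, xo: (xo[:, :3] - xh[:, :3]).pow(2).sum()
h, o = relation_intervention_step(h, o, dh, do, rel, t=10)
```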
arXiv Detail & Related papers (2024-03-17T13:17:25Z) - ReMoS: 3D Motion-Conditioned Reaction Synthesis for Two-Person Interactions [66.87211993793807]
We present ReMoS, a denoising-diffusion-based model that synthesizes the full-body motion of a person in a two-person interaction scenario.
We demonstrate ReMoS across challenging two-person scenarios such as pair dancing, Ninjutsu, kickboxing, and acrobatics.
We also contribute the ReMoCap dataset for two-person interactions, containing full-body and finger motions.
arXiv Detail & Related papers (2023-11-28T18:59:52Z) - InterControl: Zero-shot Human Interaction Generation by Controlling Every Joint [67.6297384588837]
We introduce a novel controllable motion generation method, InterControl, to encourage synthesized motions to maintain the desired distance between joint pairs.
We demonstrate that the desired distances between joint pairs for human interactions can be generated using an off-the-shelf Large Language Model.
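A minimal sketch of a joint-pair distance objective of this kind (joint indices and targets are hypothetical; the LLM step is represented only by the hand-written target value):

```python
import torch

def joint_pair_loss(joints_a, joints_b, pairs, target_d):
    """Penalize deviation of selected cross-person joint-pair distances from
    target values (e.g., values an LLM might produce from interaction text).
    joints_*: (T, J, 3); pairs: list of (i, j); target_d: (len(pairs),)."""
    losses = []
    for k, (i, j) in enumerate(pairs):
        d = (joints_a[:, i] - joints_b[:, j]).norm(dim=-1)  # (T,) distances
        losses.append((d - target_d[k]).pow(2).mean())
    return torch.stack(losses).mean()

# e.g. "shake hands": right wrists (hypothetical index 21) held ~0.02 m apart
ja, jb = torch.randn(60, 22, 3), torch.randn(60, 22, 3)
loss = joint_pair_loss(ja, jb, pairs=[(21, 21)], target_d=torch.tensor([0.02]))
```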
arXiv Detail & Related papers (2023-11-27T14:32:33Z) - NIFTY: Neural Object Interaction Fields for Guided Human Motion Synthesis [21.650091018774972]
We create a neural interaction field attached to a specific object, which outputs the distance to the valid interaction manifold given a human pose as input.
This interaction field guides the sampling of an object-conditioned human motion diffusion model.
We synthesize realistic motions for sitting and lifting with several objects, outperforming alternative approaches in terms of motion quality and successful action completion.
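A toy interaction field and the gradient-based guidance step it would enable (the architecture and step size are assumptions, not NIFTY's implementation):

```python
import torch
import torch.nn as nn

class InteractionField(nn.Module):
    """Object-attached field: human pose -> predicted distance to the
    valid-interaction manifold."""
    def __init__(self, pose_dim=69, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(pose_dim, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, pose):
        return self.net(pose).squeeze(-1)        # predicted distance, (B,)

def guide(pose, field, step_size=0.05):
    """Gradient step pulling a sampled pose toward the interaction manifold,
    as would be applied inside each diffusion sampling step."""
    pose = pose.detach().requires_grad_(True)
    dist = field(pose).sum()
    grad, = torch.autograd.grad(dist, pose)
    return (pose - step_size * grad).detach()

field = InteractionField()
pose = guide(torch.randn(4, 69), field)
```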
arXiv Detail & Related papers (2023-07-14T17:59:38Z)