Related papers: Ponimator: Unfolding Interactive Pose for Versatile Human-human Interaction Animation

URL: http://arxiv.org/abs/2510.14976v1
Date: Thu, 16 Oct 2025 17:59:56 GMT
Title: Ponimator: Unfolding Interactive Pose for Versatile Human-human Interaction Animation
Authors: Shaowei Liu, Chuan Guo, Bing Zhou, Jian Wang,
Abstract summary: Close-proximity human-human interactive poses convey rich contextual information about interaction dynamics. We propose Ponimator, a framework anchored on interactive poses for versatile interaction animation. Ponimator supports diverse tasks, including image-based interaction animation, reaction animation, and text-to-interaction synthesis.
Score: 14.555640323663438
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Close-proximity human-human interactive poses convey rich contextual information about interaction dynamics. Given such poses, humans can intuitively infer the context and anticipate possible past and future dynamics, drawing on strong priors of human behavior. Inspired by this observation, we propose Ponimator, a simple framework anchored on proximal interactive poses for versatile interaction animation. Our training data consists of close-contact two-person poses and their surrounding temporal context from motion-capture interaction datasets. Leveraging interactive pose priors, Ponimator employs two conditional diffusion models: (1) a pose animator that uses the temporal prior to generate dynamic motion sequences from interactive poses, and (2) a pose generator that applies the spatial prior to synthesize interactive poses from a single pose, text, or both when interactive poses are unavailable. Collectively, Ponimator supports diverse tasks, including image-based interaction animation, reaction animation, and text-to-interaction synthesis, facilitating the transfer of interaction knowledge from high-quality mocap data to open-world scenarios. Empirical experiments across diverse datasets and applications demonstrate the universality of the pose prior and the effectiveness and robustness of our framework.

Related papers

Interact2Ar: Full-Body Human-Human Interaction Generation via Autoregressive Diffusion Models [80.28579390566298]
We introduce Interact2Ar, a text-conditioned autoregressive diffusion model for generating full-body, human-human interactions. Hand kinematics are incorporated through dedicated parallel branches, enabling high-fidelity full-body generation. Our model enables a series of downstream applications, including temporal motion composition, real-time adaptation to disturbances, and extension beyond dyadic to multi-person scenarios.
arXiv Detail & Related papers (2025-12-22T18:59:50Z)
Fine-grained text-driven dual-human motion generation via dynamic hierarchical interaction [31.055662466004254]
We propose a fine-grained dual-human motion generation method, namely FineDual, to model dynamic hierarchical interaction. The first stage, Self-Learning Stage, divides the dual-human overall text into individual texts. The second stage, Adaptive Adjustment Stage, predicts interaction distance by an interaction distance predictor. The last stage, Teacher-Guided Refinement Stage, utilizes overall text features as guidance to refine motion features at the overall level.
arXiv Detail & Related papers (2025-10-09T14:18:53Z)
MoReact: Generating Reactive Motion from Textual Descriptions [57.642436102978245]
MoReact is a diffusion-based method designed to disentangle the generation of global trajectories and local motions sequentially. Our experiments, utilizing data adapted from a two-person motion dataset, demonstrate the efficacy of our approach.
arXiv Detail & Related papers (2025-09-28T14:31:41Z)
InterAct: A Large-Scale Dataset of Dynamic, Expressive and Interactive Activities between Two People in Daily Scenarios [40.42003202491803]
We propose to simultaneously model two people's activities, and target objective-driven, dynamic, and semantically consistent interactions. We capture a new multi-modal dataset dubbed InterAct composed of 241 motion sequences. InterAct contains diverse and complex motions of individuals and interesting and relatively long-term interaction patterns barely seen before.
arXiv Detail & Related papers (2025-09-06T15:36:47Z)
InterAnimate: Taming Region-aware Diffusion Model for Realistic Human Interaction Animation [47.103725372531784]
We introduce a novel motion paradigm for animating realistic hand-face interactions. Our approach simultaneously learns anatomically-temporal contact dynamics and biomechanically plausible deformation effects. Results show InterAnimate produces highly realistic animations, setting a new benchmark.
arXiv Detail & Related papers (2025-04-15T06:32:45Z)
ReMoS: 3D Motion-Conditioned Reaction Synthesis for Two-Person Interactions [66.87211993793807]
We present ReMoS, a denoising-diffusion-based model that synthesizes the full-body motion of a person in a two-person interaction scenario. We demonstrate ReMoS across challenging two-person scenarios such as pair dancing, Ninjutsu, kickboxing, and acrobatics. We also contribute the ReMoCap dataset for two-person interactions, containing full-body and finger motions.
arXiv Detail & Related papers (2023-11-28T18:59:52Z)
InterControl: Zero-shot Human Interaction Generation by Controlling Every Joint [67.6297384588837]
We introduce a novel controllable motion generation method, InterControl, to encourage the synthesized motions to maintain the desired distance between joint pairs. We demonstrate that the distance between joint pairs for human-wise interactions can be generated using an off-the-shelf Large Language Model.
arXiv Detail & Related papers (2023-11-27T14:32:33Z)
GRIP: Generating Interaction Poses Using Spatial Cues and Latent Consistency [57.9920824261925]
Hands are dexterous and highly versatile manipulators that are central to how humans interact with objects and their environment. Modeling realistic hand-object interactions is critical for applications in computer graphics, computer vision, and mixed reality. GRIP is a learning-based method that takes as input the 3D motion of the body and the object, and synthesizes realistic motion for both hands before, during, and after object interaction.
arXiv Detail & Related papers (2023-08-22T17:59:51Z)
Interaction Transformer for Human Reaction Generation [61.22481606720487]
We propose a novel interaction Transformer (InterFormer) consisting of a Transformer network with both temporal and spatial attentions. Our method is general and can be used to generate more complex and long-term interactions.
arXiv Detail & Related papers (2022-07-04T19:30:41Z)

This list is automatically generated from the titles and abstracts of the papers on this site.