TIMotion: Temporal and Interactive Framework for Efficient Human-Human Motion Generation
- URL: http://arxiv.org/abs/2408.17135v4
- Date: Fri, 28 Mar 2025 03:47:30 GMT
- Title: TIMotion: Temporal and Interactive Framework for Efficient Human-Human Motion Generation
- Authors: Yabiao Wang, Shuo Wang, Jiangning Zhang, Ke Fan, Jiafu Wu, Zhucun Xue, Yong Liu
- Abstract summary: Current methods fall into two main categories: single-person-based methods and separate modeling-based methods. We introduce TIMotion (Temporal and Interactive Modeling), an efficient and effective framework for human-human motion generation.
- Score: 30.734182958106327
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Human-human motion generation is essential for understanding humans as social beings. Current methods fall into two main categories: single-person-based methods and separate modeling-based methods. To delve into this field, we abstract the overall generation process into a general framework, MetaMotion, which consists of two phases: temporal modeling and interaction mixing. For temporal modeling, single-person-based methods directly concatenate the two people into a single one, while separate modeling-based methods skip the modeling of interaction sequences. Such inadequate modeling results in sub-optimal performance and redundant model parameters. In this paper, we introduce TIMotion (Temporal and Interactive Modeling), an efficient and effective framework for human-human motion generation. Specifically, we first propose Causal Interactive Injection to model two separate sequences as a single causal sequence, leveraging their temporal and causal properties. Then we present Role-Evolving Scanning to adjust to the change in the active and passive roles throughout the interaction. Finally, to generate smoother and more rational motion, we design Localized Pattern Amplification to capture short-term motion patterns. Extensive experiments on InterHuman and InterX demonstrate that our method achieves superior performance. Project page: https://aigc-explorer.github.io/TIMotion-page/
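A minimal sketch of the two sequence-ordering ideas named in the abstract, Causal Interactive Injection and Role-Evolving Scanning, assuming each person's motion is a simple (T, D) pose array. The function names, shapes, and NumPy implementation are illustrative assumptions, not the authors' actual architecture (which operates inside a generative model, not on raw arrays):

```python
import numpy as np

def causal_interactive_injection(seq_a: np.ndarray, seq_b: np.ndarray) -> np.ndarray:
    """Interleave two (T, D) single-person pose sequences into one (2T, D)
    causal sequence a_1, b_1, a_2, b_2, ..., so that each step can be
    conditioned on the partner's preceding pose."""
    assert seq_a.shape == seq_b.shape
    t, d = seq_a.shape
    merged = np.empty((2 * t, d), dtype=seq_a.dtype)
    merged[0::2] = seq_a  # person A occupies the even positions
    merged[1::2] = seq_b  # person B occupies the odd positions
    return merged

def role_evolving_swap(merged: np.ndarray) -> np.ndarray:
    """Produce the opposite injection order (b_1, a_1, b_2, a_2, ...).
    Scanning both orders is one way to avoid hard-coding which person
    is the 'active' role, since roles can change mid-interaction."""
    swapped = np.empty_like(merged)
    swapped[0::2] = merged[1::2]
    swapped[1::2] = merged[0::2]
    return swapped
```

In this toy form the interleaving is trivially invertible (`merged[0::2]`, `merged[1::2]` recover the two people), which is the property that lets a causal model treat the interaction as one sequence without losing the per-person decomposition.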
Related papers
- in2IN: Leveraging individual Information to Generate Human INteractions [29.495166514135295]
We introduce in2IN, a novel diffusion model for human-human motion generation conditioned on individual descriptions.
We also propose DualMDM, a model composition technique that combines the motions generated with in2IN and the motions generated by a single-person motion prior pre-trained on HumanML3D.
arXiv Detail & Related papers (2024-04-15T17:59:04Z) - Multi-agent Long-term 3D Human Pose Forecasting via Interaction-aware Trajectory Conditioning [41.09061877498741]
We propose an interaction-aware trajectory-conditioned long-term multi-agent human pose forecasting model.
Our model effectively handles the multi-modality of human motion and the complexity of long-term multi-agent interactions.
arXiv Detail & Related papers (2024-04-08T06:15:13Z) - A Decoupled Spatio-Temporal Framework for Skeleton-based Action Segmentation [89.86345494602642]
Existing methods are limited by weak temporal modeling capability.
We propose a Decoupled Spatio-Temporal Framework (DeST) to address these issues.
DeST significantly outperforms current state-of-the-art methods with less computational complexity.
arXiv Detail & Related papers (2023-12-10T09:11:39Z) - InterControl: Zero-shot Human Interaction Generation by Controlling Every Joint [67.6297384588837]
We introduce a novel controllable motion generation method, InterControl, which encourages the synthesized motions to maintain the desired distances between joint pairs.
We demonstrate that the joint-pair distances for human interactions can be generated using an off-the-shelf Large Language Model.
arXiv Detail & Related papers (2023-11-27T14:32:33Z) - Persistent-Transient Duality: A Multi-mechanism Approach for Modeling Human-Object Interaction [58.67761673662716]
Humans are highly adaptable, swiftly switching between different modes to handle different tasks, situations and contexts.
In human-object interaction (HOI) activities, these modes can be attributed to two mechanisms: (1) the large-scale consistent plan for the whole activity and (2) the small-scale child interactive actions that start and end along the timeline.
This work proposes to model two concurrent mechanisms that jointly control human motion.
arXiv Detail & Related papers (2023-07-24T12:21:33Z) - InterGen: Diffusion-based Multi-human Motion Generation under Complex Interactions [49.097973114627344]
We present InterGen, an effective diffusion-based approach that incorporates human-to-human interactions into the motion diffusion process.
We first contribute a multimodal dataset, named InterHuman. It consists of about 107M frames covering diverse two-person interactions, with accurate skeletal motions and 23,337 natural language descriptions.
We propose a novel representation for motion input in our interaction diffusion model, which explicitly formulates the global relations between the two performers in the world frame.
arXiv Detail & Related papers (2023-04-12T08:12:29Z) - Task-Oriented Human-Object Interactions Generation with Implicit Neural Representations [61.659439423703155]
TOHO: Task-Oriented Human-Object Interactions Generation with Implicit Neural Representations.
Our method generates continuous motions that are parameterized only by the temporal coordinate.
This work takes a step further toward general human-scene interaction simulation.
arXiv Detail & Related papers (2023-03-23T09:31:56Z) - Human Motion Diffusion as a Generative Prior [20.004837564647367]
We introduce three forms of composition based on diffusion priors.
We tackle the challenge of long sequence generation.
Using parallel composition, we show promising steps toward two-person generation.
arXiv Detail & Related papers (2023-03-02T17:09:27Z) - Bipartite Graph Diffusion Model for Human Interaction Generation [11.732108478773196]
We introduce a novel bipartite graph diffusion method (BiGraphDiff) to generate human motion interactions between two persons.
We show that the proposed method achieves new state-of-the-art results on leading benchmarks for the human interaction generation task.
arXiv Detail & Related papers (2023-01-24T16:59:46Z) - Pretrained Diffusion Models for Unified Human Motion Synthesis [33.41816844381057]
MoFusion is a framework for unified motion synthesis.
It employs a Transformer backbone to ease the inclusion of diverse control signals.
It also supports multi-granularity synthesis ranging from motion completion of a body part to whole-body motion generation.
arXiv Detail & Related papers (2022-12-06T09:19:21Z) - MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model [35.32967411186489]
MotionDiffuse is a diffusion model-based text-driven motion generation framework.
It excels at modeling complicated data distribution and generating vivid motion sequences.
It responds to fine-grained instructions on body parts and supports arbitrary-length motion synthesis with time-varying text prompts.
arXiv Detail & Related papers (2022-08-31T17:58:54Z) - Hierarchical Style-based Networks for Motion Synthesis [150.226137503563]
We propose a self-supervised method for generating long-range, diverse and plausible behaviors to achieve a specific goal location.
Our proposed method learns to model human motion by decomposing the long-range generation task hierarchically.
On a large-scale skeleton dataset, we show that the proposed method can synthesize long-range, diverse and plausible motion.
arXiv Detail & Related papers (2020-08-24T02:11:02Z) - Perpetual Motion: Generating Unbounded Human Motion [61.40259979876424]
We focus on long-term prediction; that is, generating long sequences of human motion that are plausible.
We propose a model to generate non-deterministic, ever-changing, perpetual human motion.
We train the model using a heavy-tailed function of the KL divergence against a white-noise Gaussian process, allowing temporal dependency in the latent sequence.
arXiv Detail & Related papers (2020-07-27T21:50:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.