Related papers: InterGen: Diffusion-based Multi-human Motion Generation under Complex Interactions

InterGen: Diffusion-based Multi-human Motion Generation under Complex Interactions

URL: http://arxiv.org/abs/2304.05684v3
Date: Thu, 28 Mar 2024 03:15:57 GMT
Title: InterGen: Diffusion-based Multi-human Motion Generation under Complex Interactions
Authors: Han Liang, Wenqian Zhang, Wenxuan Li, Jingyi Yu, Lan Xu,
Abstract summary: We present InterGen, an effective diffusion-based approach that incorporates human-to-human interactions into the motion diffusion process. We first contribute a multimodal dataset, named InterHuman. It consists of about 107M frames for diverse two-person interactions, with accurate skeletal motions and 23,337 natural language descriptions. We propose a novel representation for motion input in our interaction diffusion model, which explicitly formulates the global relations between the two performers in the world frame.
Score: 49.097973114627344
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: We have recently seen tremendous progress in diffusion advances for generating realistic human motions. Yet, they largely disregard the multi-human interactions. In this paper, we present InterGen, an effective diffusion-based approach that incorporates human-to-human interactions into the motion diffusion process, which enables layman users to customize high-quality two-person interaction motions, with only text guidance. We first contribute a multimodal dataset, named InterHuman. It consists of about 107M frames for diverse two-person interactions, with accurate skeletal motions and 23,337 natural language descriptions. For the algorithm side, we carefully tailor the motion diffusion model to our two-person interaction setting. To handle the symmetry of human identities during interactions, we propose two cooperative transformer-based denoisers that explicitly share weights, with a mutual attention mechanism to further connect the two denoising processes. Then, we propose a novel representation for motion input in our interaction diffusion model, which explicitly formulates the global relations between the two performers in the world frame. We further introduce two novel regularization terms to encode spatial relations, equipped with a corresponding damping scheme during the training of our interaction diffusion model. Extensive experiments validate the effectiveness and generalizability of InterGen. Notably, it can generate more diverse and compelling two-person motions than previous methods and enables various downstream applications for human interactions.

Related papers

Multi-Person Interaction Generation from Two-Person Motion Priors [7.253302825595181]
Graph-driven Interaction Sampling is a method that can generate realistic and diverse multi-person interactions.<n>We decompose the generation task into simultaneous single-person motion generation conditioned on one other's motion.<n>Our approach consistently outperforms existing methods in reducing artifacts when generating a wide range of two-person and multi-person interactions.
arXiv Detail & Related papers (2025-05-23T13:13:00Z)
InterDance:Reactive 3D Dance Generation with Realistic Duet Interactions [67.37790144477503]
We propose InterDance, a large-scale duet dance dataset that significantly enhances motion quality, data scale, and the variety of dance genres. We introduce a diffusion-based framework with an interaction refinement guidance strategy to optimize the realism of interactions progressively.
arXiv Detail & Related papers (2024-12-22T11:53:51Z)
Two-in-One: Unified Multi-Person Interactive Motion Generation by Latent Diffusion Transformer [24.166147954731652]
Multi-person interactive motion generation is a critical yet under-explored domain in computer character animation. Current research often employs separate module branches for individual motions, leading to a loss of interaction information. We propose a novel, unified approach that models multi-person motions and their interactions within a single latent space.
arXiv Detail & Related papers (2024-12-21T15:35:50Z)
It Takes Two: Real-time Co-Speech Two-person's Interaction Generation via Reactive Auto-regressive Diffusion Model [34.94330722832987]
We introduce an audio-driven, auto-regressive system designed to synthesize dynamic movements for two characters during a conversation. To the best of our knowledge, this is the first system capable of generating interactive full-body motions for two characters from speech in an online manner.
arXiv Detail & Related papers (2024-12-03T12:31:44Z)
Versatile Motion Language Models for Multi-Turn Interactive Agents [28.736843383405603]
We introduce Versatile Interactive Motion language model, which integrates both language and motion modalities. We evaluate the versatility of our method across motion-related tasks, motion to text, text to motion, reaction generation, motion editing, and reasoning about motion sequences.
arXiv Detail & Related papers (2024-10-08T02:23:53Z)
in2IN: Leveraging individual Information to Generate Human INteractions [29.495166514135295]
We introduce in2IN, a novel diffusion model for human-human motion generation conditioned on individual descriptions. We also propose DualMDM, a model composition technique that combines the motions generated with in2IN and the motions generated by a single-person motion prior pre-trained on HumanML3D.
arXiv Detail & Related papers (2024-04-15T17:59:04Z)
THOR: Text to Human-Object Interaction Diffusion via Relation Intervention [51.02435289160616]
We propose a novel Text-guided Human-Object Interaction diffusion model with Relation Intervention (THOR) In each diffusion step, we initiate text-guided human and object motion and then leverage human-object relations to intervene in object motion. We construct Text-BEHAVE, a Text2HOI dataset that seamlessly integrates textual descriptions with the currently largest publicly available 3D HOI dataset.
arXiv Detail & Related papers (2024-03-17T13:17:25Z)
ReMoS: 3D Motion-Conditioned Reaction Synthesis for Two-Person Interactions [66.87211993793807]
We present ReMoS, a denoising diffusion based model that synthesizes full body motion of a person in two person interaction scenario. We demonstrate ReMoS across challenging two person scenarios such as pair dancing, Ninjutsu, kickboxing, and acrobatics. We also contribute the ReMoCap dataset for two person interactions containing full body and finger motions.
arXiv Detail & Related papers (2023-11-28T18:59:52Z)
InterControl: Zero-shot Human Interaction Generation by Controlling Every Joint [67.6297384588837]
We introduce a novel controllable motion generation method, InterControl, to encourage the synthesized motions maintaining the desired distance between joint pairs. We demonstrate that the distance between joint pairs for human-wise interactions can be generated using an off-the-shelf Large Language Model.
arXiv Detail & Related papers (2023-11-27T14:32:33Z)
Persistent-Transient Duality: A Multi-mechanism Approach for Modeling Human-Object Interaction [58.67761673662716]
Humans are highly adaptable, swiftly switching between different modes to handle different tasks, situations and contexts. In Human-object interaction (HOI) activities, these modes can be attributed to two mechanisms: (1) the large-scale consistent plan for the whole activity and (2) the small-scale children interactive actions that start and end along the timeline. This work proposes to model two concurrent mechanisms that jointly control human motion.
arXiv Detail & Related papers (2023-07-24T12:21:33Z)
Bipartite Graph Diffusion Model for Human Interaction Generation [11.732108478773196]
We introduce a novel bipartite graph diffusion method (BiGraphDiff) to generate human motion interactions between two persons. We show that the proposed achieves new state-of-the-art results on leading benchmarks for the human interaction generation task.
arXiv Detail & Related papers (2023-01-24T16:59:46Z)
Interaction Transformer for Human Reaction Generation [61.22481606720487]
We propose a novel interaction Transformer (InterFormer) consisting of a Transformer network with both temporal and spatial attentions. Our method is general and can be used to generate more complex and long-term interactions.
arXiv Detail & Related papers (2022-07-04T19:30:41Z)

This list is automatically generated from the titles and abstracts of the papers in this site.