ParCo: Part-Coordinating Text-to-Motion Synthesis
- URL: http://arxiv.org/abs/2403.18512v2
- Date: Tue, 23 Jul 2024 10:41:22 GMT
- Title: ParCo: Part-Coordinating Text-to-Motion Synthesis
- Authors: Qiran Zou, Shangyuan Yuan, Shian Du, Yu Wang, Chang Liu, Yi Xu, Jie Chen, Xiangyang Ji
- Abstract summary: We propose Part-Coordinating Text-to-Motion Synthesis (ParCo).
ParCo is endowed with enhanced capabilities for understanding part motions and communication among different part motion generators.
Our approach demonstrates superior performance on common benchmarks with economical computation.
- Score: 48.67225204910634
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study a challenging task: text-to-motion synthesis, aiming to generate motions that align with textual descriptions and exhibit coordinated movements. Currently, part-based methods introduce part partitioning into the motion synthesis process to achieve finer-grained generation. However, these methods encounter challenges such as the lack of coordination between different part motions and the difficulty networks have in understanding part concepts. Moreover, introducing finer-grained part concepts poses computational complexity challenges. In this paper, we propose Part-Coordinating Text-to-Motion Synthesis (ParCo), endowed with enhanced capabilities for understanding part motions and communication among different part motion generators, ensuring coordinated and fine-grained motion synthesis. Specifically, we discretize whole-body motion into multiple part motions to establish the prior concept of different parts. Afterward, we employ multiple lightweight generators designed to synthesize different part motions and coordinate them through our part coordination module. Our approach demonstrates superior performance on common benchmarks, including HumanML3D and KIT-ML, with economical computation, providing substantial evidence of its effectiveness. Code is available at https://github.com/qrzou/ParCo .
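The abstract describes the core architecture: whole-body motion is split into part motions, each part is synthesized by a lightweight generator, and a part coordination module lets the generators communicate. The sketch below illustrates that general idea only; it is not the paper's implementation (see the GitHub link above for the official code). The joint grouping, the GRU-based per-part generators, the attention-based coordination module, and all module and parameter names are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical part partition: joint indices per part are placeholders,
# not the grouping used in the paper.
PART_JOINTS = {
    "torso": [0, 1, 2, 3],
    "left_arm": [4, 5, 6],
    "right_arm": [7, 8, 9],
    "left_leg": [10, 11, 12],
    "right_leg": [13, 14, 15],
}

class PartCoordination(nn.Module):
    """Exchanges information across parts via attention over part tokens (an assumption)."""
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, part_feats: torch.Tensor) -> torch.Tensor:
        # part_feats: (batch, num_parts, dim)
        out, _ = self.attn(part_feats, part_feats, part_feats)
        return part_feats + out

class PartCoordinatedGenerator(nn.Module):
    """One lightweight generator per part, coordinated at every generation step."""
    def __init__(self, text_dim: int, dim: int, num_parts: int, part_motion_dim: int):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, dim)
        # Lightweight per-part generators (GRU cells here, purely for illustration).
        self.part_generators = nn.ModuleList(
            [nn.GRUCell(dim, dim) for _ in range(num_parts)]
        )
        self.coordination = PartCoordination(dim)
        self.heads = nn.ModuleList(
            [nn.Linear(dim, part_motion_dim) for _ in range(num_parts)]
        )

    def forward(self, text_emb: torch.Tensor, num_frames: int) -> torch.Tensor:
        cond = self.text_proj(text_emb)                       # (batch, dim)
        states = [torch.zeros_like(cond) for _ in self.part_generators]
        frames = []
        for _ in range(num_frames):
            # Each lightweight generator advances its own part, conditioned on the text.
            states = [gen(cond, h) for gen, h in zip(self.part_generators, states)]
            # The coordination module lets parts see each other before decoding.
            stacked = self.coordination(torch.stack(states, dim=1))
            states = list(stacked.unbind(dim=1))
            frames.append(torch.stack(
                [head(h) for head, h in zip(self.heads, states)], dim=1))
        return torch.stack(frames, dim=1)  # (batch, frames, parts, part_motion_dim)

# Usage sketch: two text embeddings, 8 generated frames, 5 parts.
model = PartCoordinatedGenerator(text_dim=512, dim=128,
                                 num_parts=len(PART_JOINTS), part_motion_dim=12)
motion = model(torch.randn(2, 512), num_frames=8)
print(motion.shape)  # torch.Size([2, 8, 5, 12])
```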
Related papers
- BiPO: Bidirectional Partial Occlusion Network for Text-to-Motion Synthesis [0.4893345190925178]
BiPO is a novel model that enhances text-to-motion synthesis.
It integrates part-based generation with a bidirectional autoregressive architecture.
BiPO achieves state-of-the-art performance on the HumanML3D dataset.
arXiv Detail & Related papers (2024-11-28T05:42:47Z)
- KinMo: Kinematic-aware Human Motion Understanding and Generation [6.962697597686156]
Controlling human motion based on text presents an important challenge in computer vision.
Traditional approaches often rely on holistic action descriptions for motion synthesis.
We propose a novel motion representation that decomposes motion into distinct body joint group movements.
arXiv Detail & Related papers (2024-11-23T06:50:11Z)
- TextIM: Part-aware Interactive Motion Synthesis from Text [25.91739105467082]
TextIM is a novel framework for synthesizing TEXT-driven human Interactive Motions.
Our approach leverages large language models, functioning as a human brain, to identify interacting human body parts.
For training and evaluation, we carefully selected and re-labeled interactive motions from HumanML3D to develop a specialized dataset.
arXiv Detail & Related papers (2024-08-06T17:08:05Z)
- FreeMotion: A Unified Framework for Number-free Text-to-Motion Synthesis [65.85686550683806]
This paper reconsiders motion generation and proposes to unify the single and multi-person motion by the conditional motion distribution.
Based on our framework, the current single-person motion spatial control method could be seamlessly integrated, achieving precise control of multi-person motion.
arXiv Detail & Related papers (2024-05-24T17:57:57Z)
- GUESS: GradUally Enriching SyntheSis for Text-Driven Human Motion Generation [23.435588151215594]
We propose a novel cascaded diffusion-based generative framework for text-driven human motion synthesis.
The framework exploits a strategy named GradUally Enriching SyntheSis, abbreviated as GUESS.
We show that GUESS outperforms existing state-of-the-art methods by large margins in terms of accuracy, realism, and diversity.
arXiv Detail & Related papers (2024-01-04T08:48:21Z)
- SINC: Spatial Composition of 3D Human Motions for Simultaneous Action Generation [58.25766404147109]
Our goal is to synthesize 3D human motions given textual inputs describing simultaneous actions.
We refer to generating such simultaneous movements as performing 'spatial compositions'.
arXiv Detail & Related papers (2023-04-20T16:01:55Z)
- TEACH: Temporal Action Composition for 3D Humans [50.97135662063117]
Given a series of natural language descriptions, our task is to generate 3D human motions that correspond semantically to the text.
In particular, our goal is to enable the synthesis of a series of actions, which we refer to as temporal action composition.
arXiv Detail & Related papers (2022-09-09T00:33:40Z)
- MoDi: Unconditional Motion Synthesis from Diverse Data [51.676055380546494]
We present MoDi, an unconditional generative model that synthesizes diverse motions.
Our model is trained in a completely unsupervised setting from a diverse, unstructured and unlabeled motion dataset.
We show that despite the lack of any structure in the dataset, the latent space can be semantically clustered.
arXiv Detail & Related papers (2022-06-16T09:06:25Z)
- TEMOS: Generating diverse human motions from textual descriptions [53.85978336198444]
We address the problem of generating diverse 3D human motions from textual descriptions.
We propose TEMOS, a text-conditioned generative model leveraging variational autoencoder (VAE) training with human motion data.
We show that the TEMOS framework can produce both skeleton-based animations, as in prior work, and more expressive SMPL body motions.
arXiv Detail & Related papers (2022-04-25T14:53:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented here (including all listed papers) and is not responsible for any consequences.