Fg-T2M: Fine-Grained Text-Driven Human Motion Generation via Diffusion Model
- URL: http://arxiv.org/abs/2309.06284v1
- Date: Tue, 12 Sep 2023 14:43:47 GMT
- Title: Fg-T2M: Fine-Grained Text-Driven Human Motion Generation via Diffusion Model
- Authors: Yin Wang, Zhiying Leng, Frederick W. B. Li, Shun-Cheng Wu, Xiaohui Liang
- Abstract summary: We propose a fine-grained method for generating high-quality, conditional human motion sequences that support precise text descriptions.
Our approach consists of two key components: 1) a linguistics-structure assisted module that constructs accurate and complete language features to fully utilize the text information; and 2) a context-aware progressive reasoning module that learns neighborhood and overall semantic linguistic features from shallow and deep graph neural networks to achieve multi-step inference.
- Score: 11.873294782380984
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-driven human motion generation in computer vision is both significant and challenging. Current methods, however, produce either deterministic or imprecise motion sequences and fail to effectively control the temporal and spatial relationships required to conform to a given text description. In this work, we propose a fine-grained method for generating high-quality, conditional human motion sequences that support precise text descriptions. Our approach consists of two key components: 1) a linguistics-structure assisted module that constructs accurate and complete language features to fully utilize the text information; and 2) a context-aware progressive reasoning module that learns neighborhood and overall semantic linguistic features from shallow and deep graph neural networks to achieve multi-step inference. Experiments show that our approach outperforms existing text-driven motion generation methods on the HumanML3D and KIT test sets and generates motions that are visually better matched to the text conditions.
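To make the two-component architecture more concrete, below is a minimal PyTorch-style sketch of how graph-refined word-level features and a sentence-level embedding could condition a diffusion denoiser over motion frames. This is an illustration under assumptions rather than the authors' implementation; all module names, dimensions, and the dependency adjacency matrix are hypothetical.

```python
# Minimal sketch (not the authors' code) of a text-conditioned motion diffusion
# denoiser that mixes sentence-level and graph-refined word-level features.
# Module names, dimensions, and the dependency adjacency are illustrative assumptions.
import torch
import torch.nn as nn


class GraphTextEncoder(nn.Module):
    """Refines per-word features over a dependency-graph adjacency matrix."""

    def __init__(self, dim: int, num_layers: int):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_layers)])

    def forward(self, words: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # words: (B, L, D), adj: (B, L, L) row-normalized dependency adjacency
        h = words
        for layer in self.layers:
            h = torch.relu(layer(adj @ h)) + h  # neighborhood message passing + residual
        return h


class MotionDenoiser(nn.Module):
    """Predicts the noise added to a motion sequence, conditioned on text."""

    def __init__(self, motion_dim: int = 263, text_dim: int = 512, hidden: int = 512):
        super().__init__()
        self.in_proj = nn.Linear(motion_dim, hidden)
        self.time_mlp = nn.Sequential(nn.Linear(1, hidden), nn.SiLU(), nn.Linear(hidden, hidden))
        self.sent_proj = nn.Linear(text_dim, hidden)
        self.cross_attn = nn.MultiheadAttention(hidden, num_heads=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(hidden, nhead=8, batch_first=True), num_layers=4
        )
        self.out_proj = nn.Linear(hidden, motion_dim)

    def forward(self, x_t, t, sent_emb, word_feats):
        # x_t: (B, T, motion_dim) noisy motion, t: (B,) diffusion step
        h = self.in_proj(x_t)
        h = h + self.time_mlp(t.float().unsqueeze(-1)).unsqueeze(1)  # timestep embedding
        h = h + self.sent_proj(sent_emb).unsqueeze(1)                # global sentence cue
        h, _ = self.cross_attn(h, word_feats, word_feats)            # fine-grained word cues
        return self.out_proj(self.backbone(h))


if __name__ == "__main__":
    B, T, L = 2, 60, 12
    denoiser = MotionDenoiser()
    word_enc = GraphTextEncoder(dim=512, num_layers=2)
    words = torch.randn(B, L, 512)
    adj = torch.softmax(torch.randn(B, L, L), dim=-1)  # stand-in dependency adjacency
    noise_pred = denoiser(
        torch.randn(B, T, 263), torch.randint(0, 1000, (B,)),
        torch.randn(B, 512), word_enc(words, adj),
    )
    print(noise_pred.shape)  # torch.Size([2, 60, 263])
```

In this sketch, a shallow GraphTextEncoder stack approximates the neighborhood-level reasoning, while a deeper stack would play the role of the overall, sentence-wide reasoning described in the abstract.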
Related papers
- Text2Grasp: Grasp synthesis by text prompts of object grasping parts [4.031699584957737]
The hand plays a pivotal role in the human ability to grasp and manipulate objects.
Existing methods that use human intention or task-level language as control signals for grasping inherently face ambiguity.
We propose Text2Grasp, a grasp synthesis method guided by text prompts of object grasping parts, which provides more precise control.
arXiv Detail & Related papers (2024-04-09T10:57:27Z)
- Text2Data: Low-Resource Data Generation with Textual Control [104.38011760992637]
Natural language serves as a common and straightforward control signal for humans to interact seamlessly with machines.
We propose Text2Data, a novel approach that utilizes unlabeled data to understand the underlying data distribution through an unsupervised diffusion model.
It undergoes controllable finetuning via a novel constraint optimization-based learning objective that ensures controllability and effectively counteracts catastrophic forgetting.
arXiv Detail & Related papers (2024-02-08T03:41:39Z)
- BOTH2Hands: Inferring 3D Hands from Both Text Prompts and Body Dynamics [50.88842027976421]
We propose BOTH57M, a novel multi-modal dataset for two-hand motion generation.
Our dataset includes accurate motion tracking for the human body and hands.
We also provide a strong baseline method, BOTH2Hands, for the novel task.
arXiv Detail & Related papers (2023-12-13T07:30:19Z)
- SemanticBoost: Elevating Motion Generation with Augmented Textual Cues [73.83255805408126]
Our framework comprises a Semantic Enhancement module and a Context-Attuned Motion Denoiser (CAMD).
The CAMD approach provides an all-encompassing solution for generating high-quality, semantically consistent motion sequences.
Our experimental results demonstrate that SemanticBoost, as a diffusion-based method, outperforms auto-regressive-based techniques.
arXiv Detail & Related papers (2023-10-31T09:58:11Z)
- HumanTOMATO: Text-aligned Whole-body Motion Generation [30.729975715600627]
This work targets a novel text-driven whole-body motion generation task.
It aims at generating high-quality, diverse, and coherent facial expressions, hand gestures, and body motions simultaneously.
arXiv Detail & Related papers (2023-10-19T17:59:46Z)
- DiverseMotion: Towards Diverse Human Motion Generation via Discrete Diffusion [70.33381660741861]
We present DiverseMotion, a new approach for synthesizing high-quality human motions conditioned on textual descriptions.
We show that DiverseMotion achieves state-of-the-art motion quality and competitive motion diversity.
arXiv Detail & Related papers (2023-09-04T05:43:48Z)
- AttT2M: Text-Driven Human Motion Generation with Multi-Perspective Attention Mechanism [24.049207982022214]
We propose AttT2M, a two-stage method with a multi-perspective attention mechanism.
Our method outperforms the current state-of-the-art in terms of qualitative and quantitative evaluation.
arXiv Detail & Related papers (2023-09-02T02:18:17Z)
- TM2T: Stochastic and Tokenized Modeling for the Reciprocal Generation of 3D Human Motions and Texts [20.336481832461168]
Inspired by the strong ties between vision and language, our paper aims to explore the generation of 3D human full-body motions from texts.
We propose the use of motion tokens, a discrete and compact motion representation (see the tokenization sketch after this list).
Our approach is flexible and can be used for both text2motion and motion2text tasks.
arXiv Detail & Related papers (2022-07-04T19:52:18Z)
- TEMOS: Generating diverse human motions from textual descriptions [53.85978336198444]
We address the problem of generating diverse 3D human motions from textual descriptions.
We propose TEMOS, a text-conditioned generative model leveraging variational autoencoder (VAE) training with human motion data.
We show that the TEMOS framework can produce both skeleton-based animations, as in prior work, and more expressive SMPL body motions.
arXiv Detail & Related papers (2022-04-25T14:53:06Z)
- Improving Disentangled Text Representation Learning with Information-Theoretic Guidance [99.68851329919858]
The discrete nature of natural language makes disentangling textual representations more challenging.
Inspired by information theory, we propose a novel method that effectively manifests disentangled representations of text.
Experiments on both conditional text generation and text-style transfer demonstrate the high quality of our disentangled representation.
arXiv Detail & Related papers (2020-06-01T03:36:01Z)
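As referenced in the TM2T entry above, the following is a minimal sketch of discrete motion tokenization with a learned codebook, the general idea behind treating a motion clip as a compact token sequence for reciprocal text2motion and motion2text modeling. It is an illustration under assumptions, not TM2T's actual model; the encoder layout, dimensions, and codebook size are hypothetical.

```python
# Minimal sketch (assumptions, not TM2T's code) of mapping a motion clip to
# discrete tokens via nearest-neighbor lookup in a learned codebook.
import torch
import torch.nn as nn


class MotionTokenizer(nn.Module):
    def __init__(self, motion_dim: int = 263, code_dim: int = 512, codebook_size: int = 1024):
        super().__init__()
        # Temporal conv encoder downsamples the clip before quantization.
        self.encoder = nn.Sequential(
            nn.Conv1d(motion_dim, code_dim, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv1d(code_dim, code_dim, kernel_size=4, stride=2, padding=1),
        )
        self.codebook = nn.Embedding(codebook_size, code_dim)

    def forward(self, motion: torch.Tensor) -> torch.Tensor:
        # motion: (B, T, motion_dim) -> token ids: (B, T // 4)
        z = self.encoder(motion.transpose(1, 2)).transpose(1, 2)   # (B, T//4, code_dim)
        z_flat = z.reshape(-1, z.size(-1))                          # (B*N, code_dim)
        dists = torch.cdist(z_flat, self.codebook.weight)           # distance to each code
        return dists.argmin(dim=-1).view(z.size(0), z.size(1))      # nearest codebook entry


if __name__ == "__main__":
    tokens = MotionTokenizer()(torch.randn(2, 64, 263))
    print(tokens.shape)  # torch.Size([2, 16])
```

Once motion is a token sequence, text-to-motion and motion-to-text can both be framed as sequence-to-sequence generation over discrete vocabularies.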