Generating Attribute-Aware Human Motions from Textual Prompt
- URL: http://arxiv.org/abs/2506.21912v1
- Date: Fri, 27 Jun 2025 04:56:54 GMT
- Title: Generating Attribute-Aware Human Motions from Textual Prompt
- Authors: Xinghan Wang, Kun Xu, Fei Li, Cao Sheng, Jiazhong Yu, Yadong Mu
- Abstract summary: We conceptualize each motion as comprising both attribute information and action semantics. A new framework inspired by Structural Causal Models is proposed to decouple action semantics from human attributes. The resulting model is capable of generating realistic, attribute-aware motion aligned with the user's text and attribute inputs.
- Score: 28.57025886368254
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-driven human motion generation has recently attracted considerable attention, allowing models to generate human motions based on textual descriptions. However, current methods neglect the influence of human attributes (such as age, gender, weight, and height) which are key factors shaping human motion patterns. This work represents a pilot exploration for bridging this gap. We conceptualize each motion as comprising both attribute information and action semantics, where textual descriptions align exclusively with action semantics. To achieve this, a new framework inspired by Structural Causal Models is proposed to decouple action semantics from human attributes, enabling text-to-semantics prediction and attribute-controlled generation. The resulting model is capable of generating realistic, attribute-aware motion aligned with the user's text and attribute inputs. For evaluation, we introduce HumanAttr, a comprehensive dataset containing attribute annotations for text-motion pairs, setting the first benchmark for attribute-aware text-to-motion generation. Extensive experiments on the new dataset validate our model's effectiveness.
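As a concrete reading of the decoupling described above, here is a minimal sketch in which text supplies only action semantics and attributes enter through a separate branch. All module names and dimensions (AttrAwareMotionModel, SEM_DIM, the 263-dim motion features) are hypothetical illustrations, not the authors' architecture.

```python
# Illustrative sketch of the decoupling in the abstract: action semantics come
# only from text, while attributes (age, gender, weight, height) enter through
# a separate branch. All shapes and names are hypothetical.
import torch
import torch.nn as nn

TEXT_DIM, ATTR_DIM, SEM_DIM, MOTION_DIM, T = 512, 4, 256, 263, 60

class AttrAwareMotionModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Text encoder predicts attribute-free action semantics.
        self.text_to_semantics = nn.Sequential(
            nn.Linear(TEXT_DIM, SEM_DIM), nn.ReLU(), nn.Linear(SEM_DIM, SEM_DIM))
        # Attribute encoder embeds (age, gender, weight, height).
        self.attr_encoder = nn.Sequential(nn.Linear(ATTR_DIM, SEM_DIM), nn.ReLU())
        # Decoder conditions on both to produce a motion sequence.
        self.decoder = nn.Linear(2 * SEM_DIM, T * MOTION_DIM)

    def forward(self, text_emb, attrs):
        sem = self.text_to_semantics(text_emb)   # action semantics only
        att = self.attr_encoder(attrs)           # attribute information
        motion = self.decoder(torch.cat([sem, att], dim=-1))
        return motion.view(-1, T, MOTION_DIM)

model = AttrAwareMotionModel()
text_emb = torch.randn(2, TEXT_DIM)              # e.g. from a text encoder
attrs = torch.tensor([[25., 0., 70., 1.75], [70., 1., 55., 1.60]])
print(model(text_emb, attrs).shape)              # torch.Size([2, 60, 263])
```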
Related papers
- A Quantitative Evaluation of the Expressivity of BMI, Pose and Gender in Body Embeddings for Recognition and Identification [56.10719736365069]
We extend the notion of expressivity, defined as the mutual information between learned features and specific attributes, to quantify how strongly attributes are encoded. We find that BMI consistently shows the highest expressivity in the final layers, indicating its dominant role in recognition. These findings demonstrate the central role of body attributes in ReID and establish a principled approach for uncovering attribute-driven correlations.
arXiv Detail & Related papers (2025-03-09T05:15:54Z)
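The expressivity notion above (mutual information between features and an attribute) can be estimated directly. A minimal sketch with scikit-learn on synthetic data, where the feature matrix and the discretized BMI labels are stand-ins for real embeddings and annotations:

```python
# Minimal sketch: estimate "expressivity" as the mutual information between
# a layer's features and a discretized attribute (here, BMI buckets).
# Synthetic data stands in for real embeddings and annotations.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
n = 1000
bmi_bucket = rng.integers(0, 3, size=n)   # low / medium / high BMI
features = rng.normal(size=(n, 64))
features[:, 0] += bmi_bucket              # one dimension encodes BMI

mi_per_dim = mutual_info_classif(features, bmi_bucket, random_state=0)
expressivity = mi_per_dim.sum()           # aggregate over feature dimensions
print(f"expressivity(features; BMI) ~ {expressivity:.3f} nats")
```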
- Adaptive Prototype Model for Attribute-based Multi-label Few-shot Action Recognition [11.316708754749103]
In real-world action recognition systems, incorporating more attributes helps achieve a more comprehensive understanding of human behavior. We propose a novel method, the Adaptive Attribute Prototype Model (AAPM), for human action recognition, which captures rich action-relevant attribute information. Our AAPM achieves state-of-the-art performance in both attribute-based multi-label few-shot action recognition and single-label few-shot action recognition.
arXiv Detail & Related papers (2025-02-18T06:39:28Z)
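The prototype idea behind few-shot recognition of this kind is standard: average the support embeddings per label and classify queries by nearest prototype. A generic sketch of that family of methods, not AAPM's adaptive weighting:

```python
# Generic prototypical-classification sketch: one prototype per label is the
# mean of its support embeddings; queries go to the nearest prototype.
# This illustrates the method family, not AAPM's specifics.
import torch

def prototypes(support_emb, support_labels, num_labels):
    # support_emb: (n, d), support_labels: (n,) integer labels
    return torch.stack([
        support_emb[support_labels == c].mean(dim=0) for c in range(num_labels)])

def classify(query_emb, protos):
    # Euclidean distance to each prototype; pick the closest.
    dists = torch.cdist(query_emb, protos)   # (m, num_labels)
    return dists.argmin(dim=1)

support = torch.randn(15, 32)
labels = torch.arange(3).repeat_interleave(5)   # 3 classes x 5 shots
protos = prototypes(support, labels, num_labels=3)
queries = torch.randn(4, 32)
print(classify(queries, protos))                # predicted class per query
```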
- TypeScore: A Text Fidelity Metric for Text-to-Image Generative Models [39.06617653124486]
We introduce a new evaluation framework called TypeScore to assess a model's ability to generate images with high-fidelity embedded text.
Our proposed metric demonstrates greater resolution than CLIPScore in differentiating popular image generation models.
arXiv Detail & Related papers (2024-11-02T07:56:54Z)
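A text-fidelity metric of this kind plausibly compares the string the prompt asked for against text read back from the generated image. The sketch below scores that comparison with a stdlib edit-ratio, leaving the OCR step abstract; this is an assumption about the general recipe, not the published TypeScore formula.

```python
# Sketch of an embedded-text fidelity score: compare the string the prompt
# requested with the string recovered from the generated image. The OCR step
# is left abstract; difflib's ratio stands in for the real similarity metric.
from difflib import SequenceMatcher

def text_fidelity(requested: str, ocr_text: str) -> float:
    """Returns 1.0 when the rendered text matches the request exactly."""
    a, b = requested.strip().lower(), ocr_text.strip().lower()
    return SequenceMatcher(None, a, b).ratio()

# In practice ocr_text would come from an OCR engine run on the image.
print(text_fidelity('"Grand Opening"', '"Grand Openmg"'))  # high: small OCR slip
print(text_fidelity('"Grand Opening"', '"Closed"'))        # much lower
```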
- Learning Generalizable Human Motion Generator with Reinforcement Learning [95.62084727984808]
Text-driven human motion generation is one of the vital tasks in computer-aided content creation.
Existing methods often overfit specific motion expressions in the training data, hindering their ability to generalize.
We present InstructMotion, which incorporates the trial-and-error paradigm of reinforcement learning for generalizable human motion generation.
arXiv Detail & Related papers (2024-05-24T13:29:12Z)
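The trial-and-error paradigm above can be pictured as a generic REINFORCE loop: sample an output, score it with a reward model, and reinforce high-reward samples. A toy sketch with a bandit-style generator and a stand-in reward; none of this reflects InstructMotion's actual reward design.

```python
# Toy REINFORCE loop illustrating trial and error for generation: sample an
# output, score it with a reward model, and raise the log-probability of
# high-reward samples. The reward table stands in for text-motion alignment.
import torch

logits = torch.zeros(4, requires_grad=True)        # 4 candidate "motion modes"
reward_table = torch.tensor([0.1, 0.2, 0.9, 0.3])  # stand-in reward model
opt = torch.optim.Adam([logits], lr=0.1)

for step in range(200):
    dist = torch.distributions.Categorical(logits=logits)
    action = dist.sample()
    reward = reward_table[action]
    # Mean reward as a simple baseline to reduce gradient variance.
    loss = -dist.log_prob(action) * (reward - reward_table.mean())
    opt.zero_grad(); loss.backward(); opt.step()

print(torch.softmax(logits, dim=0))  # mass should concentrate on the 0.9 mode
```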
- SemanticBoost: Elevating Motion Generation with Augmented Textual Cues [73.83255805408126]
Our framework comprises a Semantic Enhancement module and a Context-Attuned Motion Denoiser (CAMD).
The CAMD approach provides an all-encompassing solution for generating high-quality, semantically consistent motion sequences.
Our experimental results demonstrate that SemanticBoost, as a diffusion-based method, outperforms auto-regressive-based techniques.
arXiv Detail & Related papers (2023-10-31T09:58:11Z)
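For context on the denoiser component, a single DDPM-style reverse step is the generic operation a motion denoiser performs repeatedly. The sketch below uses a placeholder network and schedule; CAMD's context-attuned conditioning is not reproduced here.

```python
# One DDPM-style reverse (denoising) step, the generic operation a diffusion
# motion denoiser repeats. eps_model is a placeholder network; the schedule
# and shapes are illustrative, not SemanticBoost's.
import torch

T = 100
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

def reverse_step(x_t, t, eps_model):
    eps = eps_model(x_t, t)                            # predicted noise
    coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
    mean = (x_t - coef * eps) / torch.sqrt(alphas[t])
    noise = torch.randn_like(x_t) if t > 0 else torch.zeros_like(x_t)
    return mean + torch.sqrt(betas[t]) * noise

eps_model = lambda x, t: x                             # toy stand-in network
x = torch.randn(1, 60, 263)                            # (batch, frames, features)
for t in reversed(range(T)):
    x = reverse_step(x, t, eps_model)
print(x.shape)
```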
- Make-An-Animation: Large-Scale Text-conditional 3D Human Motion Generation [47.272177594990104]
We introduce Make-An-Animation, a text-conditioned human motion generation model.
It learns more diverse poses and prompts from large-scale image-text datasets.
It reaches state-of-the-art performance on text-to-motion generation.
arXiv Detail & Related papers (2023-05-16T17:58:43Z)
- Real or Fake Text?: Investigating Human Ability to Detect Boundaries Between Human-Written and Machine-Generated Text [23.622347443796183]
We study a more realistic setting where text begins as human-written and transitions to being generated by state-of-the-art neural language models.
We show that, while annotators often struggle at this task, there is substantial variance in annotator skill and that given proper incentives, annotators can improve at this task over time.
arXiv Detail & Related papers (2022-12-24T06:40:25Z)
- 3d human motion generation from the text via gesture action classification and the autoregressive model [28.76063248241159]
The model focuses on generating special gestures that express human thinking, such as waving and nodding.
With several experiments, the proposed method successfully generates perceptually natural and realistic 3D human motion from the text.
arXiv Detail & Related papers (2022-11-18T03:05:49Z)
- TEMOS: Generating diverse human motions from textual descriptions [53.85978336198444]
We address the problem of generating diverse 3D human motions from textual descriptions.
We propose TEMOS, a text-conditioned generative model leveraging variational autoencoder (VAE) training with human motion data.
We show that the TEMOS framework can produce both skeleton-based animations, as in prior work, as well as more expressive SMPL body motions.
arXiv Detail & Related papers (2022-04-25T14:53:06Z)
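The VAE training at the core of TEMOS-style models follows the standard reparameterized ELBO. A compact generic sketch with toy shapes (not the TEMOS transformer architecture):

```python
# Generic text-conditioned VAE step of the kind TEMOS builds on: encode motion
# to (mu, logvar), sample with the reparameterization trick, decode conditioned
# on the text embedding, and optimize reconstruction + KL. Shapes are toy.
import torch
import torch.nn as nn
import torch.nn.functional as F

MOTION, TEXT, LATENT = 64, 32, 16

enc = nn.Linear(MOTION, 2 * LATENT)    # -> concat(mu, logvar)
dec = nn.Linear(LATENT + TEXT, MOTION)
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)

motion = torch.randn(8, MOTION)
text = torch.randn(8, TEXT)

mu, logvar = enc(motion).chunk(2, dim=-1)
z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
recon = dec(torch.cat([z, text], dim=-1))

rec_loss = F.mse_loss(recon, motion)
kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
loss = rec_loss + 1e-3 * kl
opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```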
- Attribute Alignment: Controlling Text Generation from Pre-trained Language Models [46.19190007510232]
We propose a simple and flexible method for controlling text generation by aligning disentangled attribute representations.
In contrast to recent efforts that train a discriminator to perturb the token-level distribution for an attribute, we use the same data to learn an alignment function that guides the pre-trained, non-controlled language model to generate texts with the target attribute, without changing the original language model parameters.
arXiv Detail & Related papers (2021-03-20T01:51:32Z)
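Guiding a frozen language model with an attribute representation, without touching its parameters, can be sketched as adding a learned attribute-dependent bias to the model's next-token logits. A toy illustration of the general idea, not the paper's alignment function; every component below is a stand-in:

```python
# Toy sketch of attribute-guided decoding with a frozen language model: the
# base model's next-token logits are shifted by a learned, attribute-dependent
# bias, so the LM's own parameters never change. All components are stand-ins.
import torch
import torch.nn as nn

VOCAB, ATTR_DIM = 100, 8

frozen_lm = nn.Linear(16, VOCAB)        # stand-in for a pretrained LM head
for p in frozen_lm.parameters():
    p.requires_grad_(False)             # the LM is never updated

alignment = nn.Linear(ATTR_DIM, VOCAB)  # learned attribute -> logit bias

hidden = torch.randn(1, 16)             # LM hidden state for some prefix
attr = torch.randn(1, ATTR_DIM)         # target attribute representation

logits = frozen_lm(hidden) + alignment(attr)  # guided next-token distribution
next_token = torch.softmax(logits, dim=-1).argmax(dim=-1)
print(int(next_token))
```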
- Procedural Reading Comprehension with Attribute-Aware Context Flow [85.34405161075276]
Procedural texts often describe processes that happen over entities.
We introduce an algorithm for procedural reading comprehension by translating the text into a general formalism.
arXiv Detail & Related papers (2020-03-31T00:06:29Z)