Motion-X: A Large-scale 3D Expressive Whole-body Human Motion Dataset
- URL: http://arxiv.org/abs/2307.00818v2
- Date: Fri, 26 Jan 2024 15:40:29 GMT
- Title: Motion-X: A Large-scale 3D Expressive Whole-body Human Motion Dataset
- Authors: Jing Lin, Ailing Zeng, Shunlin Lu, Yuanhao Cai, Ruimao Zhang, Haoqian
Wang, Lei Zhang
- Abstract summary: Motion-X is a large-scale 3D expressive whole-body motion dataset.
It comprises 15.6M precise 3D whole-body pose annotations (i.e., SMPL-X) covering 81.1K motion sequences from massive scenes.
Motion-X provides 15.6M frame-level whole-body pose descriptions and 81.1K sequence-level semantic labels.
- Score: 40.54625833855793
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we present Motion-X, a large-scale 3D expressive whole-body
motion dataset. Existing motion datasets predominantly contain body-only poses,
lacking facial expressions, hand gestures, and fine-grained pose descriptions.
Moreover, they are primarily collected from limited laboratory scenes with
textual descriptions manually labeled, which greatly limits their scalability.
To overcome these limitations, we develop a whole-body motion and text
annotation pipeline, which can automatically annotate motion from either
single- or multi-view videos and provide comprehensive semantic labels for each
video and fine-grained whole-body pose descriptions for each frame. This
pipeline is highly precise, cost-effective, and scalable for further
research. Based on it, we construct Motion-X, which comprises 15.6M precise 3D
whole-body pose annotations (i.e., SMPL-X) covering 81.1K motion sequences from
massive scenes. In addition, Motion-X provides 15.6M frame-level whole-body pose
descriptions and 81.1K sequence-level semantic labels. Comprehensive
experiments demonstrate the accuracy of the annotation pipeline and the
significant benefit of Motion-X in enhancing expressive, diverse, and natural
motion generation, as well as 3D whole-body human mesh recovery.
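The abstract describes each frame's annotation as an SMPL-X whole-body pose. As a minimal sketch of what one such per-frame record could look like, the following follows the standard SMPL-X parameterization (axis-angle rotations; 21 body joints, 15 joints per hand, plus jaw, eyes, expression, and shape coefficients). The field names mirror the `smplx` reference implementation, but this dictionary layout is illustrative, not the dataset's actual file format.

```python
import numpy as np

def make_frame_annotation():
    """Illustrative per-frame SMPL-X parameter record (all zeros = rest pose)."""
    return {
        "global_orient": np.zeros(3),        # root rotation (axis-angle)
        "body_pose": np.zeros(21 * 3),       # 21 body joints
        "left_hand_pose": np.zeros(15 * 3),  # 15 joints per hand
        "right_hand_pose": np.zeros(15 * 3),
        "jaw_pose": np.zeros(3),
        "leye_pose": np.zeros(3),
        "reye_pose": np.zeros(3),
        "expression": np.zeros(10),          # facial expression coefficients
        "betas": np.zeros(10),               # body shape coefficients
        "transl": np.zeros(3),               # global translation
    }

frame = make_frame_annotation()
total_dims = sum(v.size for v in frame.values())
print(total_dims)  # 188 parameters per frame under this parameterization
```

With 15.6M frames annotated at this granularity, the whole-body detail (hands, face, expression) is what distinguishes Motion-X from body-only datasets.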
Related papers
- FineMotion: A Dataset and Benchmark with both Spatial and Temporal Annotation for Fine-grained Motion Generation and Editing [36.42160163142448]
We propose the FineMotion dataset, which contains over 442,000 human motion snippets. The dataset includes about 95k detailed paragraphs describing the movements of human body parts across entire motion sequences.
arXiv Detail & Related papers (2025-07-26T07:54:29Z) - Motion-X++: A Large-Scale Multimodal 3D Whole-body Human Motion Dataset [35.47253826828815]
Motion-X++ is a large-scale multimodal 3D expressive whole-body human motion dataset.
Motion-X++ provides 19.5M 3D whole-body pose annotations covering 120.5K motion sequences from massive scenes.
arXiv Detail & Related papers (2025-01-09T09:37:27Z) - Motion Prompting: Controlling Video Generation with Motion Trajectories [57.049252242807874]
We train a video generation model conditioned on sparse or dense video trajectories.
We translate high-level user requests into detailed, semi-dense motion prompts.
We demonstrate our approach through various applications, including camera and object motion control, "interacting" with an image, motion transfer, and image editing.
arXiv Detail & Related papers (2024-12-03T18:59:56Z) - MotionBank: A Large-scale Video Motion Benchmark with Disentangled Rule-based Annotations [85.85596165472663]
We build MotionBank, which comprises 13 video action datasets, 1.24M motion sequences, and 132.9M frames of natural and diverse human motions.
Our MotionBank is beneficial for general motion-related tasks of human motion generation, motion in-context generation, and motion understanding.
arXiv Detail & Related papers (2024-10-17T17:31:24Z) - Scaling Large Motion Models with Million-Level Human Motions [67.40066387326141]
We present MotionLib, the first million-level dataset for motion generation. We train a large motion model named projname, demonstrating robust performance across a wide range of human activities.
arXiv Detail & Related papers (2024-10-04T10:48:54Z) - Holistic-Motion2D: Scalable Whole-body Human Motion Generation in 2D Space [78.95579123031733]
We present Holistic-Motion2D, the first comprehensive and large-scale benchmark for 2D whole-body motion generation.
We also highlight the utility of 2D motion for various downstream applications and its potential for lifting to 3D motion.
arXiv Detail & Related papers (2024-06-17T06:31:19Z) - Motion Generation from Fine-grained Textual Descriptions [29.033358642532722]
We build a large-scale language-motion dataset specializing in fine-grained textual descriptions, FineHumanML3D.
We design a new text2motion model, FineMotionDiffuse, making full use of fine-grained textual information.
Our evaluation shows that FineMotionDiffuse trained on FineHumanML3D improves FID by a large margin of 0.38, compared with competitive baselines.
arXiv Detail & Related papers (2024-03-20T11:38:30Z) - MotionScript: Natural Language Descriptions for Expressive 3D Human Motions [8.050271017133076]
We introduce MotionScript, a novel framework for generating highly detailed, natural language descriptions of 3D human motions.
MotionScript provides fine-grained, structured descriptions that capture the full complexity of human movement.
MotionScript serves as both a descriptive tool and a training resource for text-to-motion models.
arXiv Detail & Related papers (2023-12-19T22:33:17Z) - Act As You Wish: Fine-Grained Control of Motion Diffusion Model with
Hierarchical Semantic Graphs [31.244039305932287]
We propose hierarchical semantic graphs for fine-grained control over motion generation.
We disentangle motion descriptions into hierarchical semantic graphs including three levels of motions, actions, and specifics.
Our method can continuously refine the generated motion, which may have a far-reaching impact on the community.
arXiv Detail & Related papers (2023-11-02T06:20:23Z) - TapMo: Shape-aware Motion Generation of Skeleton-free Characters [64.83230289993145]
We present TapMo, a Text-driven Animation Pipeline for generating motion in a broad spectrum of skeleton-free 3D characters.
TapMo comprises two main components - Mesh Handle Predictor and Shape-aware Diffusion Module.
arXiv Detail & Related papers (2023-10-19T12:14:32Z) - Universal Humanoid Motion Representations for Physics-Based Control [71.46142106079292]
We present a universal motion representation that encompasses a comprehensive range of motor skills for physics-based humanoid control.
We first learn a motion imitator that can imitate all of the human motion in a large, unstructured motion dataset.
We then create our motion representation by distilling skills directly from the imitator.
arXiv Detail & Related papers (2023-10-06T20:48:43Z) - MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model [35.32967411186489]
MotionDiffuse is a diffusion model-based text-driven motion generation framework.
It excels at modeling complicated data distribution and generating vivid motion sequences.
It responds to fine-grained instructions on body parts, and arbitrary-length motion synthesis with time-varied text prompts.
arXiv Detail & Related papers (2022-08-31T17:58:54Z) - Render In-between: Motion Guided Video Synthesis for Action
Interpolation [53.43607872972194]
We propose a motion-guided frame-upsampling framework that is capable of producing realistic human motion and appearance.
A novel motion model is trained to infer the non-linear skeletal motion between frames by leveraging a large-scale motion-capture dataset.
Our pipeline only requires low-frame-rate videos and unpaired human motion data but does not require high-frame-rate videos for training.
arXiv Detail & Related papers (2021-11-01T15:32:51Z) - SportsCap: Monocular 3D Human Motion Capture and Fine-grained
Understanding in Challenging Sports Videos [40.19723456533343]
We propose SportsCap -- the first approach for simultaneously capturing 3D human motions and understanding fine-grained actions from monocular challenging sports video input.
Our approach utilizes the semantic and temporally structured sub-motion prior in the embedding space for motion capture and understanding.
Based on such hybrid motion information, we introduce a multi-stream spatial-temporal Graph Convolutional Network (ST-GCN) to predict the fine-grained semantic action attributes.
arXiv Detail & Related papers (2021-04-23T07:52:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.