Holistic-Motion2D: Scalable Whole-body Human Motion Generation in 2D Space
- URL: http://arxiv.org/abs/2406.11253v1
- Date: Mon, 17 Jun 2024 06:31:19 GMT
- Title: Holistic-Motion2D: Scalable Whole-body Human Motion Generation in 2D Space
- Authors: Yuan Wang, Zhao Wang, Junhao Gong, Di Huang, Tong He, Wanli Ouyang, Jile Jiao, Xuetao Feng, Qi Dou, Shixiang Tang, Dan Xu
- Abstract summary: We present $\textbf{Holistic-Motion2D}$, the first comprehensive and large-scale benchmark for 2D whole-body motion generation.
We also highlight the utility of 2D motion for various downstream applications and its potential for lifting to 3D motion.
- Score: 78.95579123031733
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we introduce a novel path to $\textit{general}$ human motion generation by focusing on 2D space. Traditional methods have primarily generated human motions in 3D, which, while detailed and realistic, are often limited by the size and diversity of available 3D motion data. To address these limitations, we exploit the extensive availability of 2D motion data. We present $\textbf{Holistic-Motion2D}$, the first comprehensive and large-scale benchmark for 2D whole-body motion generation, which includes over 1M in-the-wild motion sequences, each paired with high-quality whole-body/partial pose annotations and textual descriptions. Notably, Holistic-Motion2D is ten times larger than the previously largest 3D motion dataset. We also introduce a baseline method, featuring innovative $\textit{whole-body part-aware attention}$ and $\textit{confidence-aware modeling}$ techniques, tailored for 2D $\underline{\text T}$ext-driv$\underline{\text{EN}}$ whole-bo$\underline{\text D}$y motion gen$\underline{\text{ER}}$ation, namely $\textbf{Tender}$. Extensive experiments demonstrate the effectiveness of $\textbf{Holistic-Motion2D}$ and $\textbf{Tender}$ in generating expressive, diverse, and realistic human motions. We also highlight the utility of 2D motion for various downstream applications and its potential for lifting to 3D motion. Project page: https://holistic-motion2d.github.io.
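To make the two named techniques concrete, the following is a minimal PyTorch sketch of what whole-body part-aware attention and a confidence-weighted reconstruction loss could look like. The part grouping, tensor shapes, and module structure are illustrative assumptions, not Tender's actual implementation:

```python
# Hypothetical sketch of "whole-body part-aware attention" plus a
# confidence-weighted keypoint loss. Part ranges follow a
# COCO-WholeBody-style 133-keypoint layout (an assumption).
import torch
import torch.nn as nn

PARTS = {"body": range(0, 23), "hands": range(23, 65), "face": range(65, 133)}

class PartAwareAttention(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        # Separate attention per part, plus attention over part summaries.
        self.part_attn = nn.ModuleDict(
            {name: nn.MultiheadAttention(dim, heads, batch_first=True) for name in PARTS}
        )
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, num_joints, dim) per-joint features for one frame
        outs, summaries = [], []
        for name, idx in PARTS.items():
            part = tokens[:, list(idx), :]
            attended, _ = self.part_attn[name](part, part, part)  # within-part attention
            outs.append(attended)
            summaries.append(attended.mean(dim=1, keepdim=True))  # one summary token per part
        fused = torch.cat(outs, dim=1)
        summary = torch.cat(summaries, dim=1)
        # Every joint attends to the part summaries for cross-part context.
        mixed, _ = self.cross_attn(fused, summary, summary)
        return fused + mixed

def confidence_weighted_loss(pred, target, conf):
    # pred/target: (batch, num_joints, 2) 2D keypoints; conf: (batch, num_joints)
    # Down-weights low-confidence (occluded or noisy) in-the-wild annotations.
    err = ((pred - target) ** 2).sum(dim=-1)
    return (conf * err).sum() / conf.sum().clamp(min=1e-6)
```

The intuition behind attending within each part first is that the dense face and hand keypoints would otherwise dominate attention over the sparser body joints; the confidence weighting lets noisy in-the-wild annotations contribute to training without dictating it.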
Related papers
- Motion-2-to-3: Leveraging 2D Motion Data to Boost 3D Motion Generation [43.915871360698546]
2D human videos offer a vast and accessible source of motion data, covering a wider range of styles and activities than existing 3D motion-capture datasets.
We introduce a novel framework that disentangles local joint motion from global movement, enabling efficient learning of local motion priors from 2D data (a sketch of this split follows the entry).
Our method efficiently utilizes 2D data, enabling realistic 3D human motion generation and broadening the range of supported motion types.
arXiv Detail & Related papers (2024-12-17T17:34:52Z)
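A hedged illustration of the local/global disentangling mentioned above: the root trajectory is separated from root-relative joint motion so that local motion priors can be learned on their own. The joint layout and root index are assumptions, not Motion-2-to-3's actual pipeline:

```python
import numpy as np

def disentangle(pose2d: np.ndarray, root_idx: int = 0):
    """Split a 2D pose sequence (T, J, 2) into a global root trajectory
    and root-relative local joint motion. Illustrative only."""
    global_traj = pose2d[:, root_idx, :]             # (T, 2) root path per frame
    local_motion = pose2d - global_traj[:, None, :]  # joints relative to the root
    return global_traj, local_motion

def recombine(global_traj: np.ndarray, local_motion: np.ndarray) -> np.ndarray:
    # Inverse operation: add the root path back onto the local motion.
    return local_motion + global_traj[:, None, :]
```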
- Lifting Motion to the 3D World via 2D Diffusion [19.64801640086107]
We introduce MVLift, a novel approach to predict global 3D motion using only 2D pose sequences for training.
MVLift generalizes across various domains, including human poses, human-object interactions, and animal poses.
arXiv Detail & Related papers (2024-11-27T23:26:56Z)
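The 2D-only supervision that the MVLift summary describes is commonly realized as a reprojection loss: project the predicted 3D joints into each available view and penalize the gap to observed 2D poses. A minimal sketch under assumed 3x4 projection matrices; this shows the generic multi-view consistency idea, not MVLift's specific 2D-diffusion formulation:

```python
import torch

def reprojection_loss(joints3d, joints2d_views, cams):
    """joints3d: (J, 3) predicted 3D joints; joints2d_views: list of (J, 2)
    observed 2D poses; cams: list of assumed (3, 4) projection matrices."""
    total = 0.0
    for pose2d, P in zip(joints2d_views, cams):  # one term per view
        homo = torch.cat([joints3d, torch.ones_like(joints3d[:, :1])], dim=-1)  # (J, 4)
        proj = homo @ P.T                        # (J, 3) homogeneous image coords
        pred2d = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)
        total = total + ((pred2d - pose2d) ** 2).mean()
    return total / len(cams)
```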
- Director3D: Real-world Camera Trajectory and 3D Scene Generation from Text [61.9973218744157]
We introduce Director3D, a robust open-world text-to-3D generation framework, designed to generate both real-world 3D scenes and adaptive camera trajectories.
Experiments demonstrate that Director3D outperforms existing methods in real-world 3D generation.
arXiv Detail & Related papers (2024-06-25T14:42:51Z)
- SpatialTracker: Tracking Any 2D Pixels in 3D Space [71.58016288648447]
We propose to estimate point trajectories in 3D space to mitigate the issues caused by image projection.
Our method, named SpatialTracker, lifts 2D pixels to 3D using monocular depth estimators.
Tracking in 3D allows us to leverage as-rigid-as-possible (ARAP) constraints while simultaneously learning a rigidity embedding that clusters pixels into different rigid parts.
arXiv Detail & Related papers (2024-04-05T17:59:25Z)
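The first step of the depth-based lifting that the SpatialTracker summary describes, taking 2D pixels to 3D with a monocular depth estimate, is standard pinhole unprojection. A minimal NumPy sketch of that step alone (the ARAP constraints and rigidity embedding are beyond this illustration):

```python
import numpy as np

def unproject(pixels: np.ndarray, depth: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Lift (N, 2) pixel coordinates to (N, 3) camera-space points, given a
    per-pixel depth estimate (N,) and camera intrinsics K (3, 3)."""
    homo = np.hstack([pixels, np.ones((pixels.shape[0], 1))])  # (N, 3) homogeneous pixels
    rays = homo @ np.linalg.inv(K).T                           # (N, 3) normalized rays
    return rays * depth[:, None]                               # scale each ray by its depth
```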
- Lift3D: Zero-Shot Lifting of Any 2D Vision Model to 3D [95.14469865815768]
2D vision models can be used for semantic segmentation, style transfer or scene editing, enabled by large-scale 2D image datasets.
However, extending a single 2D vision operator like scene editing to 3D typically requires a highly creative method specialized to that task.
In this paper, we propose Lift3D, which is trained to predict unseen views in the feature spaces generated by a few visual models.
We even outperform state-of-the-art methods specialized for the task in question.
arXiv Detail & Related papers (2024-03-27T18:13:16Z)
- Realistic Human Motion Generation with Cross-Diffusion Models [30.854425772128568]
The Cross Human Motion Diffusion Model (CrossDiff) integrates 3D and 2D information using a shared transformer network within the training of the diffusion model.
CrossDiff effectively combines the strengths of both representations to generate more realistic motion sequences (the shared backbone is sketched after this entry).
arXiv Detail & Related papers (2023-12-18T07:44:40Z)
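A minimal sketch of the shared-transformer idea in the CrossDiff entry: 2D and 3D pose sequences are mapped into one token space by modality-specific projections and processed by the same transformer weights, so 2D data can supplement 3D training. Dimensions and layer counts are assumptions, not CrossDiff's architecture:

```python
import torch
import torch.nn as nn

class SharedMotionBackbone(nn.Module):
    def __init__(self, num_joints: int = 22, dim: int = 256):
        super().__init__()
        self.embed_2d = nn.Linear(num_joints * 2, dim)  # per-frame 2D pose -> token
        self.embed_3d = nn.Linear(num_joints * 3, dim)  # per-frame 3D pose -> token
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=4)

    def forward(self, motion: torch.Tensor, is_3d: bool) -> torch.Tensor:
        # motion: (batch, frames, num_joints * coords), flattened per frame
        tokens = self.embed_3d(motion) if is_3d else self.embed_2d(motion)
        return self.backbone(tokens)  # identical weights for both modalities
```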
- M3D-VTON: A Monocular-to-3D Virtual Try-On Network [62.77413639627565]
Existing 3D virtual try-on methods mainly rely on annotated 3D human shapes and garment templates.
We propose a novel Monocular-to-3D Virtual Try-On Network (M3D-VTON) that builds on the merits of both 2D and 3D approaches.
arXiv Detail & Related papers (2021-08-11T10:05:17Z)
- Deep Monocular 3D Human Pose Estimation via Cascaded Dimension-Lifting [10.336146336350811]
3D pose estimation from a single image is a challenging problem due to depth ambiguity.
One line of previous methods lifts 2D joints, obtained from external 2D pose detectors, into 3D space.
We propose a novel end-to-end framework that not only exploits contextual information but also produces output directly in 3D space (a minimal lifting baseline is sketched below for contrast).
arXiv Detail & Related papers (2021-04-08T05:44:02Z)
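For contrast with the end-to-end framework above, here is the "lifting" baseline that entry refers to in its simplest form: a small MLP that maps detected 2D joints to 3D coordinates. A simplified Martinez-style sketch offered as an illustration, not the paper's cascaded model:

```python
import torch
import torch.nn as nn

class LiftingMLP(nn.Module):
    def __init__(self, num_joints: int = 17, hidden: int = 1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_joints * 2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_joints * 3),
        )

    def forward(self, joints2d: torch.Tensor) -> torch.Tensor:
        # joints2d: (batch, num_joints, 2) -> (batch, num_joints, 3)
        b, j, _ = joints2d.shape
        return self.net(joints2d.reshape(b, -1)).reshape(b, j, 3)
```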