LaserHuman: Language-guided Scene-aware Human Motion Generation in Free Environment
- URL: http://arxiv.org/abs/2403.13307v2
- Date: Thu, 21 Mar 2024 13:06:49 GMT
- Title: LaserHuman: Language-guided Scene-aware Human Motion Generation in Free Environment
- Authors: Peishan Cong, Ziyi Wang, Zhiyang Dou, Yiming Ren, Wei Yin, Kai Cheng, Yujing Sun, Xiaoxiao Long, Xinge Zhu, Yuexin Ma
- Abstract summary: We introduce LaserHuman, a pioneering dataset engineered to revolutionize Scene-Text-to-Motion research.
LaserHuman stands out with its inclusion of genuine human motions within 3D environments.
We propose a simple yet effective multi-conditional diffusion model that achieves state-of-the-art performance on existing datasets.
- Score: 27.38638713080283
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Language-guided scene-aware human motion generation has great significance for entertainment and robotics. In response to the limitations of existing datasets, we introduce LaserHuman, a pioneering dataset engineered to revolutionize Scene-Text-to-Motion research. LaserHuman stands out with its inclusion of genuine human motions within 3D environments, unbounded free-form natural language descriptions, a blend of indoor and outdoor scenarios, and dynamic, ever-changing scenes. The diverse capture modalities and rich annotations present great opportunities for research on conditional motion generation and can also facilitate the development of real-life applications. Moreover, to generate semantically consistent and physically plausible human motions, we propose a simple yet effective multi-conditional diffusion model that achieves state-of-the-art performance on existing datasets.
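The method is only named here, but the core idea of a multi-conditional diffusion model can be sketched. Below is a minimal PyTorch illustration, assuming precomputed text and scene embeddings, a standard DDPM noise-prediction objective, and a HumanML3D-style 263-dimensional motion representation; the module names, sizes, and additive fusion scheme are illustrative assumptions, not the authors' released architecture.

```python
# Minimal sketch of a multi-conditional diffusion denoiser (PyTorch).
# NOT the authors' released code: module names, feature sizes, and the
# additive fusion scheme are illustrative assumptions.
import torch
import torch.nn as nn

class MultiCondDenoiser(nn.Module):
    """Predicts the noise added to a motion sequence, conditioned on a
    text embedding, a scene embedding, and the diffusion timestep."""
    def __init__(self, motion_dim=263, cond_dim=512, hidden=512, layers=4):
        super().__init__()
        self.in_proj = nn.Linear(motion_dim, hidden)
        self.t_embed = nn.Sequential(nn.Linear(1, hidden), nn.SiLU(),
                                     nn.Linear(hidden, hidden))
        # One projection per condition; conditions are fused by summation.
        self.text_proj = nn.Linear(cond_dim, hidden)
        self.scene_proj = nn.Linear(cond_dim, hidden)
        layer = nn.TransformerEncoderLayer(hidden, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=layers)
        self.out_proj = nn.Linear(hidden, motion_dim)

    def forward(self, noisy_motion, t, text_emb, scene_emb):
        # noisy_motion: (B, T, motion_dim); t: (B,); *_emb: (B, cond_dim)
        h = self.in_proj(noisy_motion)
        cond = (self.t_embed(t.float().unsqueeze(-1))
                + self.text_proj(text_emb)
                + self.scene_proj(scene_emb))
        h = h + cond.unsqueeze(1)        # broadcast the condition over time
        return self.out_proj(self.backbone(h))

# One DDPM-style training step on random stand-in data:
model = MultiCondDenoiser()
x0 = torch.randn(2, 60, 263)             # clean motion: (batch, frames, dims)
t = torch.randint(0, 1000, (2,))
alpha_bar = torch.rand(2, 1, 1)          # stand-in for the noise schedule
noise = torch.randn_like(x0)
x_t = alpha_bar.sqrt() * x0 + (1 - alpha_bar).sqrt() * noise
loss = nn.functional.mse_loss(
    model(x_t, t, torch.randn(2, 512), torch.randn(2, 512)), noise)
```

Fusing conditions by summing projected embeddings is the plainest choice; cross-attention over per-point scene features is a common alternative when the scene is given as a point cloud rather than a pooled embedding.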
Related papers
- Multi-Condition Latent Diffusion Network for Scene-Aware Neural Human Motion Prediction [46.309401205546656]
Real-world human movements are goal-directed and highly influenced by the spatial layout of their surrounding scenes.
We propose a Multi-Condition Latent Diffusion network (MCLD) that reformulates the human motion prediction task as a multi-condition joint inference problem.
Our network achieves significant improvements over state-of-the-art methods in both the realism and the diversity of its predictions.
arXiv Detail & Related papers (2024-05-29T02:21:31Z)
- Generating Human Interaction Motions in Scenes with Text Control [66.74298145999909]
We present TeSMo, a method for text-controlled scene-aware motion generation based on denoising diffusion models.
Our approach begins with pre-training a scene-agnostic text-to-motion diffusion model.
To facilitate training, we embed annotated navigation and interaction motions within scenes.
arXiv Detail & Related papers (2024-04-16T16:04:38Z)
- Move as You Say, Interact as You Can: Language-guided Human Motion Generation with Scene Affordance [48.986552871497]
We introduce a novel two-stage framework that employs scene affordance as an intermediate representation.
By leveraging scene affordance maps, our method overcomes the difficulty in generating human motion under multimodal condition signals.
Our approach consistently outperforms all baselines on established benchmarks, including HumanML3D and HUMANISE.
arXiv Detail & Related papers (2024-03-26T18:41:07Z)
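As a rough illustration of the affordance idea in the entry above: stage one scores each scene point against the language description, producing a per-point affordance map; stage two conditions the motion generator on affordance-weighted scene features instead of raw geometry. The sketch below uses hypothetical module names and sizes, not the paper's implementation.

```python
# Illustrative two-stage pipeline with a scene affordance map as the
# intermediate representation. A sketch of the idea only; module names
# and sizes are hypothetical, not the paper's code.
import torch
import torch.nn as nn

class AffordancePredictor(nn.Module):
    """Stage 1: score each scene point for how well it supports the
    described action, yielding a per-point affordance map."""
    def __init__(self, point_dim=3, text_dim=512, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(point_dim + text_dim, hidden),
                                 nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, points, text_emb):
        # points: (B, N, 3); text_emb: (B, text_dim)
        text = text_emb.unsqueeze(1).expand(-1, points.size(1), -1)
        return torch.sigmoid(self.mlp(torch.cat([points, text], dim=-1)))

# Stage 2 would condition the motion generator on affordance-weighted
# scene features rather than on raw geometry:
points = torch.randn(2, 1024, 3)
afford = AffordancePredictor()(points, torch.randn(2, 512))  # (B, N, 1)
scene_feat = (afford * points).mean(dim=1)  # pooled, affordance-weighted
```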
- Scaling Up Dynamic Human-Scene Interaction Modeling [58.032368564071895]
TRUMANS is the most comprehensive motion-captured human-scene interaction (HSI) dataset currently available.
It intricately captures whole-body human motions and part-level object dynamics.
We devise a diffusion-based autoregressive model that efficiently generates HSI sequences of any length.
arXiv Detail & Related papers (2024-03-13T15:45:04Z)
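The "sequences of any length" claim above typically rests on autoregressive windowing: each fixed-length window is sampled by the diffusion model conditioned on the last few frames of the previous window, and the windows are concatenated. A minimal sketch, where `sample_window` is a hypothetical stand-in for a full conditional reverse-diffusion loop:

```python
# Sketch of windowed autoregressive generation for arbitrary-length
# sequences. `sample_window` is a hypothetical placeholder; here it
# just returns noise instead of running reverse diffusion.
import torch

def sample_window(context, window=32, motion_dim=263):
    # Placeholder: would run the reverse diffusion process conditioned
    # on the overlap frames in `context`.
    return torch.randn(context.size(0), window, motion_dim)

def generate(total_frames, overlap=8, window=32, motion_dim=263, batch=1):
    motion = sample_window(torch.zeros(batch, overlap, motion_dim))
    while motion.size(1) < total_frames:
        ctx = motion[:, -overlap:]                # condition on the tail
        motion = torch.cat([motion, sample_window(ctx)], dim=1)
    return motion[:, :total_frames]

long_clip = generate(total_frames=300)            # shape (1, 300, 263)
```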
- CIRCLE: Capture In Rich Contextual Environments [69.97976304918149]
We propose a novel motion acquisition system in which the actor perceives and operates in a highly contextual virtual world.
We present CIRCLE, a dataset containing 10 hours of full-body reaching motion from five subjects across nine scenes.
We use this dataset to train a model that generates human motion conditioned on scene information.
arXiv Detail & Related papers (2023-03-31T09:18:12Z)
- HUMANISE: Language-conditioned Human Motion Generation in 3D Scenes [54.61610144668777]
We present a novel scene-and-language conditioned generative model that can produce 3D human motions in 3D scenes.
Our experiments demonstrate that our model generates diverse and semantically consistent human motions in 3D scenes.
arXiv Detail & Related papers (2022-10-18T10:14:11Z)
- Scene-aware Generative Network for Human Motion Synthesis [125.21079898942347]
We propose a new framework that takes the interaction between the scene and the human motion into account.
Given the inherent uncertainty of human motion, we formulate synthesis as a generative task.
We derive a GAN-based learning approach, with discriminators that enforce compatibility between the human motion and the contextual scene.
arXiv Detail & Related papers (2021-05-31T09:05:50Z)
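To illustrate the discriminator idea in the entry above: a compatibility discriminator scores (motion, scene) pairs, pushing the generator toward motions that fit the scene context. The sketch below is an assumption-laden illustration (GRU motion encoder, hinge loss), not the paper's actual model.

```python
# Sketch of a scene-compatibility discriminator for GAN training.
# Architecture and loss are assumptions (GRU encoder, hinge loss),
# not the paper's model.
import torch
import torch.nn as nn

class CompatibilityDiscriminator(nn.Module):
    def __init__(self, motion_dim=263, scene_dim=512, hidden=256):
        super().__init__()
        self.motion_enc = nn.GRU(motion_dim, hidden, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(hidden + scene_dim, hidden), nn.LeakyReLU(0.2),
            nn.Linear(hidden, 1))                 # real/fake logit

    def forward(self, motion, scene_emb):
        # motion: (B, T, motion_dim); scene_emb: (B, scene_dim)
        _, h = self.motion_enc(motion)            # h: (1, B, hidden)
        return self.head(torch.cat([h.squeeze(0), scene_emb], dim=-1))

# Hinge-style discriminator loss on real vs. generated motion:
D = CompatibilityDiscriminator()
real, fake = torch.randn(2, 60, 263), torch.randn(2, 60, 263)
scene = torch.randn(2, 512)
d_loss = (torch.relu(1 - D(real, scene)).mean()
          + torch.relu(1 + D(fake, scene)).mean())
```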
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.