Purposer: Putting Human Motion Generation in Context
- URL: http://arxiv.org/abs/2404.12942v1
- Date: Fri, 19 Apr 2024 15:16:04 GMT
- Title: Purposer: Putting Human Motion Generation in Context
- Authors: Nicolas Ugrinovic, Thomas Lucas, Fabien Baradel, Philippe Weinzaepfel, Gregory Rogez, Francesc Moreno-Noguer
- Abstract summary: We present a novel method to generate human motion to populate 3D indoor scenes.
It can be controlled with various combinations of conditioning signals such as a path in a scene, target poses, past motions, and scenes represented as 3D point clouds.
- Score: 30.706219830149504
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We present a novel method to generate human motion to populate 3D indoor scenes. It can be controlled with various combinations of conditioning signals such as a path in a scene, target poses, past motions, and scenes represented as 3D point clouds. State-of-the-art methods are either specialized to a single setting, require vast amounts of high-quality and diverse training data, or are unconditional models that do not integrate scene or other contextual information. As a consequence, they have limited applicability and rely on costly training data. To address these limitations, we propose a new method, dubbed Purposer, based on neural discrete representation learning. Our model is capable of exploiting, in a flexible manner, different types of information already present in open access large-scale datasets such as AMASS. First, we encode unconditional human motion into a discrete latent space. Second, an autoregressive generative model, conditioned with key contextual information, either with prompting or additive tokens, and trained for next-step prediction in this space, synthesizes sequences of latent indices. We further design a novel conditioning block to handle future conditioning information in such a causal model by using a network with two branches to compute separate stacks of features. In this manner, Purposer can generate realistic motion sequences in diverse test scenes. Through exhaustive evaluation, we demonstrate that our multi-contextual solution outperforms existing specialized approaches for specific contextual information, both in terms of quality and diversity. Our model is trained with short sequences, but a byproduct of being able to use various conditioning signals is that at test time different combinations can be used to chain short sequences together and generate long motions within a context scene.
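The abstract outlines a two-stage design: motion is first quantized into a discrete latent codebook, and an autoregressive model then predicts the next latent index conditioned on context injected as prompt or additive tokens. The sketch below is a minimal, illustrative PyTorch rendering of that second stage only; the codebook size, feature dimensions, and module names are assumptions, not the authors' implementation, and the two-branch block for future conditioning is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

CODEBOOK_SIZE = 512   # assumed size of the discrete motion codebook
EMBED_DIM = 256       # assumed token embedding width
COND_DIM = 64         # assumed per-step context feature size (e.g. path or scene features)


class ConditionalMotionPrior(nn.Module):
    """Causal transformer over discrete motion-code indices (illustrative only).

    Context is injected as additive tokens; the paper's two-branch block for
    future conditioning (e.g. target poses) is omitted for brevity.
    """

    def __init__(self, num_layers: int = 4, num_heads: int = 4):
        super().__init__()
        self.token_emb = nn.Embedding(CODEBOOK_SIZE, EMBED_DIM)
        self.cond_proj = nn.Linear(COND_DIM, EMBED_DIM)  # additive conditioning tokens
        layer = nn.TransformerEncoderLayer(
            d_model=EMBED_DIM, nhead=num_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(EMBED_DIM, CODEBOOK_SIZE)

    def forward(self, indices: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        # indices: (B, T) latent code ids; context: (B, T, COND_DIM)
        seq_len = indices.shape[1]
        x = self.token_emb(indices) + self.cond_proj(context)
        # Standard causal mask so each step only attends to the past.
        causal_mask = torch.triu(
            torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
        h = self.backbone(x, mask=causal_mask)
        return self.head(h)  # (B, T, CODEBOOK_SIZE) logits over the next code


# Toy next-step prediction loss on random data (stand-in for encoded motion sequences).
model = ConditionalMotionPrior()
codes = torch.randint(0, CODEBOOK_SIZE, (2, 16))  # (B, T) discrete motion codes
context = torch.randn(2, 16, COND_DIM)            # (B, T, COND_DIM) context features
logits = model(codes[:, :-1], context[:, :-1])
loss = F.cross_entropy(logits.reshape(-1, CODEBOOK_SIZE), codes[:, 1:].reshape(-1))
```

At test time, sampling from such a prior step by step and decoding the resulting indices with the learned motion codebook would yield a motion sequence; different conditioning signals can be swapped in by changing what feeds the additive context tokens.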
Related papers
- DeBaRA: Denoising-Based 3D Room Arrangement Generation [22.96293773013579]
We introduce DeBaRA, a score-based model specifically tailored for precise, controllable and flexible arrangement generation in a bounded environment.
We demonstrate that by focusing on spatial attributes of objects, a single trained DeBaRA model can be leveraged at test time to perform several downstream applications such as scene synthesis, completion and re-arrangement.
arXiv Detail & Related papers (2024-09-26T23:18:25Z)
- Deciphering Movement: Unified Trajectory Generation Model for Multi-Agent [53.637837706712794]
We propose a Unified Trajectory Generation model, UniTraj, that processes arbitrary trajectories as masked inputs.
Specifically, we introduce a Ghost Spatial Masking (GSM) module embedded within a Transformer encoder for spatial feature extraction.
We benchmark three practical sports game datasets, Basketball-U, Football-U, and Soccer-U, for evaluation.
arXiv Detail & Related papers (2024-05-27T22:15:23Z)
- Generating Human Interaction Motions in Scenes with Text Control [66.74298145999909]
We present TeSMo, a method for text-controlled scene-aware motion generation based on denoising diffusion models.
Our approach begins with pre-training a scene-agnostic text-to-motion diffusion model.
To facilitate training, we embed annotated navigation and interaction motions within scenes.
arXiv Detail & Related papers (2024-04-16T16:04:38Z)
- Learn the Force We Can: Enabling Sparse Motion Control in Multi-Object Video Generation [26.292052071093945]
We propose an unsupervised method to generate videos from a single frame and a sparse motion input.
Our trained model can generate unseen realistic object-to-object interactions.
We show that YODA is on par with or better than state of the art video generation prior work in terms of both controllability and video quality.
arXiv Detail & Related papers (2023-06-06T19:50:02Z)
- Neural Rendering of Humans in Novel View and Pose from Monocular Video [68.37767099240236]
We introduce a new method that generates photo-realistic humans under novel views and poses given a monocular video as input.
Our method significantly outperforms existing approaches under unseen poses and novel views given monocular videos as input.
arXiv Detail & Related papers (2022-04-04T03:09:20Z)
- Learning Multi-Object Dynamics with Compositional Neural Radiance Fields [63.424469458529906]
We present a method to learn compositional predictive models from image observations based on implicit object encoders, Neural Radiance Fields (NeRFs), and graph neural networks.
NeRFs have become a popular choice for representing scenes due to their strong 3D prior.
For planning, we utilize RRTs in the learned latent space, where we can exploit our model and the implicit object encoder to make sampling the latent space informative and more efficient.
arXiv Detail & Related papers (2022-02-24T01:31:29Z)
- Scene Synthesis via Uncertainty-Driven Attribute Synchronization [52.31834816911887]
This paper introduces a novel neural scene synthesis approach that can capture diverse feature patterns of 3D scenes.
Our method combines the strength of both neural network-based and conventional scene synthesis approaches.
arXiv Detail & Related papers (2021-08-30T19:45:07Z)
- CharacterGAN: Few-Shot Keypoint Character Animation and Reposing [64.19520387536741]
We introduce CharacterGAN, a generative model that can be trained on only a few samples of a given character.
Our model generates novel poses based on keypoint locations, which can be modified in real time while providing interactive feedback.
We show that our approach outperforms recent baselines and creates realistic animations for diverse characters.
arXiv Detail & Related papers (2021-02-05T12:38:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.