OmniControl: Control Any Joint at Any Time for Human Motion Generation
- URL: http://arxiv.org/abs/2310.08580v2
- Date: Sun, 14 Apr 2024 22:23:18 GMT
- Title: OmniControl: Control Any Joint at Any Time for Human Motion Generation
- Authors: Yiming Xie, Varun Jampani, Lei Zhong, Deqing Sun, Huaizu Jiang
- Abstract summary: We present a novel approach named OmniControl for incorporating flexible spatial control signals into a text-conditioned human motion generation model.
We propose analytic spatial guidance that ensures the generated motion can tightly conform to the input control signals.
At the same time, realism guidance is introduced to refine all the joints to generate more coherent motion.
- Score: 46.293854851116215
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a novel approach named OmniControl for incorporating flexible spatial control signals into a text-conditioned human motion generation model based on the diffusion process. Unlike previous methods that can only control the pelvis trajectory, OmniControl can incorporate flexible spatial control signals over different joints at different times with only one model. Specifically, we propose analytic spatial guidance that ensures the generated motion can tightly conform to the input control signals. At the same time, realism guidance is introduced to refine all the joints to generate more coherent motion. Both the spatial and realism guidance are essential and they are highly complementary for balancing control accuracy and motion realism. By combining them, OmniControl generates motions that are realistic, coherent, and consistent with the spatial constraints. Experiments on HumanML3D and KIT-ML datasets show that OmniControl not only achieves significant improvement over state-of-the-art methods on pelvis control but also shows promising results when incorporating the constraints over other joints.
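The core mechanism is concrete enough to sketch. Below is a minimal, illustrative rendering of the sampling loop, assuming a toy MLP as a stand-in for the paper's text-conditioned motion denoiser and a simplified noise update; the pelvis-on-a-line control target, the step size of 0.1, and the helper names are hypothetical, and the paper's trained realism-guidance branch is only indicated by a comment.

```python
# Minimal sketch of OmniControl-style guided sampling. A tiny MLP stands in
# for the text-conditioned motion denoiser so the example runs end to end.
import torch
import torch.nn as nn

T, J, D = 60, 22, 3        # frames, joints, xyz (HumanML3D uses 22 joints)
STEPS = 50                 # number of (toy) diffusion steps

denoiser = nn.Sequential(  # stand-in for the real motion diffusion network
    nn.Linear(T * J * D, 256), nn.ReLU(), nn.Linear(256, T * J * D))

def predict_x0(x_t):
    """Stand-in prediction of the clean motion from the noisy sample."""
    return denoiser(x_t.flatten()).view(T, J, D)

def spatial_guidance(x0, targets, mask):
    """Analytic spatial guidance: gradient of the squared error between
    controlled joints and their target positions."""
    x0 = x0.detach().requires_grad_(True)
    loss = (((x0 - targets) ** 2) * mask).sum()
    (grad,) = torch.autograd.grad(loss, x0)
    return grad

# Control signal: pin the pelvis (joint 0) to a straight line at every frame.
targets = torch.zeros(T, J, D)
targets[:, 0, 0] = torch.linspace(0.0, 1.0, T)    # pelvis x moves 0 -> 1
mask = torch.zeros(T, J, D)
mask[:, 0, :] = 1.0                               # only the pelvis is controlled

x_t = torch.randn(T, J, D)
for t in reversed(range(STEPS)):
    x0 = predict_x0(x_t)
    # Spatial guidance: nudge the prediction toward the control targets.
    x0 = x0 - 0.1 * spatial_guidance(x0, targets, mask)
    # Realism guidance would act here: a trained branch refines *all* joints
    # so the corrected motion stays coherent; it is omitted in this sketch.
    noise = torch.randn_like(x_t) if t > 0 else torch.zeros_like(x_t)
    x_t = (x0 + (t / STEPS) * noise).detach()     # toy update, not real DDPM math
```

The point of the sketch is the shape of the analytic spatial guidance: a gradient step on the control loss, applied to the denoised prediction at every sampling step, pulls only the controlled joints toward their targets while the denoiser keeps the rest of the body plausible.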
Related papers
- ControlMM: Controllable Masked Motion Generation [38.16884934336603]
We propose ControlMM, a novel approach incorporating spatial control signals into the generative masked motion model.
ControlMM achieves real-time, high-fidelity, and high-precision controllable motion generation simultaneously.
ControlMM generates motions 20 times faster than diffusion-based methods (a masked-decoding sketch follows this entry).
arXiv Detail & Related papers (2024-10-14T17:50:27Z)
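For intuition, here is a minimal sketch of the iterative masked decoding pattern that generative masked motion models build on, assuming a MaskGIT-style confidence schedule; the embedding, prediction head, vocabulary size, and number of rounds are hypothetical stand-ins, and ControlMM's spatial-control machinery is not reproduced.

```python
# Minimal sketch of iterative masked decoding over motion tokens.
import torch
import torch.nn as nn

SEQ, VOCAB, MASK_ID = 50, 512, 512     # token sequence, codebook size, mask id
ROUNDS = 5                             # a few parallel rounds, not one per token

embed = nn.Embedding(VOCAB + 1, 128)   # +1 slot for the mask token
head = nn.Linear(128, VOCAB)           # stand-in for the masked transformer

tokens = torch.full((SEQ,), MASK_ID)   # start fully masked
for r in range(ROUNDS):
    logits = head(embed(tokens))                        # (SEQ, VOCAB)
    conf, pred = logits.softmax(-1).max(-1)             # confidence per position
    conf[tokens != MASK_ID] = float("inf")              # keep already-fixed tokens
    keep = int(SEQ * (r + 1) / ROUNDS)                  # unmasking schedule
    idx = conf.topk(keep).indices
    new_tokens = tokens.clone()
    new_tokens[idx] = torch.where(tokens[idx] == MASK_ID, pred[idx], tokens[idx])
    tokens = new_tokens
# `tokens` would then be decoded to motion by a VQ decoder (not shown).
```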
- Local Action-Guided Motion Diffusion Model for Text-to-Motion Generation [52.87672306545577]
Existing motion generation methods primarily focus on the direct synthesis of global motions.
We propose the local action-guided motion diffusion model, which facilitates global motion generation by utilizing local actions as fine-grained control signals.
Our method offers the flexibility to seamlessly combine various local actions and to continuously adjust the guidance weights.
arXiv Detail & Related papers (2024-07-15T08:35:00Z)
- MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model [29.93359157128045]
This work introduces MotionLCM, extending controllable motion generation to a real-time level.
We first propose the motion latent consistency model (MotionLCM) for motion generation, building upon the latent diffusion model.
By adopting one-step (or few-step) inference, we further improve the runtime efficiency of the motion latent diffusion model (a one-step sampling sketch follows this entry).
arXiv Detail & Related papers (2024-04-30T17:59:47Z)
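For intuition, a minimal sketch of the one-step sampling pattern a latent consistency model enables; the network, decoder, latent width, and noise level below are untrained, hypothetical stand-ins (MotionLCM itself is distilled from a motion latent diffusion model and decodes through a motion VAE), so only the control flow, one forward pass from pure noise to a clean latent, is the point.

```python
# Minimal sketch of one-step latent consistency sampling.
import torch
import torch.nn as nn

LATENT = 256       # width of the motion latent (hypothetical)
T_MAX = 1000.0     # terminal noise level fed to the network

consistency_fn = nn.Sequential(   # stand-in: maps a noisy latent straight to z_0
    nn.Linear(LATENT + 1, 512), nn.ReLU(), nn.Linear(512, LATENT))
decoder = nn.Linear(LATENT, 60 * 22 * 3)   # stand-in for the motion VAE decoder

def sample_one_step():
    z_T = torch.randn(1, LATENT)                        # pure noise latent
    t = torch.full((1, 1), T_MAX)                       # timestep conditioning
    z_0 = consistency_fn(torch.cat([z_T, t], dim=-1))   # a single network call
    return decoder(z_0).view(60, 22, 3)                 # frames x joints x xyz

motion = sample_one_step()   # real-time: one pass, no iterative denoising
```

A few-step variant would alternate this call with partial re-noising, trading a little speed for quality.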
- TLControl: Trajectory and Language Control for Human Motion Synthesis [68.09806223962323]
We present TLControl, a novel method for realistic human motion synthesis.
It incorporates both low-level Trajectory and high-level Language semantics controls.
It is practical for interactive and high-quality animation generation.
arXiv Detail & Related papers (2023-11-28T18:54:16Z)
- InterControl: Zero-shot Human Interaction Generation by Controlling Every Joint [67.6297384588837]
We introduce InterControl, a novel controllable motion generation method that encourages the synthesized motions to maintain the desired distance between joint pairs.
We demonstrate that the joint-pair distances needed for human interactions can be generated by an off-the-shelf Large Language Model (a distance-guidance sketch follows this entry).
arXiv Detail & Related papers (2023-11-27T14:32:33Z)
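A minimal sketch of the guidance signal such joint-pair control reduces to: a differentiable loss on the distance between chosen joint pairs, minimized by gradient steps on a stand-in denoised motion. The hard-coded `plan` is a hypothetical stand-in for the joint-pair/distance specification the paper obtains from an LLM, and a single random skeleton stands in for two interacting people.

```python
# Minimal sketch of joint-pair distance guidance.
import torch

T, J, D = 60, 22, 3   # frames, joints, xyz
# A "plan" of the kind an LLM might emit: (frame, joint_a, joint_b, distance).
plan = [(30, 20, 21, 0.05)]   # e.g. two hands should nearly touch at frame 30

def pair_distance_loss(x0, plan):
    """Squared error between actual and desired joint-pair distances."""
    loss = x0.new_zeros(())
    for frame, ja, jb, d_target in plan:
        d = (x0[frame, ja] - x0[frame, jb]).norm()
        loss = loss + (d - d_target) ** 2
    return loss

x0 = torch.randn(T, J, D, requires_grad=True)  # stand-in denoised prediction
for _ in range(100):                           # inner guidance iterations
    (grad,) = torch.autograd.grad(pair_distance_loss(x0, plan), x0)
    x0 = (x0 - 0.1 * grad).detach().requires_grad_(True)
print(pair_distance_loss(x0, plan).item())     # approaches 0
```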
- Cocktail: Mixing Multi-Modality Controls for Text-Conditional Image Generation [79.8881514424969]
Text-conditional diffusion models are able to generate high-fidelity images with diverse contents.
However, linguistic descriptions of the intended imagery are often ambiguous.
We propose Cocktail, a pipeline to mix various modalities into one embedding.
arXiv Detail & Related papers (2023-06-01T17:55:32Z)
- Guided Motion Diffusion for Controllable Human Motion Synthesis [18.660523853430497]
We propose Guided Motion Diffusion (GMD), a method that incorporates spatial constraints into the motion generation process.
Specifically, we propose an effective feature projection scheme that manipulates motion representation to enhance the coherency between spatial information and local poses.
Our experiments justify the development of GMD, which achieves a significant improvement over state-of-the-art methods in text-based motion generation.
arXiv Detail & Related papers (2023-05-21T21:54:31Z)
- Controllable Text Generation via Probability Density Estimation in the Latent Space [16.962510129437558]
We propose a novel control framework using probability density estimation in the latent space.
Our method utilizes an invertible transformation function, the Normalizing Flow, that maps the complex distributions in the latent space to simple Gaussian distributions in the prior space.
Experiments on single-attribute and multi-attribute control show that our method outperforms several strong baselines in attribute relevance and text quality (an invertible-coupling sketch follows this entry).
arXiv Detail & Related papers (2022-12-16T07:11:18Z)
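A minimal sketch of the invertible building block such a method relies on, a RealNVP-style affine coupling layer in plain PyTorch; the layer width and dimensions are hypothetical, and the paper's full flow architecture, training objective, and attribute-control sampling are not reproduced.

```python
# Minimal sketch of an invertible latent <-> prior mapping (affine coupling).
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(nn.Linear(self.half, 128), nn.ReLU(),
                                 nn.Linear(128, 2 * self.half))
    def forward(self, x):            # latent -> prior direction
        a, b = x[:, :self.half], x[:, self.half:]
        s, t = self.net(a).chunk(2, dim=-1)
        return torch.cat([a, b * torch.exp(s) + t], dim=-1)
    def inverse(self, z):            # prior -> latent direction
        a, b = z[:, :self.half], z[:, self.half:]
        s, t = self.net(a).chunk(2, dim=-1)
        return torch.cat([a, (b - t) * torch.exp(-s)], dim=-1)

flow = AffineCoupling(64)
x = torch.randn(8, 64)                      # latents from an encoder (stand-in)
z = flow(x)                                 # mapped toward the Gaussian prior
x_rec = flow.inverse(z)                     # exact inversion back to latents
print(torch.allclose(x, x_rec, atol=1e-5))  # True: the map is invertible
```

Because the map is exactly invertible, anything done in the simple Gaussian prior space (for example, sampling from a region associated with an attribute) can be carried back to the latent space without approximation error.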
- MoDi: Unconditional Motion Synthesis from Diverse Data [51.676055380546494]
We present MoDi, an unconditional generative model that synthesizes diverse motions.
Our model is trained in a completely unsupervised setting from a diverse, unstructured and unlabeled motion dataset.
We show that despite the lack of any structure in the dataset, the latent space can be semantically clustered.
arXiv Detail & Related papers (2022-06-16T09:06:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.