Towards Detailed Text-to-Motion Synthesis via Basic-to-Advanced
Hierarchical Diffusion Model
- URL: http://arxiv.org/abs/2312.10960v1
- Date: Mon, 18 Dec 2023 06:30:39 GMT
- Title: Towards Detailed Text-to-Motion Synthesis via Basic-to-Advanced
Hierarchical Diffusion Model
- Authors: Zhenyu Xie and Yang Wu and Xuehao Gao and Zhongqian Sun and Wei Yang
and Xiaodan Liang
- Abstract summary: We propose a novel Basic-to-Advanced Hierarchical Diffusion Model, named B2A-HDM, to collaboratively exploit low-dimensional and high-dimensional diffusion models for detailed motion synthesis.
Specifically, the basic diffusion model in the low-dimensional latent space provides an intermediate denoising result that is consistent with the textual description.
The advanced diffusion model in the high-dimensional latent space focuses on the subsequent detail-enhancing denoising process.
- Score: 60.27825196999742
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-guided motion synthesis aims to generate 3D human motion that
not only precisely reflects the textual description but also reveals motion
details as much as possible. Pioneering methods explore diffusion models for
text-to-motion synthesis and achieve significant improvements. However, these
methods conduct the diffusion process either on the raw data distribution or
in a low-dimensional latent space, and thus typically suffer from modality
inconsistency or detail scarcity, respectively. To tackle this problem, we
propose a novel Basic-to-Advanced Hierarchical Diffusion Model, named B2A-HDM,
which collaboratively exploits low-dimensional and high-dimensional diffusion
models for high-quality, detailed motion synthesis. Specifically, the basic
diffusion model in the low-dimensional latent space provides an intermediate
denoising result that is consistent with the textual description, while the
advanced diffusion model in the high-dimensional latent space focuses on the
subsequent detail-enhancing denoising process. Besides, we introduce a
multi-denoiser framework for the advanced diffusion model to ease the learning
of the high-dimensional model and to fully explore the generative potential of
the diffusion model. Quantitative and qualitative experimental results on two
text-to-motion benchmarks (HumanML3D and KIT-ML) demonstrate that B2A-HDM
outperforms existing state-of-the-art methods in terms of fidelity, modality
consistency, and diversity.
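
To make the two-stage design described above concrete, below is a minimal sketch of the basic-to-advanced inference flow. Everything here is an illustrative assumption rather than the authors' released code: the module names, the latent dimensions, the `lift` bridge between latent spaces, the switch timestep, and the simplified update rule.

```python
# Minimal sketch of a basic-to-advanced hierarchical denoising pass.
# All names, dimensions, and the update rule are illustrative assumptions.
import torch
import torch.nn as nn

class Denoiser(nn.Module):
    """Toy text-conditioned denoiser over a flat latent vector."""
    def __init__(self, latent_dim: int, text_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + text_dim + 1, 128),
            nn.SiLU(),
            nn.Linear(128, latent_dim),
        )

    def forward(self, z, text_emb, t):
        t_feat = t.float().unsqueeze(-1) / 1000.0  # crude timestep feature
        return self.net(torch.cat([z, text_emb, t_feat], dim=-1))

@torch.no_grad()
def b2a_sample(text_emb, low_dim=16, high_dim=64, T=1000, t_switch=400):
    basic = Denoiser(low_dim)                          # basic model, low-dim space
    lift = nn.Linear(low_dim, high_dim)                # assumed latent-space bridge
    advanced = [Denoiser(high_dim) for _ in range(2)]  # multi-denoiser framework

    z = torch.randn(1, low_dim)
    for t in range(T, t_switch, -1):                   # basic stage: text-consistent
        eps = basic(z, text_emb, torch.tensor([t]))    # intermediate denoising
        z = z - eps / T                                # placeholder update rule

    h = lift(z)                                        # hand off to high-dim space
    for t in range(t_switch, 0, -1):                   # advanced stage: details
        k = 0 if t > t_switch // 2 else 1              # denoiser chosen by timestep
        eps = advanced[k](h, text_emb, torch.tensor([t]))
        h = h - eps / T
    return h  # in practice a VAE decoder would map this latent back to motion

print(b2a_sample(torch.zeros(1, 32)).shape)            # torch.Size([1, 64])
```

The essential design choice is the hand-off at `t_switch`: the low-dimensional pass cheaply pins down semantics consistent with the text, so the high-dimensional pass can spend its capacity on residual detail, with each of its denoisers responsible for only a segment of the timestep range.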
Related papers
- Diffusion Models in Low-Level Vision: A Survey [82.77962165415153]
Diffusion model-based solutions have gained wide acclaim for their ability to produce samples of superior quality and diversity.
We present three generic diffusion modeling frameworks and explore their correlations with other deep generative models.
We summarize extended diffusion models applied in other tasks, including medical, remote sensing, and video scenarios.
arXiv Detail & Related papers (2024-06-17T01:49:27Z)
- 4Diffusion: Multi-view Video Diffusion Model for 4D Generation [55.82208863521353]
Current 4D generation methods have achieved notable results with the aid of advanced diffusion generative models.
We propose a novel 4D generation pipeline, namely 4Diffusion, aimed at generating spatio-temporally consistent 4D content from a monocular video.
arXiv Detail & Related papers (2024-05-31T08:18:39Z) - Neural Flow Diffusion Models: Learnable Forward Process for Improved Diffusion Modelling [2.1779479916071067]
We introduce a novel framework that enhances diffusion models by supporting a broader range of forward processes.
We also propose a novel parameterization technique for learning the forward process.
Results underscore NFDM's versatility and its potential for a wide range of applications.
arXiv Detail & Related papers (2024-04-19T15:10:54Z) - An Overview of Diffusion Models: Applications, Guided Generation, Statistical Rates and Optimization [59.63880337156392]
Diffusion models have achieved tremendous success in computer vision, audio, reinforcement learning, and computational biology.
Despite the significant empirical success, the theory of diffusion models is very limited.
This paper provides a well-rounded theoretical exposition for stimulating forward-looking theories and methods of diffusion models.
arXiv Detail & Related papers (2024-04-11T14:07:25Z) - PrimDiffusion: Volumetric Primitives Diffusion for 3D Human Generation [47.15358646320958]
PrimDiffusion is the first diffusion-based framework for 3D human generation.
Our framework supports real-time rendering of high-quality 3D humans at a resolution of $512\times512$ once the denoising process is done.
arXiv Detail & Related papers (2023-12-07T18:59:33Z) - Modiff: Action-Conditioned 3D Motion Generation with Denoising Diffusion
Probabilistic Models [58.357180353368896]
We propose a conditional paradigm that benefits from the denoising diffusion probabilistic model (DDPM) to tackle the problem of realistic and diverse action-conditioned 3D skeleton-based motion generation.
Ours is a pioneering attempt to use DDPM to synthesize a variable number of motion sequences conditioned on a categorical action; a minimal sketch of this conditioning mechanism appears after this list.
arXiv Detail & Related papers (2023-01-10T13:15:42Z) - Diffusion Models in Vision: A Survey [80.82832715884597]
A diffusion model is a deep generative model based on two stages: a forward diffusion stage and a reverse diffusion stage (both stages are written out after this list).
Diffusion models are widely appreciated for the quality and diversity of the generated samples, despite their known computational burdens.
arXiv Detail & Related papers (2022-09-10T22:00:30Z)
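
As flagged in the Modiff entry above, one common way to condition a DDPM denoiser on a categorical action is to sum a learned label embedding with the timestep embedding. The sketch below shows that generic mechanism only; the class name, sizes, and embedding scheme are assumptions, not Modiff's architecture.

```python
# Generic action-conditioned denoiser input construction: a learned embedding
# of the categorical action label is summed with the timestep embedding.
# Hypothetical names and sizes; this is not Modiff's released code.
import torch
import torch.nn as nn

class ActionConditionedDenoiser(nn.Module):
    def __init__(self, motion_dim=64, num_actions=10, emb_dim=32):
        super().__init__()
        self.action_emb = nn.Embedding(num_actions, emb_dim)  # categorical action
        self.time_emb = nn.Linear(1, emb_dim)                 # crude timestep embedding
        self.net = nn.Sequential(
            nn.Linear(motion_dim + emb_dim, 128),
            nn.SiLU(),
            nn.Linear(128, motion_dim),                       # predicts the noise
        )

    def forward(self, x_t, t, action):
        cond = self.action_emb(action) + self.time_emb(t.float().unsqueeze(-1))
        return self.net(torch.cat([x_t, cond], dim=-1))

# One denoising call for a batch of two noisy motion frames with action label 3.
model = ActionConditionedDenoiser()
eps_hat = model(torch.randn(2, 64), torch.tensor([500, 500]), torch.tensor([3, 3]))
print(eps_hat.shape)  # torch.Size([2, 64])
```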
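
Likewise, the two stages named in the final survey entry are standard and can be written out explicitly; the following uses conventional DDPM notation and is not specific to any paper above.

```latex
% Forward stage: gradually corrupt data x_0 with Gaussian noise.
q(x_t \mid x_{t-1}) = \mathcal{N}\!\big(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t \mathbf{I}\big),
\qquad
q(x_t \mid x_0) = \mathcal{N}\!\big(x_t;\ \sqrt{\bar{\alpha}_t}\,x_0,\ (1-\bar{\alpha}_t)\mathbf{I}\big),
\quad \bar{\alpha}_t = \prod_{s=1}^{t} (1 - \beta_s).

% Reverse stage: a learned Gaussian that removes the noise step by step.
p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\big(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\big).
```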