PrimDiffusion: Volumetric Primitives Diffusion for 3D Human Generation
- URL: http://arxiv.org/abs/2312.04559v1
- Date: Thu, 7 Dec 2023 18:59:33 GMT
- Title: PrimDiffusion: Volumetric Primitives Diffusion for 3D Human Generation
- Authors: Zhaoxi Chen, Fangzhou Hong, Haiyi Mei, Guangcong Wang, Lei Yang, Ziwei Liu
- Abstract summary: PrimDiffusion is the first diffusion-based framework for 3D human generation.
Our framework supports real-time rendering of high-quality 3D humans at a resolution of $512\times512$ once the denoising process is done.
- Score: 47.15358646320958
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present PrimDiffusion, the first diffusion-based framework for 3D human
generation. Devising diffusion models for 3D human generation is difficult due
to the intensive computational cost of 3D representations and the articulated
topology of 3D humans. To tackle these challenges, our key insight is operating
the denoising diffusion process directly on a set of volumetric primitives,
which models the human body as a number of small volumes with radiance and
kinematic information. This volumetric primitives representation marries the
capacity of volumetric representations with the efficiency of primitive-based
rendering. Our PrimDiffusion framework has three appealing properties: 1)
compact and expressive parameter space for the diffusion model, 2) flexible 3D
representation that incorporates human prior, and 3) decoder-free rendering for
efficient novel-view and novel-pose synthesis. Extensive experiments validate
that PrimDiffusion outperforms state-of-the-art methods in 3D human generation.
Notably, compared to GAN-based methods, our PrimDiffusion supports real-time
rendering of high-quality 3D humans at a resolution of $512\times512$ once the
denoising process is done. We also demonstrate the flexibility of our framework
on training-free conditional generation such as texture transfer and 3D
inpainting.
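The core idea of denoising a compact primitive parameter space can be sketched as follows. This is a minimal, hypothetical example: the primitive count, payload layout, and inputs are illustrative assumptions, and the reverse step is the standard DDPM update, not the authors' implementation.

```python
import numpy as np

# Hypothetical sketch: a 3D human as K volumetric primitives, each carrying
# kinematic parameters (position, rotation, scale) and a small radiance
# payload, flattened into one compact tensor that a diffusion model denoises.
K = 1024                   # number of primitives (illustrative)
KINEMATIC = 9              # 3 position + 3 rotation + 3 scale per primitive
RADIANCE = 4 * 8 * 8 * 8   # RGBA voxel payload per primitive (illustrative)

def to_diffusion_tensor(kinematics, radiance):
    """Flatten primitives into the (K, D) parameter space the model denoises."""
    return np.concatenate([kinematics, radiance.reshape(K, -1)], axis=1)

def ddpm_denoise_step(x_t, eps_pred, alpha_t, alpha_bar_t, sigma_t, rng):
    """One standard DDPM reverse step x_t -> x_{t-1} (not the paper's code)."""
    mean = (x_t - (1 - alpha_t) / np.sqrt(1 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_t)
    return mean + sigma_t * rng.standard_normal(x_t.shape)

rng = np.random.default_rng(0)
x = to_diffusion_tensor(rng.standard_normal((K, KINEMATIC)),
                        rng.standard_normal((K, 4, 8, 8, 8)))
print(x.shape)  # (1024, 2057)
```

Because the whole body lives in one fixed-size tensor, a single denoising pass produces every primitive at once, which is what makes decoder-free rendering possible afterward.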
Related papers
- L3DG: Latent 3D Gaussian Diffusion [74.36431175937285]
L3DG is the first approach for generative 3D modeling of 3D Gaussians through a latent 3D Gaussian diffusion formulation.
We employ a sparse convolutional architecture to efficiently operate on room-scale scenes.
By leveraging the 3D Gaussian representation, the generated scenes can be rendered from arbitrary viewpoints in real-time.
arXiv Detail & Related papers (2024-10-17T13:19:32Z)
- AnimateMe: 4D Facial Expressions via Diffusion Models [72.63383191654357]
Recent advances in diffusion models have enhanced the capabilities of generative models in 2D animation.
We employ Graph Neural Networks (GNNs) as denoising diffusion models in a novel approach, formulating the diffusion process directly on the mesh space.
This facilitates the generation of facial deformations through a mesh-diffusion-based model.
arXiv Detail & Related papers (2024-03-25T21:40:44Z)
- 3DTopia: Large Text-to-3D Generation Model with Hybrid Diffusion Priors [85.11117452560882]
We present a two-stage text-to-3D generation system, namely 3DTopia, which generates high-quality general 3D assets within 5 minutes using hybrid diffusion priors.
The first stage samples from a 3D diffusion prior directly learned from 3D data. Specifically, it is powered by a text-conditioned tri-plane latent diffusion model, which quickly generates coarse 3D samples for fast prototyping.
The second stage utilizes 2D diffusion priors to further refine the texture of the coarse 3D models from the first stage. The refinement consists of both latent- and pixel-space optimization for high-quality texture generation.
arXiv Detail & Related papers (2024-03-04T17:26:28Z)
- Towards Detailed Text-to-Motion Synthesis via Basic-to-Advanced Hierarchical Diffusion Model [60.27825196999742]
We propose a novel Basic-to-Advanced Hierarchical Diffusion Model, named B2A-HDM, to collaboratively exploit low-dimensional and high-dimensional diffusion models for detailed motion synthesis.
Specifically, the basic diffusion model in low-dimensional latent space provides the intermediate denoising result that is consistent with the textual description.
The advanced diffusion model in high-dimensional latent space focuses on the following detail-enhancing denoising process.
arXiv Detail & Related papers (2023-12-18T06:30:39Z)
- MVHuman: Tailoring 2D Diffusion with Multi-view Sampling For Realistic 3D Human Generation [45.88714821939144]
We present an alternative scheme named MVHuman to generate human radiance fields from text guidance.
Our core is a multi-view sampling strategy to tailor the denoising processes of the pre-trained network for generating consistent multi-view images.
arXiv Detail & Related papers (2023-12-15T11:56:26Z)
- CAD: Photorealistic 3D Generation via Adversarial Distillation [28.07049413820128]
We propose a novel learning paradigm for 3D synthesis that utilizes pre-trained diffusion models.
Our method unlocks the generation of high-fidelity and photorealistic 3D content conditioned on a single image and prompt.
arXiv Detail & Related papers (2023-12-11T18:59:58Z)
- StableDreamer: Taming Noisy Score Distillation Sampling for Text-to-3D [88.66678730537777]
We present StableDreamer, a methodology incorporating three advances.
First, we formalize the equivalence of the SDS generative prior and a simple supervised L2 reconstruction loss.
Second, our analysis shows that while image-space diffusion contributes to geometric precision, latent-space diffusion is crucial for vivid color rendition.
arXiv Detail & Related papers (2023-12-02T02:27:58Z)
- DiffRF: Rendering-Guided 3D Radiance Field Diffusion [18.20324411024166]
We introduce DiffRF, a novel approach for 3D radiance field synthesis based on denoising diffusion probabilistic models.
In contrast to 2D-diffusion models, our model learns multi-view consistent priors, enabling free-view synthesis and accurate shape generation.
arXiv Detail & Related papers (2022-12-02T14:37:20Z)
- 3D Neural Field Generation using Triplane Diffusion [37.46688195622667]
We present an efficient diffusion-based model for 3D-aware generation of neural fields.
Our approach pre-processes training data, such as ShapeNet meshes, by converting them to continuous occupancy fields.
We demonstrate state-of-the-art results on 3D generation on several object classes from ShapeNet.
arXiv Detail & Related papers (2022-11-30T01:55:52Z)
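The triplane representation behind the last entry can be illustrated with a minimal sketch: a 3D query point is projected onto three axis-aligned feature planes, the looked-up features are summed, and a small decoder maps them to occupancy. All shapes, the nearest-neighbor lookup, and the linear "decoder" below are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

# Hypothetical triplane query: project a point onto the XY, XZ, and YZ
# feature planes, sum the sampled features, and decode to occupancy.
R, C = 32, 8                                 # plane resolution and channels (illustrative)
rng = np.random.default_rng(3)
planes = rng.standard_normal((3, R, R, C))   # XY, XZ, YZ feature planes
w = rng.standard_normal(C)                   # stand-in linear "decoder"

def sample_plane(plane, u, v):
    """Nearest-neighbor lookup of a feature plane at coordinates in [-1, 1]."""
    i = int((u * 0.5 + 0.5) * (R - 1))
    j = int((v * 0.5 + 0.5) * (R - 1))
    return plane[i, j]

def occupancy(p):
    """Occupancy in (0, 1) at 3D point p, via summed triplane features."""
    x, y, z = p
    feat = (sample_plane(planes[0], x, y) +
            sample_plane(planes[1], x, z) +
            sample_plane(planes[2], y, z))
    return 1.0 / (1.0 + np.exp(-feat @ w))   # sigmoid over a linear readout

print(occupancy((0.1, -0.3, 0.7)))
```

Three 2D planes cost far less memory than a dense 3D grid at the same resolution, which is why diffusing over triplanes (or, in PrimDiffusion's case, over primitives) keeps the generative parameter space compact.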
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.