3DGen: Triplane Latent Diffusion for Textured Mesh Generation
- URL: http://arxiv.org/abs/2303.05371v2
- Date: Mon, 27 Mar 2023 18:04:20 GMT
- Title: 3DGen: Triplane Latent Diffusion for Textured Mesh Generation
- Authors: Anchit Gupta, Wenhan Xiong, Yixin Nie, Ian Jones, Barlas Oğuz
- Abstract summary: A triplane VAE learns latent representations of textured meshes and a conditional diffusion model generates the triplane features.
For the first time, this architecture allows conditional and unconditional generation of high-quality textured or untextured 3D meshes.
It substantially outperforms previous work on image-conditioned and unconditional generation, in both mesh quality and texture quality.
- Score: 17.178939191534994
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Latent diffusion models for image generation have crossed a quality threshold
which enabled them to achieve mass adoption. Recently, a series of works have
made advancements towards replicating this success in the 3D domain,
introducing techniques such as point cloud VAE, triplane representation, neural
implicit surfaces and differentiable rendering based training. We take another
step along this direction, combining these developments in a two-step pipeline
consisting of 1) a triplane VAE which can learn latent representations of
textured meshes and 2) a conditional diffusion model which generates the
triplane features. For the first time this architecture allows conditional and
unconditional generation of high quality textured or untextured 3D meshes
across multiple diverse categories in a few seconds on a single GPU. It
outperforms previous work substantially on image-conditioned and unconditional
generation on mesh quality as well as texture generation. Furthermore, we
demonstrate the scalability of our model to large datasets for increased
quality and diversity. We will release our code and trained models.
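The abstract describes the two stages only at a high level. The following is a minimal PyTorch-style sketch of that pipeline for orientation: a triplane VAE compressing per-plane mesh features into latent triplanes, and a conditional denoiser trained with a standard DDPM objective on those latents. All layer choices, tensor shapes, the conditioning scheme, and the noise schedule here are illustrative assumptions, not the paper's actual architecture.

```python
# Minimal sketch of a triplane-latent-diffusion pipeline (assumed details only).
import torch
import torch.nn as nn
import torch.nn.functional as F


class TriplaneVAE(nn.Module):
    """Encodes a 3-plane feature representation of a textured mesh into a
    lower-resolution latent triplane and decodes it back."""

    def __init__(self, feat_ch=32, latent_ch=8):
        super().__init__()
        # Each of the three axis-aligned planes is a 2D feature map, so
        # ordinary 2D convolutions apply.
        self.enc = nn.Sequential(
            nn.Conv2d(feat_ch, 64, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(64, 2 * latent_ch, 3, padding=1),  # mean and logvar
        )
        self.dec = nn.Sequential(
            nn.Conv2d(latent_ch, 64, 3, padding=1), nn.SiLU(),
            nn.Upsample(scale_factor=2, mode="bilinear"),
            nn.Conv2d(64, feat_ch, 3, padding=1),
        )

    def encode(self, planes):                 # planes: (B, 3, C, H, W)
        b, p, c, h, w = planes.shape
        stats = self.enc(planes.flatten(0, 1))
        mean, logvar = stats.chunk(2, dim=1)
        z = mean + torch.randn_like(mean) * (0.5 * logvar).exp()
        return z.view(b, p, *z.shape[1:]), mean, logvar

    def decode(self, z):                      # z: (B, 3, C', H', W')
        b, p = z.shape[:2]
        out = self.dec(z.flatten(0, 1))
        return out.view(b, p, *out.shape[1:])


class LatentDenoiser(nn.Module):
    """Predicts the noise added to latent triplanes, conditioned on an image
    embedding (timestep embedding omitted for brevity)."""

    def __init__(self, latent_ch=8, cond_dim=512):
        super().__init__()
        self.cond_proj = nn.Linear(cond_dim, latent_ch)
        self.net = nn.Sequential(
            nn.Conv2d(latent_ch, 128, 3, padding=1), nn.SiLU(),
            nn.Conv2d(128, latent_ch, 3, padding=1),
        )

    def forward(self, z_t, t, cond):
        # Broadcast the conditioning vector over all three planes and pixels.
        b, p, c, h, w = z_t.shape
        bias = self.cond_proj(cond).view(b, 1, c, 1, 1)
        x = (z_t + bias).flatten(0, 1)
        return self.net(x).view(b, p, c, h, w)


def diffusion_training_step(vae, denoiser, planes, cond, alphas_cumprod):
    """One DDPM-style step: encode to latents, add noise, predict the noise."""
    with torch.no_grad():
        z0, _, _ = vae.encode(planes)
    t = torch.randint(0, len(alphas_cumprod), (planes.shape[0],))
    a = alphas_cumprod[t].view(-1, 1, 1, 1, 1)
    noise = torch.randn_like(z0)
    z_t = a.sqrt() * z0 + (1 - a).sqrt() * noise
    return F.mse_loss(denoiser(z_t, t, cond), noise)


if __name__ == "__main__":
    vae, denoiser = TriplaneVAE(), LatentDenoiser()
    planes = torch.randn(2, 3, 32, 64, 64)   # dummy triplane features
    cond = torch.randn(2, 512)               # dummy image embedding
    betas = torch.linspace(1e-4, 2e-2, 1000)
    loss = diffusion_training_step(vae, denoiser, planes, cond,
                                   torch.cumprod(1 - betas, dim=0))
    print(loss.item())
```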
Related papers
- DreamPolish: Domain Score Distillation With Progressive Geometry Generation [66.94803919328815]
We introduce DreamPolish, a text-to-3D generation model that excels in producing refined geometry and high-quality textures.
In the geometry construction phase, our approach leverages multiple neural representations to enhance the stability of the synthesis process.
In the texture generation phase, we introduce a novel score distillation objective, namely domain score distillation (DSD), to guide neural representations toward the target domain (see the sketch after this entry).
arXiv Detail & Related papers (2024-11-03T15:15:01Z)
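The DreamPolish summary names its new objective, DSD, without detail. For orientation only, the sketch below shows the generic score-distillation update this family of objectives builds on (a DreamFusion-style SDS gradient pushed through a rendered view); the actual DSD objective is not reproduced here, and `render`, `unet_eps`, and the weighting w(t) are hypothetical placeholders.

```python
# Generic score-distillation-style update (assumed, not DreamPolish's DSD).
import torch


def score_distillation_grad(render, unet_eps, params, cond, alphas_cumprod):
    """Backpropagates an SDS-style gradient into a 3D representation's
    parameters via a differentiably rendered image."""
    img = render(params)                                  # (B, 3, H, W)
    t = torch.randint(20, len(alphas_cumprod) - 20, (img.shape[0],))
    a = alphas_cumprod[t].view(-1, 1, 1, 1)
    noise = torch.randn_like(img)
    noisy = a.sqrt() * img + (1 - a).sqrt() * noise
    with torch.no_grad():
        eps_pred = unet_eps(noisy, t, cond)               # frozen 2D diffusion prior
    w = 1 - a                                             # a common SDS weighting
    grad = w * (eps_pred - noise)
    # The usual trick: treat `grad` as the gradient of a surrogate loss.
    img.backward(gradient=grad)


if __name__ == "__main__":
    # Dummy stand-ins so the sketch runs end to end: the "3D representation"
    # is just a learnable image, and the "diffusion prior" is a no-op net.
    params = torch.nn.Parameter(torch.rand(1, 3, 64, 64))
    dummy_unet = lambda x, t, c: torch.zeros_like(x)
    betas = torch.linspace(1e-4, 2e-2, 1000)
    score_distillation_grad(lambda p: p.sigmoid(), dummy_unet, params, None,
                            torch.cumprod(1 - betas, dim=0))
    print(params.grad.abs().mean().item())
```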
- Bootstrap3D: Improving Multi-view Diffusion Model with Synthetic Data [80.92268916571712]
A critical bottleneck is the scarcity of high-quality 3D objects with detailed captions.
We propose Bootstrap3D, a novel framework that automatically generates an arbitrary quantity of multi-view images.
We have generated 1 million high-quality synthetic multi-view images with dense descriptive captions.
arXiv Detail & Related papers (2024-05-31T17:59:56Z)
- Pushing Auto-regressive Models for 3D Shape Generation at Capacity and Scalability [118.26563926533517]
Auto-regressive models have achieved impressive results in 2D image generation by modeling joint distributions in grid space.
We extend auto-regressive models to the 3D domain, seeking stronger 3D shape generation by improving their capacity and scalability simultaneously.
arXiv Detail & Related papers (2024-02-19T15:33:09Z)
- Make-A-Shape: a Ten-Million-scale 3D Shape Model [52.701745578415796]
This paper introduces Make-A-Shape, a new 3D generative model designed for efficient training on a vast scale.
We first introduce a wavelet-tree representation that compactly encodes shapes via a subband coefficient filtering scheme.
We then derive a subband-adaptive training strategy so that the model effectively learns to generate both coarse and detail wavelet coefficients.
arXiv Detail & Related papers (2024-01-20T00:21:58Z)
- DiffusionGAN3D: Boosting Text-guided 3D Generation and Domain Adaptation by Combining 3D GANs and Diffusion Priors [26.0337715783954]
DiffusionGAN3D boosts text-guided 3D domain adaptation and generation by combining 3D GANs and diffusion priors.
The proposed framework achieves excellent results in both domain adaptation and text-to-avatar tasks.
arXiv Detail & Related papers (2023-12-28T05:46:26Z)
- Breathing New Life into 3D Assets with Generative Repainting [74.80184575267106]
Diffusion-based text-to-image models have attracted immense attention from the vision community, artists, and content creators.
Recent works have proposed various pipelines powered by the entanglement of diffusion models and neural fields.
We explore the power of pretrained 2D diffusion models and standard 3D neural radiance fields as independent, standalone tools.
Our pipeline accepts any legacy renderable geometry, such as textured or untextured meshes, and orchestrates the interaction between 2D generative refinement and 3D consistency enforcement tools.
arXiv Detail & Related papers (2023-09-15T16:34:51Z)
- Learning Versatile 3D Shape Generation with Improved AR Models [91.87115744375052]
Auto-regressive (AR) models have achieved impressive results in 2D image generation by modeling joint distributions in the grid space.
We propose the Improved Auto-regressive Model (ImAM) for 3D shape generation, which applies discrete representation learning based on a latent vector instead of volumetric grids.
arXiv Detail & Related papers (2023-03-26T12:03:18Z)
- 3D Neural Field Generation using Triplane Diffusion [37.46688195622667]
We present an efficient diffusion-based model for 3D-aware generation of neural fields.
Our approach pre-processes training data, such as ShapeNet meshes, by converting them to continuous occupancy fields (see the triplane query sketch after this entry).
We demonstrate state-of-the-art results on 3D generation on several object classes from ShapeNet.
arXiv Detail & Related papers (2022-11-30T01:55:52Z)
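Several entries on this page, including 3DGen itself and the triplane-diffusion paper above, rely on querying a neural field from triplane features. The sketch below shows the standard query pattern: project a 3D point onto the three axis-aligned planes, bilinearly sample a feature from each, aggregate, and decode to an occupancy value. Summation as the aggregation rule and the small MLP decoder are assumptions for illustration, not the specific design of either paper.

```python
# Minimal triplane neural-field query (assumed aggregation and decoder).
import torch
import torch.nn as nn
import torch.nn.functional as F


def query_triplane(planes, pts, decoder):
    """planes: (3, C, H, W) feature planes for the XY, XZ, YZ planes.
    pts: (N, 3) query points in [-1, 1]^3.
    Returns occupancy logits of shape (N, 1)."""
    # 2D projections of each point onto the three planes.
    coords = torch.stack([pts[:, [0, 1]],       # XY plane
                          pts[:, [0, 2]],       # XZ plane
                          pts[:, [1, 2]]])      # YZ plane -> (3, N, 2)
    # grid_sample expects (B, H_out, W_out, 2); use a 1 x N "image" of points.
    sampled = F.grid_sample(planes, coords.unsqueeze(1),
                            mode="bilinear", align_corners=True)
    feats = sampled.squeeze(2).sum(dim=0).t()   # (3, C, 1, N) -> (N, C)
    return decoder(feats)


if __name__ == "__main__":
    C = 32
    planes = torch.randn(3, C, 128, 128)
    decoder = nn.Sequential(nn.Linear(C, 64), nn.SiLU(), nn.Linear(64, 1))
    pts = torch.rand(4096, 3) * 2 - 1            # random points in [-1, 1]^3
    occ = query_triplane(planes, pts, decoder)
    print(occ.shape)                             # torch.Size([4096, 1])
```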
- Convolutional Generation of Textured 3D Meshes [34.20939983046376]
We propose a framework that can generate triangle meshes and associated high-resolution texture maps, using only 2D supervision from single-view natural images.
A key contribution of our work is the encoding of the mesh and texture as 2D representations, which are semantically aligned and can be easily modeled by a 2D convolutional GAN.
We demonstrate the efficacy of our method on Pascal3D+ Cars and CUB, both in an unconditional setting and in settings where the model is conditioned on class labels, attributes, and text.
arXiv Detail & Related papers (2020-06-13T15:23:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.