Magic3D: High-Resolution Text-to-3D Content Creation
- URL: http://arxiv.org/abs/2211.10440v2
- Date: Sat, 25 Mar 2023 17:32:25 GMT
- Title: Magic3D: High-Resolution Text-to-3D Content Creation
- Authors: Chen-Hsuan Lin, Jun Gao, Luming Tang, Towaki Takikawa, Xiaohui Zeng,
Xun Huang, Karsten Kreis, Sanja Fidler, Ming-Yu Liu, Tsung-Yi Lin
- Abstract summary: DreamFusion has recently demonstrated the utility of a pre-trained text-to-image diffusion model to optimize Neural Radiance Fields (NeRF).
In this paper, we address these limitations by utilizing a two-stage optimization framework.
Our method, dubbed Magic3D, can create high-quality 3D mesh models in 40 minutes, which is 2x faster than DreamFusion.
- Score: 78.40092800817311
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: DreamFusion has recently demonstrated the utility of a pre-trained
text-to-image diffusion model to optimize Neural Radiance Fields (NeRF),
achieving remarkable text-to-3D synthesis results. However, the method has two
inherent limitations: (a) extremely slow optimization of NeRF and (b)
low-resolution image space supervision on NeRF, leading to low-quality 3D
models with a long processing time. In this paper, we address these limitations
by utilizing a two-stage optimization framework. First, we obtain a coarse
model using a low-resolution diffusion prior and accelerate with a sparse 3D
hash grid structure. Using the coarse representation as the initialization, we
further optimize a textured 3D mesh model with an efficient differentiable
renderer interacting with a high-resolution latent diffusion model. Our method,
dubbed Magic3D, can create high-quality 3D mesh models in 40 minutes, which is
2x faster than DreamFusion (reportedly taking 1.5 hours on average), while also
achieving higher resolution. User studies show that 61.7% of raters prefer our
approach over DreamFusion. Together with the image-conditioned generation
capabilities, we provide users with new ways to control 3D synthesis, opening
up new avenues to various creative applications.
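As a rough illustration of the two-stage recipe the abstract describes, here is a minimal PyTorch sketch: stage 1 optimizes a coarse 3D representation under low-resolution supervision, and stage 2 refines a texture at high resolution starting from the coarse result. The CoarseField class, the sds_loss placeholder, and all resolutions and step counts are illustrative assumptions, not the paper's sparse hash grid, differentiable mesh renderer, or diffusion models.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-in for the coarse stage's 3D representation. The real system
# uses a sparse 3D hash grid; a single learnable tensor marks its place.
class CoarseField(nn.Module):
    def __init__(self):
        super().__init__()
        self.grid = nn.Parameter(torch.randn(1, 3, 64, 64))

    def render(self, res):
        # Stand-in for volume rendering a view at the requested resolution.
        return F.interpolate(self.grid, size=(res, res), mode="bilinear")

def sds_loss(image):
    # Placeholder for score distillation sampling: a real implementation
    # noises the rendered view, queries the frozen diffusion model's
    # predicted noise, and backpropagates the residual into 3D parameters.
    return F.mse_loss(image, torch.zeros_like(image))

# Stage 1: coarse optimization supervised at low resolution (64x64 here).
field = CoarseField()
opt = torch.optim.Adam(field.parameters(), lr=1e-2)
for _ in range(100):
    loss = sds_loss(field.render(res=64))
    opt.zero_grad(); loss.backward(); opt.step()

# Stage 2: initialize a textured model from the coarse result, then refine
# it through a differentiable renderer against a high-resolution latent
# diffusion prior (512x512 in spirit; everything here is a toy stand-in).
texture = nn.Parameter(field.grid.detach().clone())
opt2 = torch.optim.Adam([texture], lr=1e-2)
for _ in range(100):
    loss = sds_loss(F.interpolate(texture, size=(512, 512), mode="bilinear"))
    opt2.zero_grad(); loss.backward(); opt2.step()
```

The design point carried over from the abstract is that expensive high-resolution supervision only touches the second stage, which starts from a cheap coarse initialization rather than from scratch.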
Related papers
- Unique3D: High-Quality and Efficient 3D Mesh Generation from a Single Image [28.759158325097093]
Unique3D is a novel image-to-3D framework for efficiently generating high-quality 3D meshes from single-view images.
Our framework features state-of-the-art generation fidelity and strong generalizability.
arXiv Detail & Related papers (2024-05-30T17:59:54Z)
- Compress3D: a Compressed Latent Space for 3D Generation from a Single Image [27.53099431097921]
Triplane autoencoder encodes 3D models into a compact triplane latent space to compress both the 3D geometry and texture information.
We introduce a 3D-aware cross-attention mechanism, which utilizes low-resolution latent representations to query features from a high-resolution 3D feature volume (a minimal sketch of this pattern follows the list below).
Our approach enables the generation of high-quality 3D assets in merely 7 seconds on a single A100 GPU.
arXiv Detail & Related papers (2024-03-20T11:51:04Z)
- LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation [73.36690511083894]
This paper introduces LN3Diff, a novel framework that provides a unified 3D diffusion pipeline.
Our approach harnesses a 3D-aware architecture and variational autoencoder to encode the input image into a structured, compact, and 3D latent space.
It achieves state-of-the-art performance on ShapeNet for 3D generation and demonstrates superior performance in monocular 3D reconstruction and conditional 3D generation.
arXiv Detail & Related papers (2024-03-18T17:54:34Z)
- 3DTopia: Large Text-to-3D Generation Model with Hybrid Diffusion Priors [85.11117452560882]
We present a two-stage text-to-3D generation system, namely 3DTopia, which generates high-quality general 3D assets within 5 minutes using hybrid diffusion priors.
The first stage samples from a 3D diffusion prior directly learned from 3D data. Specifically, it is powered by a text-conditioned tri-plane latent diffusion model, which quickly generates coarse 3D samples for fast prototyping.
The second stage utilizes 2D diffusion priors to further refine the texture of coarse 3D models from the first stage. The refinement consists of both latent- and pixel-space optimization for high-quality texture generation.
arXiv Detail & Related papers (2024-03-04T17:26:28Z)
- Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model [68.98311213582949]
We propose Instant3D, a novel method that generates high-quality and diverse 3D assets from text prompts in a feed-forward manner.
Our method can generate diverse 3D assets of high visual quality within 20 seconds, two orders of magnitude faster than previous optimization-based methods.
arXiv Detail & Related papers (2023-11-10T18:03:44Z)
- TextMesh: Generation of Realistic 3D Meshes From Text Prompts [56.2832907275291]
We propose a novel method for the generation of highly realistic-looking 3D meshes.
To this end, we extend NeRF to employ an SDF backbone, leading to improved 3D mesh extraction.
arXiv Detail & Related papers (2023-04-24T20:29:41Z)
- 3D-CLFusion: Fast Text-to-3D Rendering with Contrastive Latent Diffusion [55.71215821923401]
We tackle the task of text-to-3D creation with pre-trained latent-based NeRFs (NeRFs that generate 3D objects given an input latent code).
We propose a novel method named 3D-CLFusion which leverages the pre-trained latent-based NeRFs and performs fast 3D content creation in less than a minute.
arXiv Detail & Related papers (2023-03-21T15:38:26Z)
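The 3D-aware cross-attention mentioned in the Compress3D entry above can be pictured as ordinary cross-attention in which low-resolution latent tokens query a flattened high-resolution feature volume. The shapes, dimensions, and use of nn.MultiheadAttention below are illustrative assumptions based on the one-line summary, not the paper's implementation.

```python
import torch
import torch.nn as nn

B, C = 2, 64
queries = torch.randn(B, 16 * 16, C)   # low-resolution latent tokens
volume = torch.randn(B, C, 8, 8, 8)    # high-resolution 3D feature volume
kv = volume.flatten(2).transpose(1, 2)  # (B, 8*8*8, C) voxel tokens

# Low-res queries attend over the voxel tokens to pull in 3D detail.
attn = nn.MultiheadAttention(embed_dim=C, num_heads=4, batch_first=True)
out, _ = attn(queries, kv, kv)
print(out.shape)  # torch.Size([2, 256, 64])
```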
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.