3D-CLFusion: Fast Text-to-3D Rendering with Contrastive Latent Diffusion
- URL: http://arxiv.org/abs/2303.11938v2
- Date: Wed, 20 Dec 2023 07:12:06 GMT
- Title: 3D-CLFusion: Fast Text-to-3D Rendering with Contrastive Latent Diffusion
- Authors: Yu-Jhe Li, Tao Xu, Ji Hou, Bichen Wu, Xiaoliang Dai, Albert Pumarola,
Peizhao Zhang, Peter Vajda, Kris Kitani
- Abstract summary: We tackle the task of text-to-3D creation with pre-trained latent-based NeRFs (NeRFs that generate 3D objects given an input latent code).
We propose a novel method, 3D-CLFusion, which leverages pre-trained latent-based NeRFs to perform fast 3D content creation in less than a minute.
- Score: 55.71215821923401
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We tackle the task of text-to-3D creation with pre-trained latent-based NeRFs
(NeRFs that generate 3D objects given an input latent code). Recent works such as
DreamFusion and Magic3D have shown great success in generating 3D content using
NeRFs and text prompts, but the current approach of optimizing a NeRF for every
text prompt is 1) extremely time-consuming and 2) prone to low-resolution
outputs. To address these challenges, we propose a novel method named
3D-CLFusion, which leverages pre-trained latent-based NeRFs and performs fast
3D content creation in less than a minute. In particular, we introduce a
latent diffusion prior network for learning the w latent from input CLIP
text/image embeddings. This pipeline allows us to produce the w latent without
further optimization at inference time, and the pre-trained NeRF can then
perform multi-view, high-resolution 3D synthesis from that latent. The novelty
of our model lies in introducing contrastive learning while training the
diffusion prior, which enables the generation of valid, view-invariant latent
codes. We demonstrate through experiments the effectiveness of our proposed
view-invariant diffusion process for fast text-to-3D creation, e.g., 100 times
faster than DreamFusion. Our model can serve as a plug-and-play tool for
text-to-3D with pre-trained NeRFs.
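To make the pipeline concrete, below is a minimal, self-contained sketch of the two ingredients the abstract describes: a diffusion prior that denoises a w latent conditioned on a CLIP embedding, and an InfoNCE-style contrastive term applied while training that prior. All module names, dimensions, and the sampler schedule are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of the 3D-CLFusion idea; architecture details,
# dimensions, and schedules are assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiffusionPrior(nn.Module):
    """Predicts the clean w latent from a noisy one, conditioned on a
    CLIP text/image embedding and the diffusion timestep."""
    def __init__(self, w_dim=512, clip_dim=512, hidden=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(w_dim + clip_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, w_dim),
        )

    def forward(self, w_noisy, clip_emb, t):
        # w_noisy: (B, w_dim), clip_emb: (B, clip_dim), t: (B, 1)
        return self.net(torch.cat([w_noisy, clip_emb, t], dim=-1))

def contrastive_loss(w_pred, w_target, temperature=0.07):
    """InfoNCE-style stand-in for the paper's contrastive objective:
    each predicted latent is pulled toward its own target and pushed
    away from the other latents in the batch."""
    w_pred = F.normalize(w_pred, dim=-1)
    w_target = F.normalize(w_target, dim=-1)
    logits = w_pred @ w_target.t() / temperature      # (B, B) similarities
    labels = torch.arange(w_pred.size(0), device=w_pred.device)
    return F.cross_entropy(logits, labels)

@torch.no_grad()
def sample_w(prior, clip_emb, steps=50, w_dim=512):
    """Toy sampler: iteratively denoise a random latent. No per-prompt
    optimization happens at inference time."""
    w = torch.randn(clip_emb.size(0), w_dim, device=clip_emb.device)
    for i in reversed(range(steps)):
        t = torch.full((clip_emb.size(0), 1), i / steps, device=w.device)
        w = prior(w, clip_emb, t)                     # predict clean latent
        if i > 0:                                     # simplified re-noising
            w = w + 0.1 * (i / steps) * torch.randn_like(w)
    return w  # fed to a frozen latent-based NeRF for multi-view rendering
```

At inference, a sampler like `sample_w` replaces the per-prompt NeRF optimization used by DreamFusion-style methods, which is what makes sub-minute 3D creation plausible.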
Related papers
- 3DTopia: Large Text-to-3D Generation Model with Hybrid Diffusion Priors [85.11117452560882]
We present a two-stage text-to-3D generation system, namely 3DTopia, which generates high-quality general 3D assets within 5 minutes using hybrid diffusion priors.
The first stage samples from a 3D diffusion prior directly learned from 3D data. Specifically, it is powered by a text-conditioned tri-plane latent diffusion model, which quickly generates coarse 3D samples for fast prototyping.
The second stage utilizes 2D diffusion priors to further refine the texture of coarse 3D models from the first stage. The refinement consists of both latent and pixel space optimization for high-quality texture generation.
arXiv Detail & Related papers (2024-03-04T17:26:28Z)
- Prompt2NeRF-PIL: Fast NeRF Generation via Pretrained Implicit Latent [61.56387277538849]
This paper explores promptable NeRF generation for direct conditioning and fast generation of NeRF parameters for the underlying 3D scenes.
Prompt2NeRF-PIL is capable of generating a variety of 3D objects with a single forward pass.
We show that our approach speeds up both the text-to-NeRF model DreamFusion and the 3D reconstruction of the image-to-NeRF method Zero-1-to-3 by 3 to 5 times.
arXiv Detail & Related papers (2023-12-05T08:32:46Z)
- Spice-E: Structural Priors in 3D Diffusion using Cross-Entity Attention [9.52027244702166]
Spice-E is a neural network that adds structural guidance to 3D diffusion models.
We show that our approach supports a variety of applications, including 3D stylization, semantic shape editing and text-conditional abstraction-to-3D.
arXiv Detail & Related papers (2023-11-29T17:36:49Z)
- TextMesh: Generation of Realistic 3D Meshes From Text Prompts [56.2832907275291]
We propose a novel method for generating highly realistic-looking 3D meshes.
To this end, we extend NeRF to employ an SDF backbone, leading to improved 3D mesh extraction.
arXiv Detail & Related papers (2023-04-24T20:29:41Z)
- DITTO-NeRF: Diffusion-based Iterative Text To Omni-directional 3D Model [15.091263190886337]
We propose a novel pipeline to generate a high-quality 3D NeRF model from a text prompt or a single image.
DITTO-NeRF first constructs a high-quality partial 3D object for limited in-boundary (IB) angles using the given or text-generated 2D image of the frontal view.
We propose progressive 3D object reconstruction schemes in terms of scales (low to high resolution), angles (IB initially, outer-boundary (OB) later), and masks (object to background boundary) in our DITTO-NeRF.
arXiv Detail & Related papers (2023-04-06T02:27:22Z)
- Magic3D: High-Resolution Text-to-3D Content Creation [78.40092800817311]
DreamFusion has recently demonstrated the utility of a pre-trained text-to-image diffusion model to optimize Neural Radiance Fields (NeRF), but that approach suffers from slow optimization and low-resolution outputs.
In this paper, we address these limitations by utilizing a two-stage optimization framework.
Our method, dubbed Magic3D, can create high-quality 3D mesh models in 40 minutes, which is 2x faster than DreamFusion.
arXiv Detail & Related papers (2022-11-18T18:59:59Z)
- Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures [72.44361273600207]
We adapt score distillation to the publicly available and computationally efficient Latent Diffusion Models.
Latent Diffusion Models apply the entire diffusion process in a compact latent space of a pretrained autoencoder.
We show that latent score distillation can be successfully applied directly on 3D meshes.
arXiv Detail & Related papers (2022-11-14T18:25:24Z)
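Several of the entries above (Magic3D, Latent-NeRF, and the original DreamFusion) rely on score distillation. As a rough illustration, here is a hypothetical single step of latent score distillation in the spirit of Latent-NeRF; the names `render_latent_fn`, `denoiser`, and `alphas_cumprod` are assumptions for the sketch, and the timestep schedule and gradient weighting are simplified.

```python
# Hypothetical single step of score distillation performed in the latent
# space of a pretrained autoencoder; weighting terms are omitted.
import torch

def latent_sds_step(render_latent_fn, denoiser, text_emb, optimizer,
                    alphas_cumprod):
    """Render the 3D representation into latent space, add noise at a
    random timestep, and nudge the render toward the denoiser's score."""
    z = render_latent_fn()                        # differentiable latent render
    t = torch.randint(20, 980, (1,), device=z.device)
    a = alphas_cumprod[t].view(1, 1, 1, 1)        # cumulative alpha at step t
    eps = torch.randn_like(z)
    z_t = a.sqrt() * z + (1 - a).sqrt() * eps     # forward-diffuse the render
    with torch.no_grad():
        eps_pred = denoiser(z_t, t, text_emb)     # text-conditioned noise guess
    grad = eps_pred - eps                         # SDS gradient direction
    optimizer.zero_grad()
    z.backward(gradient=grad)                     # route grad into 3D parameters
    optimizer.step()
```

Because the whole diffusion process runs in the compact latent space of the pretrained autoencoder, each step is far cheaper than pixel-space score distillation, which is the efficiency argument the Latent-NeRF entry makes.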