Instant3D: Fast Text-to-3D with Sparse-View Generation and Large
Reconstruction Model
- URL: http://arxiv.org/abs/2311.06214v2
- Date: Thu, 23 Nov 2023 08:55:49 GMT
- Title: Instant3D: Fast Text-to-3D with Sparse-View Generation and Large
Reconstruction Model
- Authors: Jiahao Li, Hao Tan, Kai Zhang, Zexiang Xu, Fujun Luan, Yinghao Xu,
Yicong Hong, Kalyan Sunkavalli, Greg Shakhnarovich, Sai Bi
- Abstract summary: We propose Instant3D, a novel method that generates high-quality and diverse 3D assets from text prompts in a feed-forward manner.
Our method can generate diverse 3D assets of high visual quality within 20 seconds, two orders of magnitude faster than previous optimization-based methods.
- Score: 68.98311213582949
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text-to-3D with diffusion models has achieved remarkable progress in recent
years. However, existing methods either rely on score distillation-based
optimization, which suffers from slow inference, low diversity, and Janus
problems, or are feed-forward methods that generate low-quality results due to
the scarcity of 3D training data. In this paper, we propose Instant3D, a novel
method that generates high-quality and diverse 3D assets from text prompts in a
feed-forward manner. We adopt a two-stage paradigm, which first generates a
sparse set of four structured and consistent views from text in one shot with a
fine-tuned 2D text-to-image diffusion model, and then directly regresses the
NeRF from the generated images with a novel transformer-based sparse-view
reconstructor. Through extensive experiments, we demonstrate that our method
can generate diverse 3D assets of high visual quality within 20 seconds, which
is two orders of magnitude faster than previous optimization-based methods that
can take 1 to 10 hours. Our project webpage: https://jiahao.ai/instant3d/.
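To make the two-stage paradigm above concrete, here is a minimal structural sketch of how such a pipeline fits together. It is an illustration only: the function names, tensor shapes, and camera handling are hypothetical placeholders, and both stages are stubbed rather than reproducing the paper's fine-tuned diffusion model or transformer-based reconstructor.

```python
# Minimal structural sketch of a two-stage text-to-3D pipeline in the spirit of
# Instant3D. All names, shapes, and stubbed bodies here are hypothetical
# placeholders; the real diffusion model and reconstructor are not reproduced.
import numpy as np

def generate_four_views(prompt: str, resolution: int = 512) -> np.ndarray:
    """Stage 1 (stub): a fine-tuned text-to-image diffusion model would emit a
    single 2x2 grid image holding four structured, consistent views of the
    object described by `prompt`. Random pixels stand in for that output."""
    grid = np.random.rand(2 * resolution, 2 * resolution, 3)
    views = [
        grid[:resolution, :resolution],   # top-left view
        grid[:resolution, resolution:],   # top-right view
        grid[resolution:, :resolution],   # bottom-left view
        grid[resolution:, resolution:],   # bottom-right view
    ]
    return np.stack(views)                # shape (4, H, W, 3)

def reconstruct_nerf(views: np.ndarray, cameras: np.ndarray) -> np.ndarray:
    """Stage 2 (stub): a transformer-based sparse-view reconstructor would map
    the four posed images to a NeRF (e.g. triplane features) in one forward
    pass. Random features of shape (3, C, R, R) stand in for that output."""
    channels, res = 32, 64
    return np.random.rand(3, channels, res, res)

def render_view(triplane: np.ndarray, camera: np.ndarray) -> np.ndarray:
    """Stub volume renderer producing a novel view from the reconstruction."""
    return np.random.rand(256, 256, 3)

if __name__ == "__main__":
    prompt = "a wooden robot playing the violin"      # example prompt (made up)
    cameras = np.tile(np.eye(4), (4, 1, 1))           # four fixed, known poses (placeholder)
    views = generate_four_views(prompt)               # stage 1: feed-forward, one shot
    triplane = reconstruct_nerf(views, cameras)       # stage 2: feed-forward regression
    novel_image = render_view(triplane, np.eye(4))
    print(views.shape, triplane.shape, novel_image.shape)
```

The point the sketch carries over from the abstract is that both stages are single feed-forward passes over fixed, known camera poses; there is no per-prompt optimization loop, which is what allows generation within seconds rather than hours.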
Related papers
- 3DTopia: Large Text-to-3D Generation Model with Hybrid Diffusion Priors [85.11117452560882]
We present a two-stage text-to-3D generation system, namely 3DTopia, which generates high-quality general 3D assets within 5 minutes using hybrid diffusion priors.
The first stage samples from a 3D diffusion prior directly learned from 3D data. Specifically, it is powered by a text-conditioned tri-plane latent diffusion model, which quickly generates coarse 3D samples for fast prototyping.
The second stage utilizes 2D diffusion priors to further refine the texture of coarse 3D models from the first stage. The refinement consists of both latent and pixel space optimization for high-quality texture generation.
arXiv Detail & Related papers (2024-03-04T17:26:28Z)
- ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models [65.22994156658918]
We present a method that learns to generate multi-view images in a single denoising process from real-world data.
We design an autoregressive generation scheme that renders more 3D-consistent images at any viewpoint.
arXiv Detail & Related papers (2024-03-04T07:57:05Z)
- Text-to-3D Generation with Bidirectional Diffusion using both 2D and 3D priors [16.93758384693786]
Bidirectional Diffusion (BiDiff) is a unified framework that incorporates both a 3D and a 2D diffusion process.
Our model achieves high-quality, diverse, and scalable 3D generation.
arXiv Detail & Related papers (2023-12-07T10:00:04Z)
- Instant3D: Instant Text-to-3D Generation [101.25562463919795]
We propose a novel framework for fast text-to-3D generation, dubbed Instant3D.
Instant3D is able to create a 3D object for an unseen text prompt in less than one second with a single run of a feedforward network.
arXiv Detail & Related papers (2023-11-14T18:59:59Z)
- One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion [32.29687304798145]
One-2-3-45++ is an innovative method that transforms a single image into a detailed 3D textured mesh in approximately one minute.
Our approach aims to fully harness the extensive knowledge embedded in 2D diffusion models and priors from valuable yet limited 3D data.
arXiv Detail & Related papers (2023-11-14T03:40:25Z)
- Guide3D: Create 3D Avatars from Text and Image Guidance [55.71306021041785]
Guide3D is a text-and-image-guided generative model for 3D avatar generation based on diffusion models.
Our framework produces topologically and structurally correct geometry and high-resolution textures.
arXiv Detail & Related papers (2023-08-18T17:55:47Z)
- One-2-3-45: Any Single Image to 3D Mesh in 45 Seconds without Per-Shape Optimization [30.951405623906258]
Single image 3D reconstruction is an important but challenging task that requires extensive knowledge of our natural world.
We propose a novel method that takes a single image of any object as input and generates a full 360-degree 3D textured mesh in a single feed-forward pass.
arXiv Detail & Related papers (2023-06-29T13:28:16Z)
- Magic3D: High-Resolution Text-to-3D Content Creation [78.40092800817311]
DreamFusion has recently demonstrated the utility of a pre-trained text-to-image diffusion model to optimize Neural Radiance Fields (NeRF), but it suffers from extremely slow NeRF optimization and low-resolution image-space supervision.
In this paper, we address these limitations by utilizing a two-stage optimization framework.
Our method, dubbed Magic3D, can create high quality 3D mesh models in 40 minutes, which is 2x faster than DreamFusion.
arXiv Detail & Related papers (2022-11-18T18:59:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences arising from its use.