PI3D: Efficient Text-to-3D Generation with Pseudo-Image Diffusion
- URL: http://arxiv.org/abs/2312.09069v2
- Date: Sun, 21 Apr 2024 07:08:15 GMT
- Title: PI3D: Efficient Text-to-3D Generation with Pseudo-Image Diffusion
- Authors: Ying-Tian Liu, Yuan-Chen Guo, Guan Luo, Heyi Sun, Wei Yin, Song-Hai Zhang
- Abstract summary: We present PI3D, a framework that fully leverages pre-trained text-to-image diffusion models to generate high-quality 3D shapes from text prompts in minutes.
PI3D generates a single 3D shape from text in only 3 minutes, and its quality is shown to outperform existing 3D generative models by a large margin.
- Score: 18.82883336156591
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion models trained on large-scale text-image datasets have demonstrated a strong capability of controllable, high-quality image generation from arbitrary text prompts. However, the generation quality and generalization ability of 3D diffusion models are hindered by the scarcity of high-quality, large-scale 3D datasets. In this paper, we present PI3D, a framework that fully leverages pre-trained text-to-image diffusion models to generate high-quality 3D shapes from text prompts in minutes. The core idea is to connect the 2D and 3D domains by representing a 3D shape as a set of pseudo RGB images. We fine-tune an existing text-to-image diffusion model to produce such pseudo-images using a small number of text-3D pairs. Surprisingly, we find that it can already generate meaningful and consistent 3D shapes given complex text descriptions. We further take the generated shapes as the starting point for lightweight iterative refinement using score distillation sampling, achieving high-quality generation under a low budget. PI3D generates a single 3D shape from text in only 3 minutes, and its quality is shown to outperform existing 3D generative models by a large margin.
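The abstract names two mechanisms: packing a 3D representation into pseudo RGB images that a fine-tuned 2D diffusion model can sample, and refining the sampled shape with score distillation sampling (SDS). Below is a minimal PyTorch sketch of both ideas under stated assumptions; the triplane packing scheme, the `denoiser` callable, and its signature are hypothetical stand-ins for illustration, not PI3D's actual implementation.

```python
# Sketch, not PI3D's code: (1) pack a triplane into 3-channel pseudo-images a
# 2D diffusion model can consume; (2) one score distillation sampling (SDS)
# step against a frozen epsilon-prediction text-to-image model.
import torch

def pack_triplane_as_pseudo_images(triplane: torch.Tensor) -> torch.Tensor:
    """Reshape a (3, C, H, W) triplane, with C divisible by 3, into a batch
    of RGB-like pseudo-images of shape (C, 3, H, W)."""
    planes, c, h, w = triplane.shape
    assert planes == 3 and c % 3 == 0, "expects features packable into RGB triples"
    return triplane.reshape(planes * c // 3, 3, h, w)

def sds_grad(denoiser, pseudo_images, text_emb, alphas_cumprod):
    """One SDS step: noise the current pseudo-images, let the frozen diffusion
    model predict the noise, and use the residual as a gradient signal."""
    t = torch.randint(1, alphas_cumprod.numel(), (1,), device=pseudo_images.device)
    a_bar = alphas_cumprod[t].view(1, 1, 1, 1)
    noise = torch.randn_like(pseudo_images)
    noisy = a_bar.sqrt() * pseudo_images + (1 - a_bar).sqrt() * noise
    with torch.no_grad():
        # hypothetical signature: denoiser(noisy_images, timestep, text_embedding)
        eps_pred = denoiser(noisy, t, text_emb)
    w = 1.0 - a_bar                   # a common SDS weighting choice
    return w * (eps_pred - noise)     # gradient w.r.t. the pseudo-images
```

In a refinement loop, this gradient would be assigned to the pseudo-images (or backpropagated to the parameters that produce them) before an optimizer step, while the diffusion model itself stays frozen.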
Related papers
- Geometry Image Diffusion: Fast and Data-Efficient Text-to-3D with Image-Based Surface Representation [2.3213238782019316]
GIMDiffusion is a novel Text-to-3D model that uses geometry images to represent 3D shapes efficiently as 2D images.
We exploit the rich 2D priors of existing Text-to-Image models such as Stable Diffusion.
In short, GIMDiffusion enables the generation of 3D assets at speeds comparable to current Text-to-Image models.
arXiv Detail & Related papers (2024-09-05T17:21:54Z)
- 3DTopia: Large Text-to-3D Generation Model with Hybrid Diffusion Priors [85.11117452560882]
We present a two-stage text-to-3D generation system, namely 3DTopia, which generates high-quality general 3D assets within 5 minutes using hybrid diffusion priors.
The first stage samples from a 3D diffusion prior directly learned from 3D data. Specifically, it is powered by a text-conditioned tri-plane latent diffusion model, which quickly generates coarse 3D samples for fast prototyping.
The second stage utilizes 2D diffusion priors to further refine the texture of coarse 3D models from the first stage. The refinement consists of both latent- and pixel-space optimization for high-quality texture generation.
arXiv Detail & Related papers (2024-03-04T17:26:28Z)
- ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models [65.22994156658918]
We present a method that learns to generate multi-view images in a single denoising process from real-world data.
We design an autoregressive generation scheme that renders more 3D-consistent images at any viewpoint.
arXiv Detail & Related papers (2024-03-04T07:57:05Z)
- TPA3D: Triplane Attention for Fast Text-to-3D Generation [28.33270078863519]
We propose Triplane Attention for text-guided 3D generation (TPA3D), an end-to-end trainable GAN-based deep learning model for fast text-to-3D generation.
We show that TPA3D generates high-quality 3D textured shapes aligned with fine-grained descriptions.
arXiv Detail & Related papers (2023-12-05T10:39:37Z) - GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models [102.22388340738536]
Both 2D and 3D diffusion models can generate decent 3D objects based on prompts.
3D diffusion models have good 3D consistency, but their quality and generalization are limited because trainable 3D data is expensive and hard to obtain.
This paper attempts to bridge the strengths of the two types of diffusion models via the recent explicit and efficient 3D Gaussian splatting representation.
arXiv Detail & Related papers (2023-10-12T17:22:24Z) - EfficientDreamer: High-Fidelity and Robust 3D Creation via Orthogonal-view Diffusion Prior [59.25950280610409]
We propose a robust high-quality 3D content generation pipeline by exploiting orthogonal-view image guidance.
In this paper, we introduce a novel 2D diffusion model that generates an image consisting of four sub-images based on the given text prompt.
We also present a 3D synthesis network that can further improve the details of the generated 3D contents.
arXiv Detail & Related papers (2023-08-25T07:39:26Z) - Guide3D: Create 3D Avatars from Text and Image Guidance [55.71306021041785]
Guide3D is a text-and-image-guided generative model for 3D avatar generation based on diffusion models.
Our framework produces topologically and structurally correct geometry and high-resolution textures.
arXiv Detail & Related papers (2023-08-18T17:55:47Z) - Dream3D: Zero-Shot Text-to-3D Synthesis Using 3D Shape Prior and
Text-to-Image Diffusion Models [44.34479731617561]
We introduce explicit 3D shape priors into the CLIP-guided 3D optimization process.
We present a simple yet effective approach that directly bridges the text and image modalities with a powerful text-to-image diffusion model.
Our method, Dream3D, is capable of generating imaginative 3D content with superior visual quality and shape accuracy.
arXiv Detail & Related papers (2022-12-28T18:23:47Z) - 3D-TOGO: Towards Text-Guided Cross-Category 3D Object Generation [107.46972849241168]
The 3D-TOGO model generates 3D objects as neural radiance fields with good texture.
Experiments on the largest 3D object dataset (i.e., ABO) verify that 3D-TOGO generates higher-quality 3D objects.
arXiv Detail & Related papers (2022-12-02T11:31:49Z)