Point-E: A System for Generating 3D Point Clouds from Complex Prompts
- URL: http://arxiv.org/abs/2212.08751v1
- Date: Fri, 16 Dec 2022 23:22:59 GMT
- Title: Point-E: A System for Generating 3D Point Clouds from Complex Prompts
- Authors: Alex Nichol, Heewoo Jun, Prafulla Dhariwal, Pamela Mishkin, Mark Chen
- Abstract summary: In this paper, we explore an alternative method for 3D object generation which produces 3D models in only 1-2 minutes on a single GPU.
Our method first generates a single synthetic view using a text-to-image diffusion model, and then produces a 3D point cloud using a second diffusion model which conditions on the generated image.
While our method still falls short of the state-of-the-art in terms of sample quality, it is one to two orders of magnitude faster to sample from, offering a practical trade-off for some use cases.
- Score: 15.872304376606223
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While recent work on text-conditional 3D object generation has shown
promising results, the state-of-the-art methods typically require multiple
GPU-hours to produce a single sample. This is in stark contrast to
state-of-the-art generative image models, which produce samples in a number of
seconds or minutes. In this paper, we explore an alternative method for 3D
object generation which produces 3D models in only 1-2 minutes on a single GPU.
Our method first generates a single synthetic view using a text-to-image
diffusion model, and then produces a 3D point cloud using a second diffusion
model which conditions on the generated image. While our method still falls
short of the state-of-the-art in terms of sample quality, it is one to two
orders of magnitude faster to sample from, offering a practical trade-off for
some use cases. We release our pre-trained point cloud diffusion models, as
well as evaluation code and models, at https://github.com/openai/point-e.
Related papers
- SPAR3D: Stable Point-Aware Reconstruction of 3D Objects from Single Images [49.7344030427291]
We study the problem of single-image 3D object reconstruction.
Recent works have diverged into two directions: regression-based modeling and generative modeling.
We present SPAR3D, a novel two-stage approach aiming to take the best of both directions.
arXiv Detail & Related papers (2025-01-08T18:52:03Z) - Baking Gaussian Splatting into Diffusion Denoiser for Fast and Scalable Single-stage Image-to-3D Generation [45.95218923564575]
We propose a novel single-stage 3D diffusion model, DiffusionGS, for object and scene generation from a single view.
Experiments show that our method enjoys better generation quality (2.20 dB higher in PSNR and 23.25 lower in FID) and over 5x faster speed (6s on an A100 GPU) than SOTA methods.
arXiv Detail & Related papers (2024-11-21T18:21:24Z) - Isotropic3D: Image-to-3D Generation Based on a Single CLIP Embedding [16.50466940644004]
We present Isotropic3D, an image-to-3D generation pipeline that takes only an image CLIP embedding as input.
Isotropic3D allows the optimization to be isotropic w.r.t. the azimuth angle by solely resting on the SDS loss.
arXiv Detail & Related papers (2024-03-15T15:27:58Z) - 3DTopia: Large Text-to-3D Generation Model with Hybrid Diffusion Priors [85.11117452560882]
We present a two-stage text-to-3D generation system, namely 3DTopia, which generates high-quality general 3D assets within 5 minutes using hybrid diffusion priors.
The first stage samples from a 3D diffusion prior directly learned from 3D data. Specifically, it is powered by a text-conditioned tri-plane latent diffusion model, which quickly generates coarse 3D samples for fast prototyping.
The second stage utilizes 2D diffusion priors to further refine the texture of coarse 3D models from the first stage. The refinement consists of both latent and pixel space optimization for high-quality texture generation
arXiv Detail & Related papers (2024-03-04T17:26:28Z) - Instant3D: Fast Text-to-3D with Sparse-View Generation and Large
Reconstruction Model [68.98311213582949]
We propose Instant3D, a novel method that generates high-quality and diverse 3D assets from text prompts in a feed-forward manner.
Our method can generate diverse 3D assets of high visual quality within 20 seconds, two orders of magnitude faster than previous optimization-based methods.
arXiv Detail & Related papers (2023-11-10T18:03:44Z) - GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models [102.22388340738536]
2D and 3D diffusion models can generate decent 3D objects based on prompts.
3D diffusion models have good 3D consistency, but their quality and generalization are limited as trainable 3D data is expensive and hard to obtain.
This paper attempts to bridge the power from the two types of diffusion models via the recent explicit and efficient 3D Gaussian splatting representation.
arXiv Detail & Related papers (2023-10-12T17:22:24Z) - Viewset Diffusion: (0-)Image-Conditioned 3D Generative Models from 2D
Data [76.38261311948649]
Viewset Diffusion is a diffusion-based generator that outputs 3D objects while only using multi-view 2D data for supervision.
We train a diffusion model to generate viewsets, but design the neural network generator to reconstruct internally corresponding 3D models.
The model performs reconstruction efficiently, in a feed-forward manner, and is trained using only rendering losses using as few as three views per viewset.
arXiv Detail & Related papers (2023-06-13T16:18:51Z) - Shap-E: Generating Conditional 3D Implicit Functions [7.603750555294962]
Shap-E is a conditional generative model for 3D assets.
We train Shap-E in two stages: first, we train an encoder that deterministically maps 3D assets into the parameters of an implicit function.
When trained on a large dataset of paired 3D and text data, our resulting models are capable of generating complex and diverse 3D assets in a matter of seconds.
arXiv Detail & Related papers (2023-05-03T23:59:13Z) - Novel View Synthesis with Diffusion Models [56.55571338854636]
We present 3DiM, a diffusion model for 3D novel view synthesis.
It is able to translate a single input view into consistent and sharp completions across many views.
3DiM can generate multiple views that are 3D consistent using a novel technique called conditioning.
arXiv Detail & Related papers (2022-10-06T16:59:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.