Related papers: Point-E: A System for Generating 3D Point Clouds from Complex Prompts

Point-E: A System for Generating 3D Point Clouds from Complex Prompts

URL: http://arxiv.org/abs/2212.08751v1
Date: Fri, 16 Dec 2022 23:22:59 GMT
Title: Point-E: A System for Generating 3D Point Clouds from Complex Prompts
Authors: Alex Nichol, Heewoo Jun, Prafulla Dhariwal, Pamela Mishkin, Mark Chen
Abstract summary: In this paper, we explore an alternative method for 3D object generation which produces 3D models in only 1-2 minutes on a single GPU. Our method first generates a single synthetic view using a text-to-image diffusion model, and then produces a 3D point cloud using a second diffusion model which conditions on the generated image. While our method still falls short of the state-of-the-art in terms of sample quality, it is one to two orders of magnitude faster to sample from, offering a practical trade-off for some use cases.
Score: 15.872304376606223
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: While recent work on text-conditional 3D object generation has shown promising results, the state-of-the-art methods typically require multiple GPU-hours to produce a single sample. This is in stark contrast to state-of-the-art generative image models, which produce samples in a number of seconds or minutes. In this paper, we explore an alternative method for 3D object generation which produces 3D models in only 1-2 minutes on a single GPU. Our method first generates a single synthetic view using a text-to-image diffusion model, and then produces a 3D point cloud using a second diffusion model which conditions on the generated image. While our method still falls short of the state-of-the-art in terms of sample quality, it is one to two orders of magnitude faster to sample from, offering a practical trade-off for some use cases. We release our pre-trained point cloud diffusion models, as well as evaluation code and models, at https://github.com/openai/point-e.

Related papers

A Recipe for Generating 3D Worlds From a Single Image [28.396381735501524]
We introduce a recipe for generating immersive 3D worlds from a single image. This approach requires minimal training and uses existing generative models. Tested on both synthetic and real images, our method produces high-quality 3D environments suitable for VR display.
arXiv Detail & Related papers (2025-03-20T18:06:12Z)
SPAR3D: Stable Point-Aware Reconstruction of 3D Objects from Single Images [49.7344030427291]
We study the problem of single-image 3D object reconstruction. Recent works have diverged into two directions: regression-based modeling and generative modeling. We present SPAR3D, a novel two-stage approach aiming to take the best of both directions.
arXiv Detail & Related papers (2025-01-08T18:52:03Z)
Any-to-3D Generation via Hybrid Diffusion Supervision [67.54197818071464]
XBind is a unified framework for any-to-3D generation using cross-modal pre-alignment techniques. XBind integrates an multimodal-aligned encoder with pre-trained diffusion models to generate 3D objects from any modalities.
arXiv Detail & Related papers (2024-11-22T03:52:37Z)
Baking Gaussian Splatting into Diffusion Denoiser for Fast and Scalable Single-stage Image-to-3D Generation [45.95218923564575]
We propose a novel single-stage 3D diffusion model, DiffusionGS, for object and scene generation from a single view. Experiments show that our method enjoys better generation quality (2.20 dB higher in PSNR and 23.25 lower in FID) and over 5x faster speed (6s on an A100 GPU) than SOTA methods.
arXiv Detail & Related papers (2024-11-21T18:21:24Z)
Sampling 3D Gaussian Scenes in Seconds with Latent Diffusion Models [3.9373541926236766]
We present a latent diffusion model over 3D scenes, that can be trained using only 2D image data. We show that our approach enables generating 3D scenes in as little as 0.2 seconds, either from scratch, or from sparse input views.
arXiv Detail & Related papers (2024-06-18T23:14:29Z)
Isotropic3D: Image-to-3D Generation Based on a Single CLIP Embedding [16.50466940644004]
We present Isotropic3D, an image-to-3D generation pipeline that takes only an image CLIP embedding as input. Isotropic3D allows the optimization to be isotropic w.r.t. the azimuth angle by solely resting on the SDS loss.
arXiv Detail & Related papers (2024-03-15T15:27:58Z)
3DTopia: Large Text-to-3D Generation Model with Hybrid Diffusion Priors [85.11117452560882]
We present a two-stage text-to-3D generation system, namely 3DTopia, which generates high-quality general 3D assets within 5 minutes using hybrid diffusion priors. The first stage samples from a 3D diffusion prior directly learned from 3D data. Specifically, it is powered by a text-conditioned tri-plane latent diffusion model, which quickly generates coarse 3D samples for fast prototyping. The second stage utilizes 2D diffusion priors to further refine the texture of coarse 3D models from the first stage. The refinement consists of both latent and pixel space optimization for high-quality texture generation
arXiv Detail & Related papers (2024-03-04T17:26:28Z)
Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model [68.98311213582949]
We propose Instant3D, a novel method that generates high-quality and diverse 3D assets from text prompts in a feed-forward manner. Our method can generate diverse 3D assets of high visual quality within 20 seconds, two orders of magnitude faster than previous optimization-based methods.
arXiv Detail & Related papers (2023-11-10T18:03:44Z)
GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models [102.22388340738536]
2D and 3D diffusion models can generate decent 3D objects based on prompts. 3D diffusion models have good 3D consistency, but their quality and generalization are limited as trainable 3D data is expensive and hard to obtain. This paper attempts to bridge the power from the two types of diffusion models via the recent explicit and efficient 3D Gaussian splatting representation.
arXiv Detail & Related papers (2023-10-12T17:22:24Z)
Viewset Diffusion: (0-)Image-Conditioned 3D Generative Models from 2D Data [76.38261311948649]
Viewset Diffusion is a diffusion-based generator that outputs 3D objects while only using multi-view 2D data for supervision. We train a diffusion model to generate viewsets, but design the neural network generator to reconstruct internally corresponding 3D models. The model performs reconstruction efficiently, in a feed-forward manner, and is trained using only rendering losses using as few as three views per viewset.
arXiv Detail & Related papers (2023-06-13T16:18:51Z)
Shap-E: Generating Conditional 3D Implicit Functions [7.603750555294962]
Shap-E is a conditional generative model for 3D assets. We train Shap-E in two stages: first, we train an encoder that deterministically maps 3D assets into the parameters of an implicit function. When trained on a large dataset of paired 3D and text data, our resulting models are capable of generating complex and diverse 3D assets in a matter of seconds.
arXiv Detail & Related papers (2023-05-03T23:59:13Z)
Novel View Synthesis with Diffusion Models [56.55571338854636]
We present 3DiM, a diffusion model for 3D novel view synthesis. It is able to translate a single input view into consistent and sharp completions across many views. 3DiM can generate multiple views that are 3D consistent using a novel technique called conditioning.
arXiv Detail & Related papers (2022-10-06T16:59:56Z)

This list is automatically generated from the titles and abstracts of the papers in this site.