Points-to-3D: Bridging the Gap between Sparse Points and
Shape-Controllable Text-to-3D Generation
- URL: http://arxiv.org/abs/2307.13908v1
- Date: Wed, 26 Jul 2023 02:16:55 GMT
- Title: Points-to-3D: Bridging the Gap between Sparse Points and
Shape-Controllable Text-to-3D Generation
- Authors: Chaohui Yu, Qiang Zhou, Jingliang Li, Zhe Zhang, Zhibin Wang, Fan Wang
- Abstract summary: We propose a flexible framework of Points-to-3D to bridge the gap between sparse yet freely available 3D points and realistic shape-controllable 3D generation.
The core idea of Points-to-3D is to introduce controllable sparse 3D points to guide the text-to-3D generation.
- Score: 16.232803881159022
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text-to-3D generation has recently garnered significant attention, fueled by
2D diffusion models trained on billions of image-text pairs. Existing methods
primarily rely on score distillation to leverage the 2D diffusion priors to
supervise the generation of 3D models, e.g., NeRF. However, score distillation
is prone to the view-inconsistency problem, and implicit NeRF modeling can also
produce arbitrary shapes, resulting in less realistic and less controllable 3D
generation. In this work, we propose a flexible framework of
Points-to-3D to bridge the gap between sparse yet freely available 3D points
and realistic shape-controllable 3D generation by distilling the knowledge from
both 2D and 3D diffusion models. The core idea of Points-to-3D is to introduce
controllable sparse 3D points to guide the text-to-3D generation. Specifically,
we use the sparse point cloud generated from the 3D diffusion model, Point-E,
as the geometric prior, conditioned on a single reference image. To better
utilize the sparse 3D points, we propose an efficient point cloud guidance loss
to adaptively drive the NeRF's geometry to align with the shape of the sparse
3D points. In addition to controlling the geometry, we propose to optimize the
NeRF for a more view-consistent appearance. Specifically, we perform score
distillation with the publicly available 2D image diffusion model ControlNet,
conditioned on the text as well as the depth map of the learned compact geometry.
Qualitative and quantitative comparisons demonstrate that Points-to-3D improves
view consistency and achieves good shape controllability for text-to-3D
generation. Points-to-3D provides users with a new way to improve and control
text-to-3D generation.
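The abstract above describes two concrete supervision signals: a point cloud guidance loss that pulls the NeRF's geometry toward the sparse Point-E points, and score distillation against a depth-conditioned ControlNet (the same depth-as-control idea recurs in the MT3D entry below). The following is a minimal PyTorch sketch of what these two signals could look like; it is not the authors' implementation, and `nerf_density_fn`, `controlnet_eps`, the sampling bounds, and the loss weights are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of the two supervision signals described
# above, assuming: `nerf_density_fn(points)` returns the NeRF density at 3D points,
# and `controlnet_eps(noisy_img, t, text_emb, depth)` wraps a depth-conditioned
# ControlNet/diffusion UNet and returns the predicted noise. Both are assumptions.

import torch
import torch.nn.functional as F


def point_cloud_guidance_loss(nerf_density_fn, sparse_points,
                              num_neg=4096, density_target=10.0):
    """Drive the NeRF geometry toward the sparse Point-E cloud (illustrative form).

    Encourages high density at the given sparse 3D points and low density at
    random points elsewhere in the scene bounds. The paper's loss is an
    efficient, adaptively weighted variant; this only conveys the basic idea.
    """
    # Density at the sparse 3D points should be high (occupied space).
    occ = nerf_density_fn(sparse_points)                          # (N,)
    loss_occupied = F.relu(density_target - occ).mean()

    # Density at random points inside a [-1, 1]^3 box should be low (empty space).
    neg = torch.rand(num_neg, 3, device=sparse_points.device) * 2.0 - 1.0
    loss_empty = nerf_density_fn(neg).mean()
    return loss_occupied + 0.1 * loss_empty


def depth_conditioned_sds_loss(rendered_rgb, rendered_depth, text_emb,
                               controlnet_eps, alphas_cumprod):
    """Score-distillation surrogate loss, feeding the rendered depth to ControlNet."""
    t = torch.randint(20, 980, (1,), device=rendered_rgb.device)
    noise = torch.randn_like(rendered_rgb)
    a_t = alphas_cumprod[t].view(-1, 1, 1, 1)
    noisy = a_t.sqrt() * rendered_rgb + (1.0 - a_t).sqrt() * noise

    with torch.no_grad():
        eps_pred = controlnet_eps(noisy, t, text_emb, rendered_depth)

    w = 1.0 - a_t                                  # a common SDS weighting choice
    grad = w * (eps_pred - noise)                  # gradient w.r.t. the rendering
    # Surrogate whose gradient w.r.t. `rendered_rgb` equals `grad` (standard SDS trick).
    return (grad.detach() * rendered_rgb).sum()
```

In practice, score distillation for latent diffusion models is applied in the model's latent space and the two losses are combined with scheduled weights; the sketch only conveys the shape of the computation.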
Related papers
- Enhancing Single Image to 3D Generation using Gaussian Splatting and Hybrid Diffusion Priors [17.544733016978928]
3D object generation from a single image involves estimating the full 3D geometry and texture of unseen views from an unposed RGB image captured in the wild.
Recent advancements in 3D object generation have introduced techniques that reconstruct an object's 3D shape and texture.
We propose bridging the gap between 2D and 3D diffusion models to address this limitation.
arXiv Detail & Related papers (2024-10-12T10:14:11Z)
- Deep Geometric Moments Promote Shape Consistency in Text-to-3D Generation [27.43973967994717]
MT3D is a text-to-3D generative model that leverages a high-fidelity 3D object to overcome viewpoint bias.
We employ depth maps derived from a high-quality 3D model as control signals to guarantee that the generated 2D images preserve the fundamental shape and structure.
By incorporating geometric details from a 3D asset, MT3D enables the creation of diverse and geometrically consistent objects.
arXiv Detail & Related papers (2024-08-12T06:25:44Z)
- DIRECT-3D: Learning Direct Text-to-3D Generation on Massive Noisy 3D Data [50.164670363633704]
We present DIRECT-3D, a diffusion-based 3D generative model for creating high-quality 3D assets from text prompts.
Our model is directly trained on extensive noisy and unaligned 'in-the-wild' 3D assets.
We achieve state-of-the-art performance in both single-class generation and text-to-3D generation.
arXiv Detail & Related papers (2024-06-06T17:58:15Z)
- Sculpt3D: Multi-View Consistent Text-to-3D Generation with Sparse 3D Prior [57.986512832738704]
We present a new framework Sculpt3D that equips the current pipeline with explicit injection of 3D priors from retrieved reference objects without re-training the 2D diffusion model.
Specifically, we demonstrate that high-quality and diverse 3D geometry can be guaranteed by keypoint supervision through a sparse ray sampling approach (a generic sketch of this kind of keypoint-ray supervision appears after this list).
These two decoupled designs effectively harness 3D information from reference objects to generate 3D objects while preserving the generation quality of the 2D diffusion model.
arXiv Detail & Related papers (2024-03-14T07:39:59Z)
- IM-3D: Iterative Multiview Diffusion and Reconstruction for High-Quality 3D Generation [96.32684334038278]
In this paper, we explore the design space of text-to-3D models.
We significantly improve multi-view generation by considering video instead of image generators.
Our new method, IM-3D, reduces the number of evaluations of the 2D generator network by 10-100x.
arXiv Detail & Related papers (2024-02-13T18:59:51Z)
- PI3D: Efficient Text-to-3D Generation with Pseudo-Image Diffusion [18.82883336156591]
We present PI3D, a framework that fully leverages the pre-trained text-to-image diffusion models' ability to generate high-quality 3D shapes from text prompts in minutes.
PI3D generates a single 3D shape from text in only 3 minutes, and its quality is validated to outperform that of existing 3D generative models by a large margin.
arXiv Detail & Related papers (2023-12-14T16:04:34Z)
- Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors [104.79392615848109]
We present Magic123, a two-stage coarse-to-fine approach for high-quality, textured 3D meshes from a single unposed image.
In the first stage, we optimize a neural radiance field to produce a coarse geometry.
In the second stage, we adopt a memory-efficient differentiable mesh representation to yield a high-resolution mesh with a visually appealing texture.
arXiv Detail & Related papers (2023-06-30T17:59:08Z)
- XDGAN: Multi-Modal 3D Shape Generation in 2D Space [60.46777591995821]
We propose a novel method to convert 3D shapes into compact 1-channel geometry images and leverage StyleGAN3 and image-to-image translation networks to generate 3D objects in 2D space.
The generated geometry images are quick to convert to 3D meshes, enabling real-time 3D object synthesis, visualization and interactive editing.
We show both quantitatively and qualitatively that our method is highly effective at various tasks such as 3D shape generation, single view reconstruction and shape manipulation, while being significantly faster and more flexible compared to recent 3D generative models.
arXiv Detail & Related papers (2022-10-06T15:54:01Z)
- Unsupervised Learning of Fine Structure Generation for 3D Point Clouds by 2D Projection Matching [66.98712589559028]
We propose an unsupervised approach for 3D point cloud generation with fine structures.
Our method can recover fine 3D structures from 2D silhouette images at different resolutions.
arXiv Detail & Related papers (2021-08-08T22:15:31Z)
- Learning geometry-image representation for 3D point cloud generation [5.3485743892868545]
We propose a novel geometry-image-based generator (GIG) to convert the 3D point cloud generation problem into a 2D geometry image generation problem.
Experiments on both rigid and non-rigid 3D object datasets have demonstrated the promising performance of our method.
arXiv Detail & Related papers (2020-11-29T05:21:10Z)
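Several of the entries above (Points-to-3D itself, Sculpt3D, MT3D) inject a sparse 3D prior by supervising only a small set of points or rays. As a generic illustration of the keypoint supervision via sparse ray sampling mentioned in the Sculpt3D summary, the sketch below projects 3D keypoints of a retrieved reference object into the current view and pulls the NeRF's rendered depth along those few rays toward the keypoint depths. The pinhole camera model, `render_depth_at_pixels`, and the loss form are assumptions for illustration, not the Sculpt3D implementation.

```python
# Illustrative sketch only: sparse keypoint-ray depth supervision for a NeRF,
# assuming a pinhole camera (3x3 intrinsics K, 4x4 world-to-camera matrix w2c)
# and a user-supplied `render_depth_at_pixels(uv)` that renders NeRF depth at
# the given pixel coordinates.

import torch
import torch.nn.functional as F


def project_keypoints(kpts_world, K, w2c):
    """Project 3D keypoints (N, 3) to pixel coordinates and camera-space depth."""
    ones = torch.ones_like(kpts_world[:, :1])
    kpts_h = torch.cat([kpts_world, ones], dim=-1)          # (N, 4) homogeneous
    cam = (w2c @ kpts_h.T).T[:, :3]                         # (N, 3) camera coords
    depth = cam[:, 2].clamp(min=1e-6)                       # (N,) depth along z
    uv = (K @ (cam / depth.unsqueeze(-1)).T).T[:, :2]       # (N, 2) pixel coords
    return uv, depth


def sparse_keypoint_depth_loss(render_depth_at_pixels, kpts_world, K, w2c,
                               max_rays=128):
    """Sample at most `max_rays` keypoint rays and pull the NeRF's rendered depth
    toward the keypoints' depths -- one simple way to inject a sparse 3D prior."""
    uv, kp_depth = project_keypoints(kpts_world, K, w2c)
    idx = torch.randperm(uv.shape[0], device=uv.device)[:max_rays]
    pred_depth = render_depth_at_pixels(uv[idx])            # (M,) rendered depths
    return F.smooth_l1_loss(pred_depth, kp_depth[idx])
```

Because only a handful of rays are touched per iteration, this kind of supervision adds little overhead on top of the usual rendering and score-distillation losses.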
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.