Envision3D: One Image to 3D with Anchor Views Interpolation
- URL: http://arxiv.org/abs/2403.08902v1
- Date: Wed, 13 Mar 2024 18:46:33 GMT
- Title: Envision3D: One Image to 3D with Anchor Views Interpolation
- Authors: Yatian Pang, Tanghui Jia, Yujun Shi, Zhenyu Tang, Junwu Zhang, Xinhua Cheng, Xing Zhou, Francis E. H. Tay, Li Yuan,
- Abstract summary: We present Envision3D, a novel method for efficiently generating high-quality 3D content from a single image.
It is capable of generating high-quality 3D content in terms of texture and geometry, surpassing previous image-to-3D baseline methods.
- Score: 18.31796952040799
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present Envision3D, a novel method for efficiently generating high-quality 3D content from a single image. Recent methods that extract 3D content from multi-view images generated by diffusion models show great potential. However, it is still challenging for diffusion models to generate dense multi-view consistent images, which is crucial for the quality of 3D content extraction. To address this issue, we propose a novel cascade diffusion framework, which decomposes the challenging dense views generation task into two tractable stages, namely anchor views generation and anchor views interpolation. In the first stage, we train the image diffusion model to generate global consistent anchor views conditioning on image-normal pairs. Subsequently, leveraging our video diffusion model fine-tuned on consecutive multi-view images, we conduct interpolation on the previous anchor views to generate extra dense views. This framework yields dense, multi-view consistent images, providing comprehensive 3D information. To further enhance the overall generation quality, we introduce a coarse-to-fine sampling strategy for the reconstruction algorithm to robustly extract textured meshes from the generated dense images. Extensive experiments demonstrate that our method is capable of generating high-quality 3D content in terms of texture and geometry, surpassing previous image-to-3D baseline methods.
Related papers
- PlacidDreamer: Advancing Harmony in Text-to-3D Generation [20.022078051436846]
PlacidDreamer is a text-to-3D framework that harmonizes multi-view generation and text-conditioned generation.
It employs a novel score distillation algorithm to achieve balanced saturation.
arXiv Detail & Related papers (2024-07-19T02:00:04Z) - Bootstrap3D: Improving 3D Content Creation with Synthetic Data [80.92268916571712]
A critical bottleneck is the scarcity of high-quality 3D assets with detailed captions.
We propose Bootstrap3D, a novel framework that automatically generates an arbitrary quantity of multi-view images.
We have generated 1 million high-quality synthetic multi-view images with dense descriptive captions.
arXiv Detail & Related papers (2024-05-31T17:59:56Z) - Unique3D: High-Quality and Efficient 3D Mesh Generation from a Single Image [28.759158325097093]
Unique3D is a novel image-to-3D framework for efficiently generating high-quality 3D meshes from single-view images.
Our framework features state-of-the-art generation fidelity and strong generalizability.
arXiv Detail & Related papers (2024-05-30T17:59:54Z) - MVDiff: Scalable and Flexible Multi-View Diffusion for 3D Object Reconstruction from Single-View [0.0]
This paper proposes a general framework to generate consistent multi-view images from single image or leveraging scene representation transformer and view-conditioned diffusion model.
Our model is able to generate 3D meshes surpassing baselines methods in evaluation metrics, including PSNR, SSIM and LPIPS.
arXiv Detail & Related papers (2024-05-06T22:55:53Z) - Grounded Compositional and Diverse Text-to-3D with Pretrained Multi-View Diffusion Model [65.58911408026748]
We propose Grounded-Dreamer to generate 3D assets that can accurately follow complex, compositional text prompts.
We first advocate leveraging text-guided 4-view images as the bottleneck in the text-to-3D pipeline.
We then introduce an attention refocusing mechanism to encourage text-aligned 4-view image generation.
arXiv Detail & Related papers (2024-04-28T04:05:10Z) - VideoMV: Consistent Multi-View Generation Based on Large Video Generative Model [34.35449902855767]
Two fundamental questions are what data we use for training and how to ensure multi-view consistency.
We propose a dense consistent multi-view generation model that is fine-tuned from off-the-shelf video generative models.
Our approach can generate 24 dense views and converges much faster in training than state-of-the-art approaches.
arXiv Detail & Related papers (2024-03-18T17:48:15Z) - Wonder3D: Single Image to 3D using Cross-Domain Diffusion [105.16622018766236]
Wonder3D is a novel method for efficiently generating high-fidelity textured meshes from single-view images.
To holistically improve the quality, consistency, and efficiency of image-to-3D tasks, we propose a cross-domain diffusion model.
arXiv Detail & Related papers (2023-10-23T15:02:23Z) - EfficientDreamer: High-Fidelity and Robust 3D Creation via Orthogonal-view Diffusion Prior [59.25950280610409]
We propose a robust high-quality 3D content generation pipeline by exploiting orthogonal-view image guidance.
In this paper, we introduce a novel 2D diffusion model that generates an image consisting of four sub-images based on the given text prompt.
We also present a 3D synthesis network that can further improve the details of the generated 3D contents.
arXiv Detail & Related papers (2023-08-25T07:39:26Z) - Guide3D: Create 3D Avatars from Text and Image Guidance [55.71306021041785]
Guide3D is a text-and-image-guided generative model for 3D avatar generation based on diffusion models.
Our framework produces topologically and structurally correct geometry and high-resolution textures.
arXiv Detail & Related papers (2023-08-18T17:55:47Z) - High-fidelity 3D GAN Inversion by Pseudo-multi-view Optimization [51.878078860524795]
We present a high-fidelity 3D generative adversarial network (GAN) inversion framework that can synthesize photo-realistic novel views.
Our approach enables high-fidelity 3D rendering from a single image, which is promising for various applications of AI-generated 3D content.
arXiv Detail & Related papers (2022-11-28T18:59:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.