HiFi-123: Towards High-fidelity One Image to 3D Content Generation
- URL: http://arxiv.org/abs/2310.06744v3
- Date: Fri, 12 Jul 2024 01:55:26 GMT
- Title: HiFi-123: Towards High-fidelity One Image to 3D Content Generation
- Authors: Wangbo Yu, Li Yuan, Yan-Pei Cao, Xiangjun Gao, Xiaoyu Li, Wenbo Hu, Long Quan, Ying Shan, Yonghong Tian
- Abstract summary: HiFi-123 is a method designed for high-fidelity and multi-view consistent 3D generation.
We present a Reference-Guided Novel View Enhancement (RGNV) technique that significantly improves the fidelity of diffusion-based zero-shot novel view synthesis methods.
We also present a novel Reference-Guided State Distillation (RGSD) loss.
- Score: 64.81863143986384
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in diffusion models have enabled 3D generation from a single image. However, current methods often produce suboptimal results for novel views, with blurred textures and deviations from the reference image, limiting their practical applications. In this paper, we introduce HiFi-123, a method designed for high-fidelity and multi-view consistent 3D generation. Our contributions are twofold: First, we propose a Reference-Guided Novel View Enhancement (RGNV) technique that significantly improves the fidelity of diffusion-based zero-shot novel view synthesis methods. Second, capitalizing on the RGNV, we present a novel Reference-Guided State Distillation (RGSD) loss. When incorporated into the optimization-based image-to-3D pipeline, our method significantly improves 3D generation quality, achieving state-of-the-art performance. Comprehensive evaluations demonstrate the effectiveness of our approach over existing methods, both qualitatively and quantitatively. Video results are available on the project page.
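- Background (a standard result this abstract assumes, not the paper's exact formulation): optimization-based image-to-3D pipelines of this kind follow the Score Distillation Sampling (SDS) gradient introduced by DreamFusion, into which a distillation loss such as RGSD can be substituted. For a differentiable renderer $x = g(\theta)$, conditioning $y$, timestep $t$, and Gaussian noise $\epsilon$, the SDS gradient is
$$\nabla_\theta \mathcal{L}_{\mathrm{SDS}} = \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,\big(\hat{\epsilon}_\phi(x_t;\, y,\, t) - \epsilon\big)\,\frac{\partial x}{\partial \theta} \right],$$
where $\hat{\epsilon}_\phi$ is the pretrained diffusion model's noise prediction and $w(t)$ is a timestep-dependent weight; the exact form of the RGSD loss is defined in the paper itself.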
Related papers
- ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis [63.169364481672915]
We propose ViewCrafter, a novel method for synthesizing high-fidelity novel views of generic scenes from single or sparse images.
Our method takes advantage of the powerful generation capabilities of a video diffusion model and the coarse 3D clues offered by a point-based representation to generate high-quality video frames.
arXiv Detail & Related papers (2024-09-03T16:53:19Z)
- GECO: Generative Image-to-3D within a SECOnd [51.20830808525894]
We introduce GECO, a novel method for high-quality 3D generative modeling that operates within a second.
GECO achieves high-quality image-to-3D mesh generation with an unprecedented level of efficiency.
arXiv Detail & Related papers (2024-05-30T17:58:00Z)
- Customize-It-3D: High-Quality 3D Creation from A Single Image Using Subject-Specific Knowledge Prior [33.45375100074168]
We present a novel two-stage approach that fully utilizes the information provided by the reference image to establish a customized knowledge prior for image-to-3D generation.
Experiments showcase the superiority of our method, Customize-It-3D, outperforming previous works by a substantial margin.
arXiv Detail & Related papers (2023-12-15T19:07:51Z)
- RL Dreams: Policy Gradient Optimization for Score Distillation based 3D Generation [15.154441074606101]
Score Distillation Sampling (SDS) based rendering has substantially improved 3D asset generation.
DDPO3D employs the policy gradient method in tandem with aesthetic scoring to improve 3D rendering from 2D diffusion models.
Our approach is compatible with score distillation-based methods, facilitating the integration of diverse reward functions into the generative process.
arXiv Detail & Related papers (2023-12-08T02:41:04Z) - Instant3D: Fast Text-to-3D with Sparse-View Generation and Large
Reconstruction Model [68.98311213582949]
We propose Instant3D, a novel method that generates high-quality and diverse 3D assets from text prompts in a feed-forward manner.
Our method can generate diverse 3D assets of high visual quality within 20 seconds, two orders of magnitude faster than previous optimization-based methods.
arXiv Detail & Related papers (2023-11-10T18:03:44Z) - Guide3D: Create 3D Avatars from Text and Image Guidance [55.71306021041785]
Guide3D is a text-and-image-guided generative model for 3D avatar generation based on diffusion models.
Our framework produces topologically and structurally correct geometry and high-resolution textures.
arXiv Detail & Related papers (2023-08-18T17:55:47Z) - HD-Fusion: Detailed Text-to-3D Generation Leveraging Multiple Noise
Estimation [43.83459204345063]
We propose a novel approach that combines multiple noise estimation processes with a pretrained 2D diffusion prior.
Results show that the proposed approach can generate high-quality details compared to the baselines.
arXiv Detail & Related papers (2023-07-30T09:46:22Z) - High-fidelity 3D GAN Inversion by Pseudo-multi-view Optimization [51.878078860524795]
We present a high-fidelity 3D generative adversarial network (GAN) inversion framework that can synthesize photo-realistic novel views.
Our approach enables high-fidelity 3D rendering from a single image, which is promising for various applications of AI-generated 3D content.
This list is automatically generated from the titles and abstracts of the papers on this site.