Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model
- URL: http://arxiv.org/abs/2310.15110v1
- Date: Mon, 23 Oct 2023 17:18:59 GMT
- Title: Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model
- Authors: Ruoxi Shi, Hansheng Chen, Zhuoyang Zhang, Minghua Liu, Chao Xu, Xinyue
Wei, Linghao Chen, Chong Zeng, Hao Su
- Abstract summary: Zero123++ is an image-conditioned diffusion model for generating 3D-consistent multi-view images from a single input view.
We develop various conditioning and training schemes to minimize the effort of finetuning from off-the-shelf image diffusion models.
- Score: 30.44339780026541
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We report Zero123++, an image-conditioned diffusion model for generating
3D-consistent multi-view images from a single input view. To take full
advantage of pretrained 2D generative priors, we develop various conditioning
and training schemes to minimize the effort of finetuning from off-the-shelf
image diffusion models such as Stable Diffusion. Zero123++ excels in producing
high-quality, consistent multi-view images from a single image, overcoming
common issues like texture degradation and geometric misalignment. Furthermore,
we showcase the feasibility of training a ControlNet on Zero123++ for enhanced
control over the generation process. The code is available at
https://github.com/SUDO-AI-3D/zero123plus.
Related papers
- PlacidDreamer: Advancing Harmony in Text-to-3D Generation [20.022078051436846]
PlacidDreamer is a text-to-3D framework that harmonizes multi-view generation and text-conditioned generation.
It employs a novel score distillation algorithm to achieve balanced saturation.
arXiv Detail & Related papers (2024-07-19T02:00:04Z) - Bootstrap3D: Improving 3D Content Creation with Synthetic Data [80.92268916571712]
A critical bottleneck is the scarcity of high-quality 3D assets with detailed captions.
We propose Bootstrap3D, a novel framework that automatically generates an arbitrary quantity of multi-view images.
We have generated 1 million high-quality synthetic multi-view images with dense descriptive captions.
arXiv Detail & Related papers (2024-05-31T17:59:56Z) - TexPainter: Generative Mesh Texturing with Multi-view Consistency [20.366302413005734]
In this paper, we propose a novel method to enforce multi-view consistency.
We use an optimization-based color-fusion to enforce consistency and indirectly modify the latent codes by gradient back-propagation.
Our method improves consistency and overall quality of the generated textures as compared to competing state-of-the-arts.
arXiv Detail & Related papers (2024-05-17T18:41:36Z) - MVDiff: Scalable and Flexible Multi-View Diffusion for 3D Object Reconstruction from Single-View [0.0]
This paper proposes a general framework to generate consistent multi-view images from single image or leveraging scene representation transformer and view-conditioned diffusion model.
Our model is able to generate 3D meshes surpassing baselines methods in evaluation metrics, including PSNR, SSIM and LPIPS.
arXiv Detail & Related papers (2024-05-06T22:55:53Z) - Diffusion Time-step Curriculum for One Image to 3D Generation [91.07638345953016]
Score distillation sampling(SDS) has been widely adopted to overcome the absence of unseen views in reconstructing 3D objects from a textbfsingle image.
We find out the crux is the overlooked indiscriminate treatment of diffusion time-steps during optimization.
We propose the Diffusion Time-step Curriculum one-image-to-3D pipeline (DTC123), which involves both the teacher and student models collaborating with the time-step curriculum in a coarse-to-fine manner.
arXiv Detail & Related papers (2024-04-06T09:03:18Z) - VideoMV: Consistent Multi-View Generation Based on Large Video Generative Model [34.35449902855767]
Two fundamental questions are what data we use for training and how to ensure multi-view consistency.
We propose a dense consistent multi-view generation model that is fine-tuned from off-the-shelf video generative models.
Our approach can generate 24 dense views and converges much faster in training than state-of-the-art approaches.
arXiv Detail & Related papers (2024-03-18T17:48:15Z) - Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object
Structure via HyperNetworks [53.67497327319569]
We introduce a novel neural rendering technique to solve image-to-3D from a single view.
Our approach employs the signed distance function as the surface representation and incorporates generalizable priors through geometry-encoding volumes and HyperNetworks.
Our experiments show the advantages of our proposed approach with consistent results and rapid generation.
arXiv Detail & Related papers (2023-12-24T08:42:37Z) - ConsistNet: Enforcing 3D Consistency for Multi-view Images Diffusion [61.37481051263816]
Given a single image of a 3D object, this paper proposes a method (named ConsistNet) that is able to generate multiple images of the same object.
Our method effectively learns 3D consistency over a frozen Zero123 backbone and can generate 16 surrounding views of the object within 40 seconds on a single A100 GPU.
arXiv Detail & Related papers (2023-10-16T12:29:29Z) - Control3Diff: Learning Controllable 3D Diffusion Models from Single-view
Images [70.17085345196583]
Control3Diff is a 3D diffusion model that combines the strengths of diffusion models and 3D GANs for versatile, controllable 3D-aware image synthesis.
We validate the efficacy of Control3Diff on standard image generation benchmarks, including FFHQ, AFHQ, and ShapeNet.
arXiv Detail & Related papers (2023-04-13T17:52:29Z) - Novel View Synthesis with Diffusion Models [56.55571338854636]
We present 3DiM, a diffusion model for 3D novel view synthesis.
It is able to translate a single input view into consistent and sharp completions across many views.
3DiM can generate multiple views that are 3D consistent using a novel technique called conditioning.
arXiv Detail & Related papers (2022-10-06T16:59:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.