Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model
- URL: http://arxiv.org/abs/2310.15110v1
- Date: Mon, 23 Oct 2023 17:18:59 GMT
- Title: Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model
- Authors: Ruoxi Shi, Hansheng Chen, Zhuoyang Zhang, Minghua Liu, Chao Xu, Xinyue
Wei, Linghao Chen, Chong Zeng, Hao Su
- Abstract summary: Zero123++ is an image-conditioned diffusion model for generating 3D-consistent multi-view images from a single input view.
We develop various conditioning and training schemes to minimize the effort of finetuning from off-the-shelf image diffusion models.
- Score: 30.44339780026541
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We report Zero123++, an image-conditioned diffusion model for generating
3D-consistent multi-view images from a single input view. To take full
advantage of pretrained 2D generative priors, we develop various conditioning
and training schemes to minimize the effort of finetuning from off-the-shelf
image diffusion models such as Stable Diffusion. Zero123++ excels in producing
high-quality, consistent multi-view images from a single image, overcoming
common issues like texture degradation and geometric misalignment. Furthermore,
we showcase the feasibility of training a ControlNet on Zero123++ for enhanced
control over the generation process. The code is available at
https://github.com/SUDO-AI-3D/zero123plus.
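For context, the sketch below shows minimal inference with the diffusers library. It is an assumption-laden illustration: the model ID ("sudo-ai/zero123plus-v1.1"), the custom pipeline name ("sudo-ai/zero123plus-pipeline"), the scheduler setting, and the file paths are taken or inferred from the linked repository's documented usage and may differ from the latest released checkpoints.

    # Minimal inference sketch (assumed model/pipeline IDs; see the repository above for released checkpoints).
    import torch
    from PIL import Image
    from diffusers import DiffusionPipeline, EulerAncestralDiscreteScheduler

    # Load the Zero123++ pipeline; custom_pipeline pulls the repository's diffusers wrapper.
    pipeline = DiffusionPipeline.from_pretrained(
        "sudo-ai/zero123plus-v1.1",
        custom_pipeline="sudo-ai/zero123plus-pipeline",
        torch_dtype=torch.float16,
    )
    # Trailing timestep spacing follows the scheduler configuration suggested in the repository.
    pipeline.scheduler = EulerAncestralDiscreteScheduler.from_config(
        pipeline.scheduler.config, timestep_spacing="trailing"
    )
    pipeline.to("cuda:0")

    # Condition on a single input view and generate the fixed set of six novel views.
    cond = Image.open("input.png")  # hypothetical input path
    result = pipeline(cond, num_inference_steps=75).images[0]
    result.save("multiview_grid.png")

The returned image tiles the six generated views into a single grid, which downstream reconstruction code typically splits back into individual views before meshing.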
Related papers
- Fancy123: One Image to High-Quality 3D Mesh Generation via Plug-and-Play Deformation [22.5996658181606]
We propose Fancy123, featuring two enhancement modules and an unprojection operation to address multiview inconsistency, limited fidelity to the input image, and low clarity.
The appearance enhancement module deforms the 2D multiview images to realign pixels for better multiview consistency.
The fidelity enhancement module deforms the 3D mesh to match the input image.
The unprojection of the input image and deformed multiview images onto LRM's generated mesh ensures high clarity.
arXiv Detail & Related papers (2024-11-25T08:31:55Z) - MVGenMaster: Scaling Multi-View Generation from Any Image via 3D Priors Enhanced Diffusion Model [87.71060849866093]
We introduce MVGenMaster, a multi-view diffusion model enhanced with 3D priors to address versatile Novel View Synthesis (NVS) tasks.
Our model features a simple yet effective pipeline that can generate up to 100 novel views conditioned on variable reference views and camera poses.
We present several training and model modifications to strengthen the model with scaled-up datasets.
arXiv Detail & Related papers (2024-11-25T07:34:23Z) - PlacidDreamer: Advancing Harmony in Text-to-3D Generation [20.022078051436846]
PlacidDreamer is a text-to-3D framework that harmonizes multi-view generation and text-conditioned generation.
It employs a novel score distillation algorithm to achieve balanced saturation.
arXiv Detail & Related papers (2024-07-19T02:00:04Z) - Bootstrap3D: Improving Multi-view Diffusion Model with Synthetic Data [80.92268916571712]
A critical bottleneck is the scarcity of high-quality 3D objects with detailed captions.
We propose Bootstrap3D, a novel framework that automatically generates an arbitrary quantity of multi-view images.
We have generated 1 million high-quality synthetic multi-view images with dense descriptive captions.
arXiv Detail & Related papers (2024-05-31T17:59:56Z) - MVDiff: Scalable and Flexible Multi-View Diffusion for 3D Object Reconstruction from Single-View [0.0]
This paper proposes a general framework to generate consistent multi-view images from a single image by leveraging a scene representation transformer and a view-conditioned diffusion model.
Our model generates 3D meshes that surpass baseline methods on evaluation metrics including PSNR, SSIM, and LPIPS.
arXiv Detail & Related papers (2024-05-06T22:55:53Z) - Diffusion Time-step Curriculum for One Image to 3D Generation [91.07638345953016]
Score distillation sampling (SDS) has been widely adopted to overcome the absence of unseen views when reconstructing 3D objects from a single image.
We find that the crux is the overlooked, indiscriminate treatment of diffusion time-steps during optimization.
We propose the Diffusion Time-step Curriculum one-image-to-3D pipeline (DTC123), in which the teacher and student models collaborate under a coarse-to-fine time-step curriculum.
arXiv Detail & Related papers (2024-04-06T09:03:18Z) - VideoMV: Consistent Multi-View Generation Based on Large Video Generative Model [34.35449902855767]
Two fundamental questions are what data to use for training and how to ensure multi-view consistency.
We propose a dense consistent multi-view generation model that is fine-tuned from off-the-shelf video generative models.
Our approach can generate 24 dense views and converges much faster in training than state-of-the-art approaches.
arXiv Detail & Related papers (2024-03-18T17:48:15Z) - Cascade-Zero123: One Image to Highly Consistent 3D with Self-Prompted Nearby Views [119.76225283008579]
Zero-1-to-3 methods have achieved great success by lifting a 2D latent diffusion model to the 3D scope.
But due to the high sparsity of the single input image, Zero-1-to-3 tends to produce geometry and appearance inconsistency across views.
We propose to supply more conditioning information to the generation model in a self-prompted way.
arXiv Detail & Related papers (2023-12-07T16:49:09Z) - ConsistNet: Enforcing 3D Consistency for Multi-view Images Diffusion [61.37481051263816]
Given a single image of a 3D object, this paper proposes a method (named ConsistNet) that is able to generate multiple images of the same object.
Our method effectively learns 3D consistency over a frozen Zero123 backbone and can generate 16 surrounding views of the object within 40 seconds on a single A100 GPU.
arXiv Detail & Related papers (2023-10-16T12:29:29Z) - Control3Diff: Learning Controllable 3D Diffusion Models from Single-view Images [70.17085345196583]
Control3Diff is a 3D diffusion model that combines the strengths of diffusion models and 3D GANs for versatile, controllable 3D-aware image synthesis.
We validate the efficacy of Control3Diff on standard image generation benchmarks, including FFHQ, AFHQ, and ShapeNet.
arXiv Detail & Related papers (2023-04-13T17:52:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.