Consistent123: Improve Consistency for One Image to 3D Object Synthesis
- URL: http://arxiv.org/abs/2310.08092v1
- Date: Thu, 12 Oct 2023 07:38:28 GMT
- Title: Consistent123: Improve Consistency for One Image to 3D Object Synthesis
- Authors: Haohan Weng, Tianyu Yang, Jianan Wang, Yu Li, Tong Zhang, C. L. Philip
Chen, Lei Zhang
- Abstract summary: Large image diffusion models enable novel view synthesis with high quality and excellent zero-shot capability.
These models have no guarantee of view consistency, limiting the performance for downstream tasks like 3D reconstruction and image-to-3D generation.
We propose Consistent123 to synthesize novel views simultaneously by incorporating additional cross-view attention layers and the shared self-attention mechanism.
- Score: 74.1094516222327
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large image diffusion models enable novel view synthesis with high quality
and excellent zero-shot capability. However, such models based on
image-to-image translation have no guarantee of view consistency, limiting the
performance for downstream tasks like 3D reconstruction and image-to-3D
generation. To enforce consistency, we propose Consistent123, which synthesizes
novel views simultaneously by incorporating additional cross-view attention
layers and the shared self-attention mechanism. The proposed attention
mechanism improves the interaction across all synthesized views, as well as the
alignment between the condition view and novel views. In the sampling stage,
this architecture supports generating an arbitrary number of views
simultaneously, even though the model is trained on a fixed number of views. We
also introduce a progressive
classifier-free guidance strategy to achieve the trade-off between texture and
geometry for synthesized object views. Qualitative and quantitative experiments
show that Consistent123 outperforms baselines in view consistency by a large
margin. Furthermore, we demonstrate that Consistent123 brings significant
improvements to various downstream tasks, showing its great potential in the 3D
generation
field. The project page is available at consistent-123.github.io.
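For intuition, the following is a minimal PyTorch sketch of what attention shared across views could look like. The module name SharedViewAttention, the tensor layout, and the flattening of views into the token axis are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): self-attention whose keys and
# values span all views at once, so every synthesized view can attend to every
# other view (and to the condition view, if appended as an extra view).
import torch
import torch.nn as nn


class SharedViewAttention(nn.Module):  # hypothetical name
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, views, tokens, dim) latent tokens for each view.
        b, v, t, d = x.shape
        # Flatten the view axis into the token axis so attention is computed
        # jointly over all views; because nothing here depends on a fixed v,
        # the same module can handle an arbitrary number of views at sampling.
        tokens = x.reshape(b, v * t, d)
        out, _ = self.attn(tokens, tokens, tokens)
        return out.reshape(b, v, t, d)


# Usage with illustrative sizes: 4 views of 256 tokens each, 320-dim latents.
x = torch.randn(2, 4, 256, 320)
y = SharedViewAttention(320)(x)
print(y.shape)  # torch.Size([2, 4, 256, 320])
```

Because attention runs over the concatenated token sequence rather than a fixed set of view slots, the number of views is a free dimension, which is one plausible way to reconcile fixed-length training with arbitrary-length sampling.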
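The progressive classifier-free guidance strategy suggests varying the guidance weight over the denoising trajectory, since early high-noise steps tend to shape coarse geometry while late steps refine texture. The linear ramp below is a hypothetical sketch of such a schedule; the function names and the start/end weights are assumptions, not values from the paper.

```python
# Hypothetical sketch of progressive classifier-free guidance (CFG): the
# guidance weight is annealed across denoising steps to trade geometry
# (early steps) against texture (late steps). Schedule and weights are
# illustrative assumptions, not the paper's actual strategy.
import torch


def progressive_cfg_scale(step: int, num_steps: int,
                          w_start: float = 7.5, w_end: float = 3.0) -> float:
    """Linearly anneal the guidance weight from w_start to w_end."""
    frac = step / max(num_steps - 1, 1)
    return w_start + frac * (w_end - w_start)


def guided_eps(eps_cond: torch.Tensor, eps_uncond: torch.Tensor,
               step: int, num_steps: int) -> torch.Tensor:
    """Standard CFG combination of conditional and unconditional noise
    predictions, with a step-dependent weight."""
    w = progressive_cfg_scale(step, num_steps)
    return eps_uncond + w * (eps_cond - eps_uncond)
```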
Related papers
- ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis [63.169364481672915]
We propose ViewCrafter, a novel method for synthesizing high-fidelity novel views of generic scenes from single or sparse images.
Our method takes advantage of the powerful generation capabilities of video diffusion models and the coarse 3D clues offered by a point-based representation to generate high-quality video frames.
arXiv Detail & Related papers (2024-09-03T16:53:19Z)
- Focus on Neighbors and Know the Whole: Towards Consistent Dense Multiview Text-to-Image Generator for 3D Creation [64.07560335451723]
CoSER is a novel consistent dense Multiview Text-to-Image Generator for Text-to-3D.
It achieves both efficiency and quality by meticulously learning neighbor-view coherence.
It aggregates information along motion paths explicitly defined by physical principles to refine details.
arXiv Detail & Related papers (2024-08-23T15:16:01Z)
- MultiDiff: Consistent Novel View Synthesis from a Single Image [60.04215655745264]
MultiDiff is a novel approach for consistent novel view synthesis of scenes from a single RGB image.
Our results demonstrate that MultiDiff outperforms state-of-the-art methods on the challenging, real-world datasets RealEstate10K and ScanNet.
arXiv Detail & Related papers (2024-06-26T17:53:51Z)
- Zero-to-Hero: Enhancing Zero-Shot Novel View Synthesis via Attention Map Filtering [16.382098950820822]
We propose Zero-to-Hero, a novel test-time approach that enhances view synthesis by manipulating attention maps.
We modify the self-attention mechanism to integrate information from the source view, reducing shape distortions.
Results demonstrate substantial improvements in fidelity and consistency, validated on a diverse set of out-of-distribution objects.
arXiv Detail & Related papers (2024-05-29T00:58:22Z)
- Consistent-1-to-3: Consistent Image to 3D View Synthesis via Geometry-aware Diffusion Models [16.326276673056334]
Consistent-1-to-3 is a generative framework that significantly mitigates the view inconsistency problem in novel view synthesis (NVS).
We decompose the NVS task into two stages: (i) transforming observed regions to a novel view, and (ii) hallucinating unseen regions.
We propose to employ epipolar-guided attention to incorporate geometry constraints, and multi-view attention to better aggregate multi-view information.
arXiv Detail & Related papers (2023-10-04T17:58:57Z)
- SyncDreamer: Generating Multiview-consistent Images from a Single-view Image [59.75474518708409]
A novel diffusion model called SyncDreamer generates multiview-consistent images from a single-view image.
Experiments show that SyncDreamer generates images with high consistency across different views.
arXiv Detail & Related papers (2023-09-07T02:28:04Z)
- Novel View Synthesis with Diffusion Models [56.55571338854636]
We present 3DiM, a diffusion model for 3D novel view synthesis.
It is able to translate a single input view into consistent and sharp completions across many views.
3DiM can generate multiple views that are 3D-consistent using a novel technique called stochastic conditioning.
arXiv Detail & Related papers (2022-10-06T16:59:56Z)
- Multi-View Consistent Generative Adversarial Networks for 3D-aware Image Synthesis [48.33860286920389]
3D-aware image synthesis aims to generate images of objects from multiple views by learning a 3D representation.
Existing approaches lack geometry constraints, hence usually fail to generate multi-view consistent images.
We propose Multi-View Consistent Generative Adversarial Networks (MVCGAN) for high-quality 3D-aware image synthesis with geometry constraints.
arXiv Detail & Related papers (2022-04-13T11:23:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.