EpiDiff: Enhancing Multi-View Synthesis via Localized Epipolar-Constrained Diffusion
- URL: http://arxiv.org/abs/2312.06725v3
- Date: Tue, 2 Apr 2024 09:18:36 GMT
- Title: EpiDiff: Enhancing Multi-View Synthesis via Localized Epipolar-Constrained Diffusion
- Authors: Zehuan Huang, Hao Wen, Junting Dong, Yaohui Wang, Yangguang Li, Xinyuan Chen, Yan-Pei Cao, Ding Liang, Yu Qiao, Bo Dai, Lu Sheng,
- Abstract summary: EpiDiff is a localized interactive multiview diffusion model.
It generates 16 multiview images in just 12 seconds.
It surpasses previous methods in quality evaluation metrics.
- Score: 60.30030562932703
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generating multiview images from a single view facilitates the rapid generation of a 3D mesh conditioned on a single image. Recent methods that introduce 3D global representation into diffusion models have shown the potential to generate consistent multiviews, but they have reduced generation speed and face challenges in maintaining generalizability and quality. To address this issue, we propose EpiDiff, a localized interactive multiview diffusion model. At the core of the proposed approach is to insert a lightweight epipolar attention block into the frozen diffusion model, leveraging epipolar constraints to enable cross-view interaction among feature maps of neighboring views. The newly initialized 3D modeling module preserves the original feature distribution of the diffusion model, exhibiting compatibility with a variety of base diffusion models. Experiments show that EpiDiff generates 16 multiview images in just 12 seconds, and it surpasses previous methods in quality evaluation metrics, including PSNR, SSIM and LPIPS. Additionally, EpiDiff can generate a more diverse distribution of views, improving the reconstruction quality from generated multiviews. Please see our project page at https://huanngzh.github.io/EpiDiff/.
Related papers
- CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation [59.257513664564996]
We introduce a novel method for generating 360deg panoramas from text prompts or images.
We employ multi-view diffusion models to jointly synthesize the six faces of a cubemap.
Our model allows for fine-grained text control, generates high resolution panorama images and generalizes well beyond its training set.
arXiv Detail & Related papers (2025-01-28T18:59:49Z) - Dual Diffusion for Unified Image Generation and Understanding [32.7554623473768]
We propose a large-scale and fully end-to-end diffusion model for multi-modal understanding and generation.
We leverage a cross-modal maximum likelihood estimation framework that simultaneously trains the conditional likelihoods of both images and text jointly.
Our model attained competitive performance compared to recent unified image understanding and generation models.
arXiv Detail & Related papers (2024-12-31T05:49:00Z) - 3DEnhancer: Consistent Multi-View Diffusion for 3D Enhancement [66.8116563135326]
We present 3DEnhancer, which employs a multi-view latent diffusion model to enhance coarse 3D inputs while preserving multi-view consistency.
Unlike existing video-based approaches, our model supports seamless multi-view enhancement with improved coherence across diverse viewing angles.
arXiv Detail & Related papers (2024-12-24T17:36:34Z) - PlacidDreamer: Advancing Harmony in Text-to-3D Generation [20.022078051436846]
PlacidDreamer is a text-to-3D framework that harmonizes multi-view generation and text-conditioned generation.
It employs a novel score distillation algorithm to achieve balanced saturation.
arXiv Detail & Related papers (2024-07-19T02:00:04Z) - MultiDiff: Consistent Novel View Synthesis from a Single Image [60.04215655745264]
MultiDiff is a novel approach for consistent novel view synthesis of scenes from a single RGB image.
Our results demonstrate that MultiDiff outperforms state-of-the-art methods on the challenging, real-world datasets RealEstate10K and ScanNet.
arXiv Detail & Related papers (2024-06-26T17:53:51Z) - Vivid-ZOO: Multi-View Video Generation with Diffusion Model [76.96449336578286]
New challenges lie in the lack of massive captioned multi-view videos and the complexity of modeling such multi-dimensional distribution.
We propose a novel diffusion-based pipeline that generates high-quality multi-view videos centered around a dynamic 3D object from text.
arXiv Detail & Related papers (2024-06-12T21:44:04Z) - MVDiff: Scalable and Flexible Multi-View Diffusion for 3D Object Reconstruction from Single-View [0.0]
This paper proposes a general framework to generate consistent multi-view images from single image or leveraging scene representation transformer and view-conditioned diffusion model.
Our model is able to generate 3D meshes surpassing baselines methods in evaluation metrics, including PSNR, SSIM and LPIPS.
arXiv Detail & Related papers (2024-05-06T22:55:53Z) - Deceptive-NeRF/3DGS: Diffusion-Generated Pseudo-Observations for High-Quality Sparse-View Reconstruction [60.52716381465063]
We introduce Deceptive-NeRF/3DGS to enhance sparse-view reconstruction with only a limited set of input images.
Specifically, we propose a deceptive diffusion model turning noisy images rendered from few-view reconstructions into high-quality pseudo-observations.
Our system progressively incorporates diffusion-generated pseudo-observations into the training image sets, ultimately densifying the sparse input observations by 5 to 10 times.
arXiv Detail & Related papers (2023-05-24T14:00:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.