Related papers: EpiDiff: Enhancing Multi-View Synthesis via Localized Epipolar-Constrained Diffusion

URL: http://arxiv.org/abs/2312.06725v3
Date: Tue, 2 Apr 2024 09:18:36 GMT
Title: EpiDiff: Enhancing Multi-View Synthesis via Localized Epipolar-Constrained Diffusion
Authors: Zehuan Huang, Hao Wen, Junting Dong, Yaohui Wang, Yangguang Li, Xinyuan Chen, Yan-Pei Cao, Ding Liang, Yu Qiao, Bo Dai, Lu Sheng
Abstract summary: EpiDiff is a localized interactive multiview diffusion model. It generates 16 multiview images in just 12 seconds. It surpasses previous methods in quality evaluation metrics.
Score: 60.30030562932703
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Generating multiview images from a single view facilitates the rapid generation of a 3D mesh conditioned on a single image. Recent methods that introduce 3D global representation into diffusion models have shown the potential to generate consistent multiviews, but they have reduced generation speed and face challenges in maintaining generalizability and quality. To address this issue, we propose EpiDiff, a localized interactive multiview diffusion model. At the core of the proposed approach is to insert a lightweight epipolar attention block into the frozen diffusion model, leveraging epipolar constraints to enable cross-view interaction among feature maps of neighboring views. The newly initialized 3D modeling module preserves the original feature distribution of the diffusion model, exhibiting compatibility with a variety of base diffusion models. Experiments show that EpiDiff generates 16 multiview images in just 12 seconds, and it surpasses previous methods in quality evaluation metrics, including PSNR, SSIM and LPIPS. Additionally, EpiDiff can generate a more diverse distribution of views, improving the reconstruction quality from generated multiviews. Please see our project page at https://huanngzh.github.io/EpiDiff/.

Related papers

ViewMask-1-to-3: Multi-View Consistent Image Generation via Multimodal Diffusion Models [70.28556518166037]
We introduce ViewMask-1-to-3, a pioneering approach to apply discrete diffusion models to multi-view image generation. By unifying language and vision through masked token prediction, our approach enables progressive generation of multiple viewpoints. Our approach ranks first on average across GSO and 3D-FUTURE datasets in terms of PSNR, SSIM, and LPIPS.
arXiv Detail & Related papers (2025-12-16T05:15:07Z)
FROMAT: Multiview Material Appearance Transfer via Few-Shot Self-Attention Adaptation [49.74776147964999]
We present a lightweight adaptation technique for appearance transfer in multiview diffusion models. Our method learns to combine object identity from an input image with appearance cues rendered in a separate reference image, producing multi-view-consistent output.
arXiv Detail & Related papers (2025-12-10T13:06:40Z)
MEAT: Multiview Diffusion Model for Human Generation on Megapixels with Mesh Attention [83.56588173102594]
We introduce a solution called mesh attention to enable training at 1024x1024 resolution. This approach significantly reduces the complexity of multiview attention while maintaining cross-view consistency. Building on this foundation, we devise a mesh attention block and combine it with keypoint conditioning to create our human-specific multiview diffusion model, MEAT.
arXiv Detail & Related papers (2025-03-11T17:50:59Z)
3DEnhancer: Consistent Multi-View Diffusion for 3D Enhancement [66.8116563135326]
We present 3DEnhancer, which employs a multi-view latent diffusion model to enhance coarse 3D inputs while preserving multi-view consistency. Unlike existing video-based approaches, our model supports seamless multi-view enhancement with improved coherence across diverse viewing angles.
arXiv Detail & Related papers (2024-12-24T17:36:34Z)
Towards High-Fidelity 3D Portrait Generation with Rich Details by Cross-View Prior-Aware Diffusion [63.81544586407943]
Single-image 3D portrait generation methods typically employ 2D diffusion models to provide multi-view knowledge, which is then distilled into 3D representations. We propose a Hybrid Priors Diffusion model, which explicitly and implicitly incorporates multi-view priors as conditions to enhance the status consistency of the generated multi-view portraits. Experiments demonstrate that our method can produce 3D portraits with accurate geometry and rich details from a single image.
arXiv Detail & Related papers (2024-11-15T17:19:18Z)
PlacidDreamer: Advancing Harmony in Text-to-3D Generation [20.022078051436846]
PlacidDreamer is a text-to-3D framework that harmonizes multi-view generation and text-conditioned generation. It employs a novel score distillation algorithm to achieve balanced saturation.
arXiv Detail & Related papers (2024-07-19T02:00:04Z)
MultiDiff: Consistent Novel View Synthesis from a Single Image [60.04215655745264]
MultiDiff is a novel approach for consistent novel view synthesis of scenes from a single RGB image. Our results demonstrate that MultiDiff outperforms state-of-the-art methods on the challenging, real-world datasets RealEstate10K and ScanNet.
arXiv Detail & Related papers (2024-06-26T17:53:51Z)
Vivid-ZOO: Multi-View Video Generation with Diffusion Model [76.96449336578286]
New challenges lie in the lack of massive captioned multi-view videos and the complexity of modeling such multi-dimensional distribution. We propose a novel diffusion-based pipeline that generates high-quality multi-view videos centered around a dynamic 3D object from text.
arXiv Detail & Related papers (2024-06-12T21:44:04Z)
MVDiff: Scalable and Flexible Multi-View Diffusion for 3D Object Reconstruction from Single-View [0.0]
This paper proposes a general framework to generate consistent multi-view images from a single image by leveraging a scene representation transformer and a view-conditioned diffusion model. Our model is able to generate 3D meshes surpassing baseline methods in evaluation metrics, including PSNR, SSIM and LPIPS.
arXiv Detail & Related papers (2024-05-06T22:55:53Z)
Diffusion$^2$: Dynamic 3D Content Generation via Score Composition of Video and Multi-view Diffusion Models [6.738732514502613]
Diffusion$^2$ is a novel framework for dynamic 3D content creation. It reconciles the knowledge about geometric consistency and temporal smoothness from 3D models to directly sample dense multi-view images. Experiments demonstrate the efficacy of our proposed framework in generating highly seamless and consistent 4D assets.
arXiv Detail & Related papers (2024-04-02T17:58:03Z)
Deceptive-NeRF/3DGS: Diffusion-Generated Pseudo-Observations for High-Quality Sparse-View Reconstruction [60.52716381465063]
We introduce Deceptive-NeRF/3DGS to enhance sparse-view reconstruction with only a limited set of input images. Specifically, we propose a deceptive diffusion model turning noisy images rendered from few-view reconstructions into high-quality pseudo-observations. Our system progressively incorporates diffusion-generated pseudo-observations into the training image sets, ultimately densifying the sparse input observations by 5 to 10 times.
arXiv Detail & Related papers (2023-05-24T14:00:32Z)
Single-Stage Diffusion NeRF: A Unified Approach to 3D Generation and Reconstruction [77.69363640021503]
3D-aware image synthesis encompasses a variety of tasks, such as scene generation and novel view synthesis from images. We present SSDNeRF, a unified approach that employs an expressive diffusion model to learn a generalizable prior of neural radiance fields (NeRF) from multi-view images of diverse objects.
arXiv Detail & Related papers (2023-04-13T17:59:01Z)

This list is automatically generated from the titles and abstracts of the papers on this site.