MVDiffusion: Enabling Holistic Multi-view Image Generation with
Correspondence-Aware Diffusion
- URL: http://arxiv.org/abs/2307.01097v7
- Date: Mon, 25 Dec 2023 04:32:26 GMT
- Title: MVDiffusion: Enabling Holistic Multi-view Image Generation with
Correspondence-Aware Diffusion
- Authors: Shitao Tang, Fuyang Zhang, Jiacheng Chen, Peng Wang, Yasutaka Furukawa
- Abstract summary: This paper introduces MVDiffusion, a simple yet effective method for generating consistent multi-view images.
MVDiffusion simultaneously generates all images with global awareness, effectively addressing the prevalent error-accumulation issue.
- Score: 26.582847694092884
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper introduces MVDiffusion, a simple yet effective method for
generating consistent multi-view images from text prompts given pixel-to-pixel
correspondences (e.g., perspective crops from a panorama or multi-view images
given depth maps and poses). Unlike prior methods that rely on iterative image
warping and inpainting, MVDiffusion simultaneously generates all images with a
global awareness, effectively addressing the prevalent error accumulation
issue. At its core, MVDiffusion processes perspective images in parallel with a
pre-trained text-to-image diffusion model, while integrating novel
correspondence-aware attention layers to facilitate cross-view interactions.
For panorama generation, while only trained with 10k panoramas, MVDiffusion is
able to generate high-resolution photorealistic images for arbitrary texts or
extrapolate one perspective image to a 360-degree view. For multi-view
depth-to-image generation, MVDiffusion demonstrates state-of-the-art
performance for texturing a scene mesh.
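As a rough illustration of the core idea, the sketch below shows how a correspondence-aware attention layer might let each pixel in one view attend to features sampled at its corresponding locations in neighbouring views while all views are denoised in parallel. The module name, tensor shapes, neighbour indexing, and single-pixel sampling are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch of correspondence-aware cross-view attention (PyTorch).
# Shapes, the neighbour layout, and single-pixel sampling are assumptions
# for illustration, not the authors' exact implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CorrespondenceAwareAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(dim, dim, bias=False)
        self.to_v = nn.Linear(dim, dim, bias=False)
        self.out = nn.Linear(dim, dim)

    def forward(self, feats: torch.Tensor, corr: torch.Tensor) -> torch.Tensor:
        """
        feats: (N, H, W, C)   per-view feature maps, all views denoised in parallel
        corr:  (N, M, H, W, 2) for each pixel of view n, its (x, y) location in each
               of M neighbouring views, normalised to [-1, 1]
        Returns feature maps of the same shape, updated with cross-view context.
        """
        N, H, W, C = feats.shape
        M = corr.shape[1]
        q = self.to_q(feats)                                   # (N, H, W, C)

        # Sample the corresponding feature from every neighbouring view.
        feats_nchw = feats.permute(0, 3, 1, 2)                 # (N, C, H, W)
        sampled = []
        for m in range(M):
            # Assumption: neighbour m of view n is view (n + m + 1) % N,
            # e.g. the ring of perspective crops from a panorama.
            neigh = feats_nchw.roll(-(m + 1), dims=0)
            s = F.grid_sample(neigh, corr[:, m], align_corners=True)  # (N, C, H, W)
            sampled.append(s.permute(0, 2, 3, 1))              # (N, H, W, C)
        kv = torch.stack(sampled, dim=3)                       # (N, H, W, M, C)

        # Each pixel attends over its M corresponding pixels in other views.
        k, v = self.to_k(kv), self.to_v(kv)
        attn = torch.einsum('nhwc,nhwmc->nhwm', q, k) / C ** 0.5
        attn = attn.softmax(dim=-1)
        out = torch.einsum('nhwm,nhwmc->nhwc', attn, v)
        return feats + self.out(out)                           # residual update
```

In practice such a layer would be inserted alongside the self-attention blocks of the pre-trained text-to-image UNet, with the correspondences precomputed from the panorama geometry or from the given depth maps and poses.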
Related papers
- A Global Depth-Range-Free Multi-View Stereo Transformer Network with Pose Embedding [76.44979557843367]
We propose a novel multi-view stereo (MVS) framework that gets rid of the depth range prior.
We introduce a Multi-view Disparity Attention (MDA) module to aggregate long-range context information.
We explicitly estimate the quality of the current pixel corresponding to sampled points on the epipolar line of the source image.
arXiv Detail & Related papers (2024-11-04T08:50:16Z)
- From Bird's-Eye to Street View: Crafting Diverse and Condition-Aligned Images with Latent Diffusion Model [16.716345249091408]
We explore Bird's-Eye View (BEV) generation, converting a BEV map into its corresponding multi-view street images.
Our approach comprises two main components: the Neural View Transformation and the Street Image Generation.
arXiv Detail & Related papers (2024-09-02T07:47:16Z)
- Pixel-Aligned Multi-View Generation with Depth Guided Decoder [86.1813201212539]
We propose a novel method for pixel-level image-to-multi-view generation.
Unlike prior work, we incorporate attention layers across multi-view images in the VAE decoder of a latent video diffusion model.
Our model enables better pixel alignment across multi-view images.
arXiv Detail & Related papers (2024-08-26T04:56:41Z)
- Vivid-ZOO: Multi-View Video Generation with Diffusion Model [76.96449336578286]
New challenges lie in the lack of massive captioned multi-view videos and the complexity of modeling such a multi-dimensional distribution.
We propose a novel diffusion-based pipeline that generates high-quality multi-view videos centered around a dynamic 3D object from text.
arXiv Detail & Related papers (2024-06-12T21:44:04Z)
- Era3D: High-Resolution Multiview Diffusion using Efficient Row-wise Attention [87.02613021058484]
We introduce Era3D, a novel multiview diffusion method that generates high-resolution multiview images from a single-view image.
Era3D generates high-quality multiview images at up to 512x512 resolution while reducing computational complexity by 12x.
arXiv Detail & Related papers (2024-05-19T17:13:16Z)
- Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object Structure via HyperNetworks [53.67497327319569]
We introduce a novel neural rendering technique to solve image-to-3D from a single view.
Our approach employs the signed distance function as the surface representation and incorporates generalizable priors through geometry-encoding volumes and HyperNetworks.
Our experiments show the advantages of our proposed approach with consistent results and rapid generation.
arXiv Detail & Related papers (2023-12-24T08:42:37Z)
- SinMPI: Novel View Synthesis from a Single Image with Expanded Multiplane Images [22.902506592749816]
This paper proposes SinMPI, a novel method that uses an expanded multiplane image (MPI) as the 3D scene representation.
The key idea of our method is to use Stable Diffusion to generate out-of-view contents.
Both qualitative and quantitative experiments have been conducted to validate the superiority of our method over the state of the art.
arXiv Detail & Related papers (2023-12-18T09:16:30Z)
- EpiDiff: Enhancing Multi-View Synthesis via Localized Epipolar-Constrained Diffusion [60.30030562932703]
EpiDiff is a localized interactive multiview diffusion model.
It generates 16 multiview images in just 12 seconds.
It surpasses previous methods in quality evaluation metrics.
arXiv Detail & Related papers (2023-12-11T05:20:52Z)
- Bridging the Visual Gap: Wide-Range Image Blending [16.464837892640812]
We introduce an effective deep-learning model to realize wide-range image blending.
We experimentally demonstrate that our proposed method is able to produce visually appealing results.
arXiv Detail & Related papers (2021-03-28T15:07:45Z)