MVDiffusion: Enabling Holistic Multi-view Image Generation with
Correspondence-Aware Diffusion
- URL: http://arxiv.org/abs/2307.01097v7
- Date: Mon, 25 Dec 2023 04:32:26 GMT
- Title: MVDiffusion: Enabling Holistic Multi-view Image Generation with
Correspondence-Aware Diffusion
- Authors: Shitao Tang, Fuyang Zhang, Jiacheng Chen, Peng Wang, Yasutaka Furukawa
- Abstract summary: This paper introduces MVDiffusion, a simple yet effective method for generating consistent multiview images.
MVDiffusion simultaneously generates all images with a global awareness, effectively addressing the prevalent error accumulation issue.
- Score: 26.582847694092884
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper introduces MVDiffusion, a simple yet effective method for
generating consistent multi-view images from text prompts given pixel-to-pixel
correspondences (e.g., perspective crops from a panorama or multi-view images
given depth maps and poses). Unlike prior methods that rely on iterative image
warping and inpainting, MVDiffusion simultaneously generates all images with a
global awareness, effectively addressing the prevalent error accumulation
issue. At its core, MVDiffusion processes perspective images in parallel with a
pre-trained text-to-image diffusion model, while integrating novel
correspondence-aware attention layers to facilitate cross-view interactions.
For panorama generation, while only trained with 10k panoramas, MVDiffusion is
able to generate high-resolution photorealistic images for arbitrary texts or
extrapolate one perspective image to a 360-degree view. For multi-view
depth-to-image generation, MVDiffusion demonstrates state-of-the-art
performance for texturing a scene mesh.
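The abstract's core mechanism is the correspondence-aware attention layer: while the views are denoised in parallel, each pixel of one view attends to a small neighborhood around its known correspondent in another view. The PyTorch sketch below only illustrates that idea; the tensor shapes, names (feat_tgt, feat_src, corr), and the fixed 3x3 neighborhood are assumptions, not the authors' implementation.
```python
# Minimal sketch of a correspondence-aware attention layer.
# Shapes, names, and the fixed 3x3 neighborhood are illustrative
# assumptions, not MVDiffusion's actual implementation.
import torch
import torch.nn.functional as F
from torch import nn

class CorrespondenceAwareAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(dim, dim, bias=False)
        self.to_v = nn.Linear(dim, dim, bias=False)
        self.scale = dim ** -0.5

    def forward(self, feat_tgt, feat_src, corr):
        """feat_tgt, feat_src: (B, H, W, C) feature maps of two views.
        corr: (B, H, W, 2) location in the source view corresponding to
        each target pixel, normalized to [-1, 1] for grid_sample."""
        B, H, W, C = feat_tgt.shape
        src = feat_src.permute(0, 3, 1, 2)            # (B, C, H, W)
        step = torch.tensor([2.0 / max(W - 1, 1),
                             2.0 / max(H - 1, 1)], device=corr.device)
        keys, vals = [], []
        # Attend to a 3x3 pixel neighborhood around each correspondent.
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                grid = corr + step * torch.tensor([dx, dy], dtype=step.dtype,
                                                  device=step.device)
                samp = F.grid_sample(src, grid, align_corners=True)
                samp = samp.permute(0, 2, 3, 1)       # (B, H, W, C)
                keys.append(self.to_k(samp))
                vals.append(self.to_v(samp))
        k = torch.stack(keys, dim=-2)                 # (B, H, W, 9, C)
        v = torch.stack(vals, dim=-2)
        q = self.to_q(feat_tgt).unsqueeze(-2)         # (B, H, W, 1, C)
        attn = ((q * k).sum(-1) * self.scale).softmax(dim=-1)
        out = (attn.unsqueeze(-1) * v).sum(-2)        # (B, H, W, C)
        return feat_tgt + out                         # residual cross-view update
```
In the panorama setting, corr would come from reprojecting each perspective crop into its neighbors; in the depth-to-image setting, from unprojecting pixels with the given depth maps and poses.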
Related papers
- Zero-Shot Novel View and Depth Synthesis with Multi-View Geometric Diffusion [27.836518920611557]
We introduce MVGD, a diffusion-based architecture capable of direct pixel-level generation of images and depth maps from novel viewpoints.
We train this model on a collection of more than 60 million multi-view samples from publicly available datasets.
We report state-of-the-art results in multiple novel view synthesis benchmarks, as well as multi-view stereo and video depth estimation.
arXiv Detail & Related papers (2025-01-30T23:43:06Z)
- CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation [59.257513664564996]
We introduce a novel method for generating 360° panoramas from text prompts or images.
We employ multi-view diffusion models to jointly synthesize the six faces of a cubemap.
Our model allows for fine-grained text control, generates high resolution panorama images and generalizes well beyond its training set.
arXiv Detail & Related papers (2025-01-28T18:59:49Z)
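The cubemap framing in the entry above rests on standard graphics geometry: six 90-degree-FOV perspective faces exactly tile the sphere, so jointly denoising them amounts to generating a full 360° panorama. The numpy sketch below illustrates that mapping; the axis conventions and function names are assumptions for illustration, not CubeDiff's code.
```python
# Standard cubemap geometry (illustrative, not CubeDiff's implementation).
import numpy as np

# Per-face (forward, right, up) axes; the naming and order are assumptions.
FACES = {
    "front": (( 0,  0, -1), ( 1,  0,  0), (0, 1, 0)),
    "back":  (( 0,  0,  1), (-1,  0,  0), (0, 1, 0)),
    "right": (( 1,  0,  0), ( 0,  0,  1), (0, 1, 0)),
    "left":  ((-1,  0,  0), ( 0,  0, -1), (0, 1, 0)),
    "up":    (( 0,  1,  0), ( 1,  0,  0), (0, 0, 1)),
    "down":  (( 0, -1,  0), ( 1,  0,  0), (0, 0, -1)),
}

def face_rays(face: str, res: int) -> np.ndarray:
    """Unit ray directions (res, res, 3) for one cubemap face."""
    fwd, right, up = (np.array(a, dtype=np.float64) for a in FACES[face])
    # Pixel centers in [-1, 1]; tan(45 deg) = 1 gives the 90-degree FOV.
    t = (np.arange(res) + 0.5) / res * 2.0 - 1.0
    u, v = np.meshgrid(t, -t)                    # +v points up in the image
    d = fwd + u[..., None] * right + v[..., None] * up
    return d / np.linalg.norm(d, axis=-1, keepdims=True)

def rays_to_equirect(d: np.ndarray) -> np.ndarray:
    """Map unit directions to (lon, lat) on the panorama, in radians."""
    lon = np.arctan2(d[..., 0], -d[..., 2])      # -z forward; convention assumed
    lat = np.arcsin(np.clip(d[..., 1], -1.0, 1.0))
    return np.stack([lon, lat], axis=-1)
```
Because every panorama pixel falls on exactly one face ray, a mapping like this also supplies the cross-face pixel correspondences that joint multi-view denoising relies on.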
- A Global Depth-Range-Free Multi-View Stereo Transformer Network with Pose Embedding [76.44979557843367]
We propose a novel multi-view stereo (MVS) framework that dispenses with the depth-range prior.
We introduce a Multi-view Disparity Attention (MDA) module to aggregate long-range context information.
We explicitly estimate the matching quality between each pixel and sampled points on the epipolar line of the source image.
arXiv Detail & Related papers (2024-11-04T08:50:16Z)
- Pixel-Aligned Multi-View Generation with Depth Guided Decoder [86.1813201212539]
We propose a novel method for pixel-level image-to-multi-view generation.
Unlike prior work, we incorporate attention layers across multi-view images in the VAE decoder of a latent video diffusion model.
Our model enables better pixel alignment across multi-view images.
arXiv Detail & Related papers (2024-08-26T04:56:41Z)
- Era3D: High-Resolution Multiview Diffusion using Efficient Row-wise Attention [87.02613021058484]
We introduce Era3D, a novel multiview diffusion method that generates high-resolution multiview images from a single-view image.
Era3D generates high-quality multiview images at up to 512×512 resolution while reducing computational complexity by 12×.
arXiv Detail & Related papers (2024-05-19T17:13:16Z)
- Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object Structure via HyperNetworks [53.67497327319569]
We introduce a novel neural rendering technique to solve image-to-3D from a single view.
Our approach employs the signed distance function as the surface representation and incorporates generalizable priors through geometry-encoding volumes and HyperNetworks.
Our experiments show the advantages of our proposed approach with consistent results and rapid generation.
arXiv Detail & Related papers (2023-12-24T08:42:37Z)
- SinMPI: Novel View Synthesis from a Single Image with Expanded Multiplane Images [22.902506592749816]
This paper proposes SinMPI, a novel method that uses an expanded multiplane image (MPI) as the 3D scene representation.
The key idea of our method is to use Stable Diffusion to generate out-of-view contents.
Both qualitative and quantitative experiments validate the superiority of our method over the state of the art.
arXiv Detail & Related papers (2023-12-18T09:16:30Z)
- EpiDiff: Enhancing Multi-View Synthesis via Localized Epipolar-Constrained Diffusion [60.30030562932703]
EpiDiff is a localized interactive multiview diffusion model.
It generates 16 multiview images in just 12 seconds.
It surpasses previous methods in quality evaluation metrics.
arXiv Detail & Related papers (2023-12-11T05:20:52Z)
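The key phrase in the EpiDiff entry is "epipolar-constrained": features in one view should only interact with features near the corresponding epipolar line in another view. Below is a self-contained sketch of such a constraint expressed as an attention mask, built from textbook two-view geometry; the function name, pixel threshold, and use of a fundamental matrix F_mat are assumptions, not EpiDiff's implementation.
```python
# Illustrative epipolar attention mask from standard two-view geometry.
import torch

def epipolar_attention_mask(F_mat, hw, thresh=2.0):
    """Boolean mask (N, N): entry [i, j] is True iff target pixel j lies
    within `thresh` pixels of the epipolar line of source pixel i.
    F_mat: (3, 3) fundamental matrix mapping source points to target lines;
    hw: (H, W) image size shared by both views (an assumption)."""
    H, W = hw
    ys, xs = torch.meshgrid(
        torch.arange(H, dtype=torch.float32),
        torch.arange(W, dtype=torch.float32),
        indexing="ij",
    )
    # Homogeneous pixel coordinates, (3, N).
    pts = torch.stack([xs.flatten(), ys.flatten(),
                       torch.ones(H * W)], dim=0)
    lines = F_mat @ pts                                # epipolar lines, (3, N)
    # Point-line distance |ax + by + c| / sqrt(a^2 + b^2) for every pair.
    num = (lines.T @ pts).abs()                        # (N, N)
    denom = lines[:2].norm(dim=0).unsqueeze(1).clamp_min(1e-8)
    return (num / denom) < thresh
```
In practice, a mask like this would be applied to the cross-view attention logits (e.g., adding -inf on disallowed pairs), localizing the interaction that makes such models fast.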