MVDiffusion: Enabling Holistic Multi-view Image Generation with
Correspondence-Aware Diffusion
- URL: http://arxiv.org/abs/2307.01097v7
- Date: Mon, 25 Dec 2023 04:32:26 GMT
- Title: MVDiffusion: Enabling Holistic Multi-view Image Generation with
Correspondence-Aware Diffusion
- Authors: Shitao Tang, Fuyang Zhang, Jiacheng Chen, Peng Wang, Yasutaka Furukawa
- Abstract summary: This paper introduces MVDiffusion, a simple yet effective method for generating consistent multiview images.
MVDiffusion simultaneously generates all images with a global awareness, effectively addressing the prevalent error accumulation issue.
- Score: 26.582847694092884
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper introduces MVDiffusion, a simple yet effective method for
generating consistent multi-view images from text prompts given pixel-to-pixel
correspondences (e.g., perspective crops from a panorama or multi-view images
given depth maps and poses). Unlike prior methods that rely on iterative image
warping and inpainting, MVDiffusion simultaneously generates all images with a
global awareness, effectively addressing the prevalent error accumulation
issue. At its core, MVDiffusion processes perspective images in parallel with a
pre-trained text-to-image diffusion model, while integrating novel
correspondence-aware attention layers to facilitate cross-view interactions.
For panorama generation, while only trained with 10k panoramas, MVDiffusion is
able to generate high-resolution photorealistic images for arbitrary texts or
extrapolate one perspective image to a 360-degree view. For multi-view
depth-to-image generation, MVDiffusion demonstrates state-of-the-art
performance for texturing a scene mesh.
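The abstract's core idea — denoising all views in parallel while letting each pixel attend to its known cross-view matches — can be illustrated with a toy sketch. This is a hypothetical simplification, not MVDiffusion's actual implementation: the real model operates on latent feature maps inside a pre-trained Stable Diffusion UNet, whereas here `correspondence_aware_attention`, the tensor shapes, and the `corr` index format are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def correspondence_aware_attention(feats, corr):
    """Toy correspondence-aware attention over pixel-to-pixel matches.

    feats: (V, N, D) array -- V views, N tokens per view, D channels.
    corr:  (V, N, K, 2) int array -- for each token, K (view_idx, token_idx)
           pairs naming its corresponding tokens in other views
           (e.g., derived from panorama geometry or depth + poses).
    Returns an updated (V, N, D) array where every token is a softmax-weighted
    aggregation of its cross-view correspondents.
    """
    V, N, D = feats.shape
    scale = 1.0 / np.sqrt(D)
    out = np.empty_like(feats)
    for v in range(V):
        for n in range(N):
            q = feats[v, n]                   # query: this pixel's feature
            idx = corr[v, n]                  # (K, 2) cross-view matches
            kv = feats[idx[:, 0], idx[:, 1]]  # (K, D) keys/values from matches
            w = softmax(scale * (kv @ q))     # attend only over matched pixels
            out[v, n] = w @ kv                # weighted cross-view aggregation
    return out
```

Restricting attention to geometrically matched pixels (rather than all tokens in all views) is what keeps the cross-view interaction cheap and is how, per the abstract, consistency is enforced globally instead of via sequential warp-and-inpaint.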
Related papers
- Vivid-ZOO: Multi-View Video Generation with Diffusion Model [76.96449336578286]
New challenges lie in the lack of massive captioned multi-view videos and the complexity of modeling such multi-dimensional distribution.
We propose a novel diffusion-based pipeline that generates high-quality multi-view videos centered around a dynamic 3D object from text.
arXiv Detail & Related papers (2024-06-12T21:44:04Z) - Era3D: High-Resolution Multiview Diffusion using Efficient Row-wise Attention [87.02613021058484]
We introduce Era3D, a novel multiview diffusion method that generates high-resolution multiview images from a single-view image.
Era3D generates high-quality multiview images at up to 512x512 resolution while reducing computational complexity by 12x.
arXiv Detail & Related papers (2024-05-19T17:13:16Z) - Multi-view Aggregation Network for Dichotomous Image Segmentation [76.75904424539543]
Dichotomous Image Segmentation (DIS) has recently emerged for high-precision object segmentation from high-resolution natural images.
Existing methods rely on tedious multiple encoder-decoder streams and stages to gradually complete the global localization and local refinement.
Inspired by this, we model DIS as a multi-view object perception problem and propose a parsimonious multi-view aggregation network (MVANet).
Experiments on the popular DIS-5K dataset show that our MVANet significantly outperforms state-of-the-art methods in both accuracy and speed.
arXiv Detail & Related papers (2024-04-11T03:00:00Z) - Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object
Structure via HyperNetworks [53.67497327319569]
We introduce a novel neural rendering technique to solve image-to-3D from a single view.
Our approach employs the signed distance function as the surface representation and incorporates generalizable priors through geometry-encoding volumes and HyperNetworks.
Our experiments show the advantages of our proposed approach with consistent results and rapid generation.
arXiv Detail & Related papers (2023-12-24T08:42:37Z) - SinMPI: Novel View Synthesis from a Single Image with Expanded
Multiplane Images [22.902506592749816]
This paper proposes SinMPI, a novel method that uses an expanded multiplane image (MPI) as the 3D scene representation.
The key idea of our method is to use Stable Diffusion to generate out-of-view contents.
Both qualitative and quantitative experiments validate the superiority of our method over the state of the art.
arXiv Detail & Related papers (2023-12-18T09:16:30Z) - EpiDiff: Enhancing Multi-View Synthesis via Localized Epipolar-Constrained Diffusion [60.30030562932703]
EpiDiff is a localized interactive multiview diffusion model.
It generates 16 multiview images in just 12 seconds.
It surpasses previous methods in quality evaluation metrics.
arXiv Detail & Related papers (2023-12-11T05:20:52Z) - MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation [34.61940502872307]
MultiDiffusion is a unified framework that enables versatile and controllable image generation.
We show that MultiDiffusion can be readily applied to generate high-quality and diverse images.
arXiv Detail & Related papers (2023-02-16T06:28:29Z) - ViCE: Self-Supervised Visual Concept Embeddings as Contextual and Pixel
Appearance Invariant Semantic Representations [77.3590853897664]
This work presents a self-supervised method to learn dense semantically rich visual embeddings for images inspired by methods for learning word embeddings in NLP.
arXiv Detail & Related papers (2021-11-24T12:27:30Z) - Bridging the Visual Gap: Wide-Range Image Blending [16.464837892640812]
We introduce an effective deep-learning model to realize wide-range image blending.
We experimentally demonstrate that our proposed method is able to produce visually appealing results.
arXiv Detail & Related papers (2021-03-28T15:07:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.