SinMPI: Novel View Synthesis from a Single Image with Expanded
Multiplane Images
- URL: http://arxiv.org/abs/2312.11037v1
- Date: Mon, 18 Dec 2023 09:16:30 GMT
- Title: SinMPI: Novel View Synthesis from a Single Image with Expanded
Multiplane Images
- Authors: Guo Pu, Peng-Shuai Wang, Zhouhui Lian
- Abstract summary: This paper proposes SinMPI, a novel method that uses an expanded multiplane image (MPI) as the 3D scene representation.
The key idea of our method is to use Stable Diffusion to generate out-of-view contents.
Both qualitative and quantitative experiments have been conducted to validate the superiority of our method over the state of the art.
- Score: 22.902506592749816
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Single-image novel view synthesis is a challenging and ongoing problem that
aims to generate an infinite number of consistent views from a single input
image. Although significant efforts have been made to advance the quality of
generated novel views, less attention has been paid to the expansion of the
underlying scene representation, which is crucial to the generation of
realistic novel view images. This paper proposes SinMPI, a novel method that
uses an expanded multiplane image (MPI) as the 3D scene representation to
significantly expand the perspective range of MPI and generate high-quality
novel views from a large multiplane space. The key idea of our method is to use
Stable Diffusion to generate out-of-view contents, project all scene contents
into an expanded multiplane image according to depths predicted by monocular
depth estimators, and then optimize the multiplane image under the supervision
of pseudo multi-view data generated by a depth-aware warping and inpainting
module. Both qualitative and quantitative experiments have been conducted to
validate the superiority of our method over the state of the art. Our code and
data are available at https://github.com/TrickyGo/SinMPI.
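The pseudo multi-view supervision described above rests on depth-aware warping: each pixel is lifted to 3D with the predicted monocular depth, reprojected into a virtual camera, and the resulting disocclusion holes are left for an inpainter (Stable Diffusion, per the abstract) to fill. The following is a minimal NumPy sketch of such a warp, assuming a shared pinhole intrinsic matrix; the function name, shapes, and the naive z-buffered splatting scheme are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def warp_to_novel_view(img, depth, K, R, t):
    """Depth-aware forward warp of a single image to a novel camera.

    Illustrative sketch only: SinMPI's warping-and-inpainting module is
    learned, whereas this does naive z-buffered nearest-pixel splatting
    and leaves disoccluded pixels as holes for an inpainter to fill.

    img   : (H, W, 3) float32 RGB
    depth : (H, W)    float32 depth from a monocular estimator
    K     : (3, 3)    pinhole intrinsics shared by both views
    R, t  : (3, 3), (3,) relative rotation / translation, source -> target
    """
    H, W = depth.shape
    ys, xs = np.mgrid[0:H, 0:W]
    pix = np.stack([xs, ys, np.ones_like(xs)], -1).reshape(-1, 3).astype(np.float32)

    # Back-project pixels to 3D points in the source camera frame.
    pts = (np.linalg.inv(K) @ pix.T) * depth.reshape(1, -1)

    # Move points into the target camera frame and project them.
    pts_t = R @ pts + t[:, None]
    proj = K @ pts_t
    z = pts_t[2]
    u = np.round(proj[0] / np.clip(proj[2], 1e-8, None)).astype(int)
    v = np.round(proj[1] / np.clip(proj[2], 1e-8, None)).astype(int)

    out = np.zeros_like(img)
    zbuf = np.full((H, W), np.inf, dtype=np.float32)
    ok = (z > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)

    src = img.reshape(-1, 3)
    for i in np.flatnonzero(ok):      # keep the nearest point per target pixel
        if z[i] < zbuf[v[i], u[i]]:
            zbuf[v[i], u[i]] = z[i]
            out[v[i], u[i]] = src[i]

    holes = np.isinf(zbuf)            # disocclusions, to be inpainted
    return out, holes
```

Warping the input through a set of virtual cameras and inpainting the resulting holes would yield the pseudo multi-view data that supervises the expanded MPI.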
Related papers
- ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis [63.169364481672915]
We propose ViewCrafter, a novel method for synthesizing high-fidelity novel views of generic scenes from single or sparse images.
Our method takes advantage of the powerful generation capabilities of video diffusion models and the coarse 3D clues offered by a point-based representation to generate high-quality video frames.
arXiv Detail & Related papers (2024-09-03T16:53:19Z)
- Pixel-Aligned Multi-View Generation with Depth Guided Decoder [86.1813201212539]
We propose a novel method for pixel-level image-to-multi-view generation.
Unlike prior work, we incorporate attention layers across multi-view images in the VAE decoder of a latent video diffusion model.
Our model enables better pixel alignment across multi-view images (a hypothetical sketch of such a cross-view attention layer appears after this list).
arXiv Detail & Related papers (2024-08-26T04:56:41Z)
- MultiDiff: Consistent Novel View Synthesis from a Single Image [60.04215655745264]
MultiDiff is a novel approach for consistent novel view synthesis of scenes from a single RGB image.
Our results demonstrate that MultiDiff outperforms state-of-the-art methods on the challenging, real-world datasets RealEstate10K and ScanNet.
arXiv Detail & Related papers (2024-06-26T17:53:51Z)
- ReShader: View-Dependent Highlights for Single Image View Synthesis [5.736642774848791]
We propose to split the view synthesis process into two independent tasks of pixel reshading and relocation.
During the reshading process, we take the single image as the input and adjust its shading based on the novel camera.
This reshaded image is then used as the input to an existing view synthesis method to relocate the pixels and produce the final novel view image.
arXiv Detail & Related papers (2023-09-19T15:23:52Z)
- SAMPLING: Scene-adaptive Hierarchical Multiplane Images Representation for Novel View Synthesis from a Single Image [60.52991173059486]
We introduce SAMPLING, a Scene-adaptive Hierarchical Multiplane Images Representation for Novel View Synthesis from a Single Image.
Our method demonstrates considerable performance gains on the large-scale unbounded outdoor scenes of the KITTI dataset using a single image.
arXiv Detail & Related papers (2023-09-12T15:33:09Z)
- High-fidelity 3D GAN Inversion by Pseudo-multi-view Optimization [51.878078860524795]
We present a high-fidelity 3D generative adversarial network (GAN) inversion framework that can synthesize photo-realistic novel views.
Our approach enables high-fidelity 3D rendering from a single image, which is promising for various applications of AI-generated 3D content.
arXiv Detail & Related papers (2022-11-28T18:59:52Z)
- Remote Sensing Novel View Synthesis with Implicit Multiplane Representations [26.33490094119609]
We propose a novel remote sensing view synthesis method by leveraging the recent advances in implicit neural representations.
Considering the overhead viewpoints and far imaging depths of remote sensing images, we represent the 3D space by combining an implicit multiplane image (MPI) representation with deep neural networks.
Images from arbitrary novel views can be freely rendered from the reconstructed model.
arXiv Detail & Related papers (2022-05-18T13:03:55Z)
- Single-View View Synthesis with Multiplane Images [64.46556656209769]
Prior work applies deep learning to generate multiplane images given two or more input images at known viewpoints.
Our method learns to predict a multiplane image directly from a single image input.
It additionally generates reasonable depth maps and fills in content behind the edges of foreground objects in background layers (a minimal MPI compositing sketch appears after this list).
arXiv Detail & Related papers (2020-04-23T17:59:19Z)
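Several entries above, like SinMPI itself, build on the multiplane image representation: a stack of fronto-parallel RGBA planes at fixed depths, rendered by warping each plane into the target view (via the homography its depth induces) and alpha-compositing the planes back to front. A minimal compositing sketch, with shapes and plane ordering as assumptions and the per-plane homography warp omitted:

```python
import numpy as np

def composite_mpi(colors, alphas):
    """Render an MPI by back-to-front 'over' compositing.

    colors : (D, H, W, 3) per-plane RGB, ordered far to near
    alphas : (D, H, W, 1) per-plane opacity in [0, 1]

    Equivalent front-to-back form, with plane 0 nearest the camera:
        out = sum_i c_i * a_i * prod_{j<i} (1 - a_j)
    """
    out = np.zeros(colors.shape[1:], dtype=np.float32)
    for c, a in zip(colors, alphas):  # far plane first
        out = c * a + out * (1.0 - a)
    return out
```

Expanding the MPI, as SinMPI does, then amounts to enlarging and outpainting these planes so that novel cameras can move well beyond the input frustum.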
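The Pixel-Aligned Multi-View Generation entry above describes attention layers that operate across multi-view images inside a VAE decoder. One common way to realize this is to merge all views into a single token sequence before self-attention, so every view can attend to every other; the sketch below is a hypothetical illustration of that idea, with module names and shapes assumed rather than taken from the paper.

```python
import torch
import torch.nn as nn

class CrossViewAttention(nn.Module):
    """Joint self-attention over feature tokens from all views.

    Hypothetical sketch: flattening V views into one sequence lets a
    decoder block exchange information across views, which is the
    stated mechanism for better pixel alignment.
    """
    def __init__(self, dim, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                      # x: (B, V, N, C) tokens per view
        B, V, N, C = x.shape
        tokens = x.reshape(B, V * N, C)        # merge views into one sequence
        h = self.norm(tokens)
        out, _ = self.attn(h, h, h)            # every view attends to all views
        return (tokens + out).reshape(B, V, N, C)
```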
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.