iNVS: Repurposing Diffusion Inpainters for Novel View Synthesis
- URL: http://arxiv.org/abs/2310.16167v1
- Date: Tue, 24 Oct 2023 20:33:19 GMT
- Title: iNVS: Repurposing Diffusion Inpainters for Novel View Synthesis
- Authors: Yash Kant, Aliaksandr Siarohin, Michael Vasilkovsky, Riza Alp Guler,
Jian Ren, Sergey Tulyakov, Igor Gilitschenski
- Abstract summary: We present a method for generating consistent novel views from a single source image.
Our approach focuses on maximizing the reuse of visible pixels from the source image.
We use a monocular depth estimator that transfers visible pixels from the source view to the target view.
- Score: 45.88928345042103
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a method for generating consistent novel views from a single
source image. Our approach focuses on maximizing the reuse of visible pixels
from the source image. To achieve this, we use a monocular depth estimator that
transfers visible pixels from the source view to the target view. Starting from
a pre-trained 2D inpainting diffusion model, we train our method on the
large-scale Objaverse dataset to learn 3D object priors. During training, we use
a novel masking mechanism based on epipolar lines to further improve the
quality of our approach. This allows our framework to perform zero-shot novel
view synthesis on a variety of objects. We evaluate the zero-shot abilities of
our framework on three challenging datasets: Google Scanned Objects, Ray Traced
Multiview, and Common Objects in 3D. See our webpage for more details:
https://yashkant.github.io/invs/
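The abstract's core step, transferring visible source pixels to the target view via monocular depth, can be illustrated with a short forward-warping sketch. The following is a minimal numpy illustration under assumed pinhole conventions, not the authors' code: the function name, the z-buffer splat, and the returned hole mask (the region handed to the diffusion inpainter) are all illustrative, and the paper's epipolar-line masking is more elaborate than the simple coverage mask shown here.

```python
import numpy as np

def reproject_source_pixels(image, depth, K, R, t):
    """Forward-warp source pixels into the target view.

    image: (H, W, 3) source view, depth: (H, W) monocular depth,
    K: (3, 3) intrinsics, (R, t): assumed source-to-target relative pose.
    Returns the partially filled target image and a hole mask marking
    pixels the warp could not cover (left for the inpainter).
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], -1).reshape(-1, 3).T   # 3 x N

    pts_src = (np.linalg.inv(K) @ pix) * depth.reshape(1, -1)      # unproject
    proj = K @ (R @ pts_src + t[:, None])                          # reproject
    z = proj[2]
    uv = np.round(proj[:2] / np.clip(z, 1e-6, None)).astype(int)

    # Keep points in front of the camera and inside the target frame.
    ok = (z > 0) & (uv[0] >= 0) & (uv[0] < w) & (uv[1] >= 0) & (uv[1] < h)
    order = np.argsort(-z[ok])            # far-to-near so near pixels win
    xs, ys = uv[0, ok][order], uv[1, ok][order]

    target = np.zeros_like(image)
    covered = np.zeros((h, w), dtype=bool)
    target[ys, xs] = image.reshape(-1, 3)[np.flatnonzero(ok)[order]]
    covered[ys, xs] = True
    return target, ~covered               # hole mask feeds the inpainter
```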
Related papers
- Free3D: Consistent Novel View Synthesis without 3D Representation [63.931920010054064]
Free3D is a simple, accurate method for monocular open-set novel view synthesis (NVS).
Compared to other works that took a similar approach, we obtain significant improvements without resorting to an explicit 3D representation.
arXiv Detail & Related papers (2023-12-07T18:59:18Z)
- BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects [89.2314092102403]
We present a near real-time method for 6-DoF tracking of an unknown object from a monocular RGBD video sequence.
Our method works for arbitrary rigid objects, even when visual texture is largely absent.
arXiv Detail & Related papers (2023-03-24T17:13:49Z)
- PIZZA: A Powerful Image-only Zero-Shot Zero-CAD Approach to 6 DoF Tracking [27.283648727847268]
We present a method for tracking the 6D motion of objects in RGB video sequences when neither the training images nor the 3D geometry of the objects are available.
In contrast to previous works, our method can therefore handle unknown objects in the open world immediately.
Our results on challenging datasets are on par with previous works that require much more information.
arXiv Detail & Related papers (2022-09-15T19:55:13Z)
- Vision Transformer for NeRF-Based View Synthesis from a Single Input Image [49.956005709863355]
We propose to leverage both the global and local features to form an expressive 3D representation.
To synthesize a novel view, we train a multilayer perceptron (MLP) network conditioned on the learned 3D representation to perform volume rendering.
Our method can render novel views from only a single input image and generalize across multiple object categories using a single model.
arXiv Detail & Related papers (2022-07-12T17:52:04Z)
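The volume-rendering step the entry above describes is the standard NeRF quadrature: an MLP predicts a density and color at each sample point, and samples along each ray are alpha-composited. Below is a minimal PyTorch sketch; the feature conditioning, tensor shapes, and the `render_rays` name are assumptions, not the paper's exact architecture.

```python
import torch

def render_rays(mlp, feats, pts, deltas):
    """pts: (R, S, 3) sample points along R rays, deltas: (R, S) spacing,
    feats: (R, S, C) conditioning features gathered per sample point."""
    sigma, rgb = mlp(torch.cat([pts, feats], -1)).split([1, 3], dim=-1)
    sigma = torch.relu(sigma.squeeze(-1))            # (R, S) densities >= 0
    alpha = 1.0 - torch.exp(-sigma * deltas)         # per-segment opacity
    # Transmittance: probability the ray reaches each sample unoccluded.
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alpha[:, :1]), 1.0 - alpha + 1e-10], 1),
        dim=1)[:, :-1]
    weights = alpha * trans                          # (R, S)
    return (weights.unsqueeze(-1) * torch.sigmoid(rgb)).sum(1)  # (R, 3)

# e.g., with C = 32 conditioning channels:
# mlp = torch.nn.Sequential(torch.nn.Linear(3 + 32, 64),
#                           torch.nn.ReLU(), torch.nn.Linear(64, 4))
```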
- Fine Detailed Texture Learning for 3D Meshes with Generative Models [33.42114674602613]
This paper presents a method to reconstruct high-quality textured 3D models from both multi-view and single-view images.
In the first stage, we focus on learning accurate geometry, whereas in the second stage, we focus on learning the texture with a generative adversarial network.
We demonstrate that our method achieves superior 3D textured models compared to the previous works.
arXiv Detail & Related papers (2022-03-17T14:50:52Z)
- End-to-End Learning of Multi-category 3D Pose and Shape Estimation [128.881857704338]
We propose an end-to-end method that simultaneously detects 2D keypoints from an image and lifts them to 3D.
The proposed method learns both 2D detection and 3D lifting only from 2D keypoints annotations.
In addition to being end-to-end in image-to-3D learning, our method also handles objects from multiple categories using a single neural network.
arXiv Detail & Related papers (2021-12-19T17:10:40Z)
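One common way to learn 3D lifting from 2D keypoint annotations alone, consistent with the entry above, is to project the predicted 3D keypoints back into the image and penalize the 2D error. The sketch below assumes that recipe; the `Lifter` architecture and the reprojection loss are illustrative, not the paper's exact formulation.

```python
import torch

class Lifter(torch.nn.Module):
    """Lift K detected 2D keypoints to 3D with a small MLP (illustrative)."""
    def __init__(self, n_kpts: int, hidden: int = 256):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(2 * n_kpts, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, 3 * n_kpts))

    def forward(self, kpts2d):                       # (B, K, 2)
        b, k, _ = kpts2d.shape
        return self.net(kpts2d.reshape(b, -1)).reshape(b, k, 3)

def reprojection_loss(kpts3d, kpts2d, K):
    """Project predicted 3D keypoints with intrinsics K and compare to the
    2D ground truth, so no 3D labels are needed (assumed supervision)."""
    proj = kpts3d @ K.T                              # (B, K, 3)
    uv = proj[..., :2] / proj[..., 2:].clamp(min=1e-6)
    return torch.nn.functional.mse_loss(uv, kpts2d)
```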
- RandomRooms: Unsupervised Pre-training from Synthetic Shapes and Randomized Layouts for 3D Object Detection [138.2892824662943]
A promising solution is to make better use of synthetic datasets of CAD object models to boost learning on real datasets.
Recent work on 3D pre-training fails when transferring features learned on synthetic objects to real-world applications.
In this work, we put forward a new method called RandomRooms to accomplish this objective.
arXiv Detail & Related papers (2021-08-17T17:56:12Z)
- Novel-View Human Action Synthesis [39.72702883597454]
We present a novel 3D reasoning scheme to synthesize the target viewpoint.
We first estimate the 3D mesh of the target body and transfer the rough textures from the 2D images to the mesh.
We produce a semi-dense textured mesh by propagating the transferred textures both locally, within geodesic neighborhoods, and globally.
arXiv Detail & Related papers (2020-07-06T15:11:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.