Neural Pixel Composition: 3D-4D View Synthesis from Multi-Views
- URL: http://arxiv.org/abs/2207.10663v1
- Date: Thu, 21 Jul 2022 17:58:02 GMT
- Title: Neural Pixel Composition: 3D-4D View Synthesis from Multi-Views
- Authors: Aayush Bansal and Michael Zollhoefer
- Abstract summary: We present a novel approach for continuous 3D-4D view synthesis given only a discrete set of multi-view observations as input.
The proposed formulation reliably operates on sparse and wide-baseline multi-view imagery.
It can be trained efficiently within a few seconds to 10 minutes for hi-res (12MP) content.
- Score: 12.386462516398469
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present Neural Pixel Composition (NPC), a novel approach for continuous
3D-4D view synthesis given only a discrete set of multi-view observations as
input. Existing state-of-the-art approaches require dense multi-view
supervision and an extensive computational budget. The proposed formulation
reliably operates on sparse and wide-baseline multi-view imagery and can be
trained efficiently within a few seconds to 10 minutes for hi-res (12MP)
content, i.e., 200-400X faster convergence than existing methods. Crucial to
our approach are two core novelties: 1) a representation of a pixel that
contains color and depth information accumulated from multi-views for a
particular location and time along a line of sight, and 2) a multi-layer
perceptron (MLP) that enables the composition of this rich information provided
for a pixel location to obtain the final color output. We experiment with a
large variety of multi-view sequences, compare to existing approaches, and
achieve better results in diverse and challenging settings. Finally, our
approach enables dense 3D reconstruction from sparse multi-views, where COLMAP,
a state-of-the-art 3D reconstruction approach, struggles.
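To make the two-part formulation described in the abstract concrete, below is a minimal sketch of the per-pixel composition step: an MLP that maps a fixed number of (color, depth) samples gathered from the input views along a pixel's line of sight to a final RGB value. All names, dimensions, and architectural choices (number of samples, hidden width, activations) are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch (assumption: names, dimensions, and layer choices are
# illustrative and do not reproduce the authors' exact architecture).
import torch
import torch.nn as nn

class PixelCompositionMLP(nn.Module):
    """Composes per-pixel multi-view samples (color + depth) into a final color.

    Each pixel is represented by K candidate samples accumulated from the input
    views along its line of sight; every sample carries an RGB color and a depth.
    """
    def __init__(self, num_samples: int = 8, hidden: int = 128):
        super().__init__()
        in_dim = num_samples * 4          # K samples x (r, g, b, depth)
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, 3),         # final RGB for the pixel
            nn.Sigmoid(),
        )

    def forward(self, samples: torch.Tensor) -> torch.Tensor:
        # samples: (batch, K, 4) -> flatten the per-pixel samples, predict color
        return self.mlp(samples.flatten(start_dim=1))

# Usage: 1024 pixels, each with 8 multi-view (RGB, depth) samples.
pixels = torch.rand(1024, 8, 4)
model = PixelCompositionMLP(num_samples=8)
pred_rgb = model(pixels)                  # (1024, 3)
```

Because such a network operates independently per pixel on a small fixed-size input, pixels can be batched in very large numbers, which is consistent with the fast convergence on 12MP content reported in the abstract.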
Related papers
- Pixel-Aligned Multi-View Generation with Depth Guided Decoder [86.1813201212539]
We propose a novel method for pixel-level image-to-multi-view generation.
Unlike prior work, we incorporate attention layers across multi-view images in the VAE decoder of a latent video diffusion model.
Our model enables better pixel alignment across multi-view images.
arXiv Detail & Related papers (2024-08-26T04:56:41Z)
- MultiDiff: Consistent Novel View Synthesis from a Single Image [60.04215655745264]
MultiDiff is a novel approach for consistent novel view synthesis of scenes from a single RGB image.
Our results demonstrate that MultiDiff outperforms state-of-the-art methods on the challenging, real-world datasets RealEstate10K and ScanNet.
arXiv Detail & Related papers (2024-06-26T17:53:51Z)
- MVD-Fusion: Single-view 3D via Depth-consistent Multi-view Generation [54.27399121779011]
We present MVD-Fusion: a method for single-view 3D inference via generative modeling of multi-view-consistent RGB-D images.
We show that our approach can yield more accurate synthesis compared to recent state-of-the-art, including distillation-based 3D inference and prior multi-view generation methods.
arXiv Detail & Related papers (2024-04-04T17:59:57Z)
- ConsistNet: Enforcing 3D Consistency for Multi-view Images Diffusion [61.37481051263816]
Given a single image of a 3D object, this paper proposes a method (named ConsistNet) that is able to generate multiple images of the same object.
Our method effectively learns 3D consistency over a frozen Zero123 backbone and can generate 16 surrounding views of the object within 40 seconds on a single A100 GPU.
arXiv Detail & Related papers (2023-10-16T12:29:29Z)
- Efficient View Synthesis and 3D-based Multi-Frame Denoising with Multiplane Feature Representations [1.18885605647513]
We introduce the first 3D-based multi-frame denoising method that significantly outperforms its 2D-based counterparts with lower computational requirements.
Our method extends the multiplane image (MPI) framework for novel view synthesis by introducing a learnable encoder-renderer pair that manipulates multiplane representations in feature space.
arXiv Detail & Related papers (2023-03-31T15:23:35Z)
- Multi-Plane Neural Radiance Fields for Novel View Synthesis [5.478764356647437]
Novel view synthesis is a long-standing problem that revolves around rendering frames of scenes from novel camera viewpoints.
In this work, we examine the performance, generalization, and efficiency of single-view multi-plane neural radiance fields.
We propose a new multiplane NeRF architecture that accepts multiple views to improve the synthesis results and expand the viewing range.
arXiv Detail & Related papers (2023-03-03T06:32:55Z)
- DeepMultiCap: Performance Capture of Multiple Characters Using Sparse Multiview Cameras [63.186486240525554]
DeepMultiCap is a novel method for multi-person performance capture using sparse multi-view cameras.
Our method can capture time-varying surface details without the need for pre-scanned template models.
arXiv Detail & Related papers (2021-05-01T14:32:13Z)
- Deep Multi Depth Panoramas for View Synthesis [70.9125433400375]
We present a novel scene representation - Multi Depth Panorama (MDP) - that consists of multiple RGBD$\alpha$ panoramas.
MDPs are more compact than previous 3D scene representations and enable high-quality, efficient new view rendering.
arXiv Detail & Related papers (2020-08-04T20:29:15Z)
- Light3DPose: Real-time Multi-Person 3D Pose Estimation from Multiple Views [5.510992382274774]
We present an approach to perform 3D pose estimation of multiple people from a few calibrated camera views.
Our architecture aggregates feature-maps from a 2D pose estimator backbone into a comprehensive representation of the 3D scene.
The proposed method is inherently efficient: as a pure bottom-up approach, it is computationally independent of the number of people in the scene.
arXiv Detail & Related papers (2020-04-06T14:12:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.