Efficient View Synthesis and 3D-based Multi-Frame Denoising with Multiplane Feature Representations
- URL: http://arxiv.org/abs/2303.18139v2
- Date: Wed, 5 Apr 2023 11:08:37 GMT
- Title: Efficient View Synthesis and 3D-based Multi-Frame Denoising with Multiplane Feature Representations
- Authors: Thomas Tanay, Aleš Leonardis, and Matteo Maggioni
- Abstract summary: We introduce the first 3D-based multi-frame denoising method that significantly outperforms its 2D-based counterparts with lower computational requirements.
Our method extends the multiplane image (MPI) framework for novel view synthesis by introducing a learnable encoder-renderer pair manipulating multiplane representations in feature space.
- Score: 1.18885605647513
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While current multi-frame restoration methods combine information from
multiple input images using 2D alignment techniques, recent advances in novel
view synthesis are paving the way for a new paradigm relying on volumetric
scene representations. In this work, we introduce the first 3D-based
multi-frame denoising method that significantly outperforms its 2D-based
counterparts with lower computational requirements. Our method extends the
multiplane image (MPI) framework for novel view synthesis by introducing a
learnable encoder-renderer pair manipulating multiplane representations in
feature space. The encoder fuses information across views and operates in a
depth-wise manner while the renderer fuses information across depths and
operates in a view-wise manner. The two modules are trained end-to-end and
learn to separate depths in an unsupervised way, giving rise to Multiplane
Feature (MPF) representations. Experiments on the Spaces and Real
Forward-Facing datasets as well as on raw burst data validate our approach for
view synthesis, multi-frame denoising, and view synthesis under noisy
conditions.
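The depth-wise/view-wise factorization described in the abstract can be pictured with a short PyTorch sketch: the encoder fuses plane-sweep volumes across views while sharing weights across depth planes, and the renderer collapses the resulting multiplane features across depths for one target view. Everything here (module names, channel sizes, mean-pooling over views, and the omitted homography warping of planes to the target pose) is an illustrative assumption, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class DepthwiseEncoder(nn.Module):
    """Fuses information across views; weights are shared across depth planes."""
    def __init__(self, in_ch=3, feat_ch=16):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1),
        )

    def forward(self, psv):
        # psv: (V, D, C, H, W) plane-sweep volumes from V input views, D depths.
        V, D, C, H, W = psv.shape
        x = self.fuse(psv.reshape(V * D, C, H, W)).reshape(V, D, -1, H, W)
        return x.mean(dim=0)  # (D, F, H, W): one feature map per depth plane

class ViewwiseRenderer(nn.Module):
    """Fuses information across depths to produce a single target view."""
    def __init__(self, feat_ch=16, out_ch=3, depths=32):
        super().__init__()
        self.render = nn.Sequential(
            nn.Conv2d(depths * feat_ch, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, out_ch, 3, padding=1),
        )

    def forward(self, mpf):
        # mpf: (D, F, H, W) multiplane features, assumed already warped to the
        # target pose (the homography warping itself is omitted for brevity).
        D, F, H, W = mpf.shape
        return self.render(mpf.reshape(1, D * F, H, W))  # (1, 3, H, W)

encoder, renderer = DepthwiseEncoder(), ViewwiseRenderer()
psv = torch.randn(4, 32, 3, 64, 64)  # 4 noisy views, 32 depth planes
image = renderer(encoder(psv))       # synthesized / denoised target view
```

Because each depth plane is processed identically and only the rendering step mixes depths, this factorization is consistent with the abstract's claim that the pair can learn to separate depths without depth supervision.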
Related papers
- LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias [50.13457154615262]
We propose a transformer-based approach for scalable and generalizable novel view synthesis from sparse-view inputs.
We introduce two architectures: (1) an encoder-decoder LVSM, which encodes input image tokens into a fixed number of 1D latent tokens; and (2) a decoder-only LVSM, which directly maps input images to novel-view outputs.
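A rough sketch of the two variants described above; token dimensions, layer counts, and the pose-derived target query tokens are placeholder assumptions rather than details from the paper:

```python
import torch
import torch.nn as nn

class EncoderDecoderLVSM(nn.Module):
    """Variant (1): compress input image tokens into a fixed set of 1D latents."""
    def __init__(self, dim=256, n_latents=64):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(n_latents, dim))
        self.encode = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(dim, nhead=8, batch_first=True), num_layers=2)
        self.decode = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(dim, nhead=8, batch_first=True), num_layers=2)

    def forward(self, input_tokens, target_queries):
        # input_tokens: (B, N_in, dim); target_queries: (B, N_out, dim).
        z = self.encode(self.latents.expand(input_tokens.shape[0], -1, -1), input_tokens)
        return self.decode(target_queries, z)  # (B, N_out, dim) novel-view tokens

class DecoderOnlyLVSM(nn.Module):
    """Variant (2): map input tokens to novel-view tokens in a single stream."""
    def __init__(self, dim=256):
        super().__init__()
        self.blocks = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True), num_layers=4)

    def forward(self, input_tokens, target_queries):
        x = torch.cat([input_tokens, target_queries], dim=1)
        return self.blocks(x)[:, -target_queries.shape[1]:]
```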
arXiv Detail & Related papers (2024-10-22T17:58:28Z) - A Two-Stage Progressive Pre-training using Multi-Modal Contrastive Masked Autoencoders [5.069884983892437]
We propose a new progressive pre-training method for image understanding tasks which leverages RGB-D datasets.
In the first stage, we pre-train the model using contrastive learning to learn cross-modal representations.
In the second stage, we further pre-train the model using masked autoencoding and denoising/noise prediction.
Our approach is scalable, robust and suitable for pre-training RGB-D datasets.
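A hedged sketch of such a two-stage schedule; the linear encoders, masking scheme, and noise model are stand-ins, not the paper's components:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

rgb_enc, depth_enc = nn.Linear(768, 128), nn.Linear(768, 128)  # placeholder encoders

def stage1_contrastive(rgb_feat, depth_feat, tau=0.07):
    # Cross-modal InfoNCE: matching RGB/depth pairs in the batch are positives.
    z1 = F.normalize(rgb_enc(rgb_feat), dim=-1)
    z2 = F.normalize(depth_enc(depth_feat), dim=-1)
    logits = z1 @ z2.t() / tau
    return F.cross_entropy(logits, torch.arange(len(logits)))

autoencoder = nn.Sequential(nn.Linear(768, 128), nn.ReLU(), nn.Linear(128, 768))

def stage2_masked_denoise(tokens, mask_ratio=0.75, sigma=0.1):
    # Mask most tokens, add noise to the rest, and reconstruct the clean input.
    mask = torch.rand(tokens.shape[:2]) < mask_ratio
    corrupted = tokens.masked_fill(mask[..., None], 0.0) + sigma * torch.randn_like(tokens)
    return F.mse_loss(autoencoder(corrupted), tokens)
```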
arXiv Detail & Related papers (2024-08-05T05:33:59Z) - MVD-Fusion: Single-view 3D via Depth-consistent Multi-view Generation [54.27399121779011]
We present MVD-Fusion: a method for single-view 3D inference via generative modeling of multi-view-consistent RGB-D images.
We show that our approach can yield more accurate synthesis compared to recent state-of-the-art, including distillation-based 3D inference and prior multi-view generation methods.
arXiv Detail & Related papers (2024-04-04T17:59:57Z) - Learning to Render Novel Views from Wide-Baseline Stereo Pairs [26.528667940013598]
We introduce a method for novel view synthesis given only a single wide-baseline stereo image pair.
Existing approaches to novel view synthesis from sparse observations fail because they recover incorrect 3D geometry.
We propose an efficient, image-space epipolar line sampling scheme to assemble image features for a target ray.
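The sampling scheme can be pictured as follows: points along the target ray are projected into the source view, tracing an epipolar line along which features are gathered. The camera conventions, shapes, and the bilinear gather via grid_sample are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def sample_epipolar_features(feat, K, R, t, origin, direction, depths):
    # feat: (1, C, H, W) source-view features; K: (3, 3) intrinsics;
    # R: (3, 3), t: (3,) source-camera extrinsics; origin, direction: (3,)
    # target ray in world space; depths: (N,) sample distances along the ray.
    points = origin + depths[:, None] * direction     # (N, 3) points on the ray
    cam = R @ points.t() + t[:, None]                 # (3, N) in the source frame
    pix = K @ cam
    pix = pix[:2] / pix[2:].clamp(min=1e-6)           # (2, N) pixel coordinates
    H, W = feat.shape[-2:]
    grid = torch.stack([pix[0] / (W - 1), pix[1] / (H - 1)], dim=-1) * 2 - 1
    grid = grid.view(1, 1, -1, 2)                     # normalized for grid_sample
    return F.grid_sample(feat, grid, align_corners=True)[0, :, 0]  # (C, N)
```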
arXiv Detail & Related papers (2023-04-17T17:40:52Z) - Multi-Plane Neural Radiance Fields for Novel View Synthesis [5.478764356647437]
Novel view synthesis is a long-standing problem that revolves around rendering frames of scenes from novel camera viewpoints.
In this work, we examine the performance, generalization, and efficiency of single-view multi-plane neural radiance fields.
We propose a new multiplane NeRF architecture that accepts multiple views to improve the synthesis results and expand the viewing range.
arXiv Detail & Related papers (2023-03-03T06:32:55Z) - Panoptic Lifting for 3D Scene Understanding with Neural Fields [32.59498558663363]
We propose a novel approach for learning panoptic 3D representations from images of in-the-wild scenes.
Our method requires only machine-generated 2D panoptic segmentation masks inferred from a pre-trained network.
Experimental results validate our approach on the challenging Hypersim, Replica, and ScanNet datasets.
arXiv Detail & Related papers (2022-12-19T19:15:36Z) - Vision Transformer for NeRF-Based View Synthesis from a Single Input
Image [49.956005709863355]
We propose to leverage both the global and local features to form an expressive 3D representation.
To synthesize a novel view, we train a multilayer perceptron (MLP) network conditioned on the learned 3D representation to perform volume rendering.
Our method can render novel views from only a single input image and generalize across multiple object categories using a single model.
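The rendering step follows the standard volume-rendering quadrature used by NeRF-style methods; the MLP width and the 64-dimensional conditioning features below are placeholder assumptions:

```python
import torch
import torch.nn as nn

mlp = nn.Sequential(nn.Linear(3 + 64, 128), nn.ReLU(), nn.Linear(128, 4))

def render_ray(points, cond, deltas):
    # points: (N, 3) samples along the ray; cond: (N, 64) learned 3D features;
    # deltas: (N,) distances between consecutive samples.
    out = mlp(torch.cat([points, cond], dim=-1))       # (N, 4): RGB + density
    rgb, sigma = torch.sigmoid(out[:, :3]), torch.relu(out[:, 3])
    alpha = 1 - torch.exp(-sigma * deltas)             # opacity per sample
    trans = torch.cumprod(torch.cat([alpha.new_ones(1), 1 - alpha[:-1]]), dim=0)
    return (trans * alpha)[:, None].mul(rgb).sum(dim=0)  # (3,) rendered color
```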
arXiv Detail & Related papers (2022-07-12T17:52:04Z) - Extracting Triangular 3D Models, Materials, and Lighting From Images [59.33666140713829]
We present an efficient method for joint optimization of materials and lighting from multi-view image observations.
We leverage meshes with spatially-varying materials and environment lighting that can be deployed in any traditional graphics engine.
arXiv Detail & Related papers (2021-11-24T13:58:20Z) - Deep Multi Depth Panoramas for View Synthesis [70.9125433400375]
We present a novel scene representation - Multi Depth Panorama (MDP) - that consists of multiple RGBDα panoramas.
MDPs are more compact than previous 3D scene representations and enable high-quality, efficient new view rendering.
arXiv Detail & Related papers (2020-08-04T20:29:15Z) - Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled
Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2d detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z)