IBRNet: Learning Multi-View Image-Based Rendering
- URL: http://arxiv.org/abs/2102.13090v1
- Date: Thu, 25 Feb 2021 18:56:21 GMT
- Title: IBRNet: Learning Multi-View Image-Based Rendering
- Authors: Qianqian Wang, Zhicheng Wang, Kyle Genova, Pratul Srinivasan, Howard
Zhou, Jonathan T. Barron, Ricardo Martin-Brualla, Noah Snavely, Thomas
Funkhouser
- Abstract summary: We present a method that synthesizes novel views of complex scenes by interpolating a sparse set of nearby views.
By drawing on source views at render time, our method hearkens back to classic work on image-based rendering.
- Score: 67.15887251196894
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a method that synthesizes novel views of complex scenes by
interpolating a sparse set of nearby views. The core of our method is a network
architecture that includes a multilayer perceptron and a ray transformer that
estimates radiance and volume density at continuous 5D locations (3D spatial
locations and 2D viewing directions), drawing appearance information on the fly
from multiple source views. By drawing on source views at render time, our
method hearkens back to classic work on image-based rendering (IBR), and allows
us to render high-resolution imagery. Unlike neural scene representation work
that optimizes per-scene functions for rendering, we learn a generic view
interpolation function that generalizes to novel scenes. We render images using
classic volume rendering, which is fully differentiable and allows us to train
using only multi-view posed images as supervision. Experiments show that our
method outperforms recent novel view synthesis methods that also seek to
generalize to novel scenes. Further, if fine-tuned on each scene, our method is
competitive with state-of-the-art single-scene neural rendering methods.
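To make the per-ray computation described in the abstract concrete, the following is a minimal NumPy sketch: features gathered from several source views are pooled, passed through a toy self-attention step standing in for the ray transformer, mapped to density and color, and alpha-composited with classic volume rendering. The shapes, the plain mean pooling, and the random projection "heads" are illustrative assumptions, not the paper's learned components.

```python
# Minimal NumPy sketch of an IBRNet-style rendering loop (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

N_SAMPLES = 64   # points sampled along one camera ray
N_VIEWS = 8      # nearby source views supplying appearance features
FEAT_DIM = 16    # per-view feature dimension (hypothetical)

# 1) Per-sample features gathered by projecting each ray sample into the
#    source views (stand-in random values here).
src_feats = rng.normal(size=(N_SAMPLES, N_VIEWS, FEAT_DIM))

# 2) Pool across source views (the paper uses learned, visibility-aware
#    weighting; a plain mean is used here as a placeholder).
pooled = src_feats.mean(axis=1)                        # (N_SAMPLES, FEAT_DIM)

# 3) "Ray transformer" stand-in: self-attention over the samples along the
#    ray, so each sample's density can depend on the whole ray.
def self_attention(x):
    scores = x @ x.T / np.sqrt(x.shape[-1])            # (N_SAMPLES, N_SAMPLES)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

attended = self_attention(pooled)

# 4) Toy heads mapping features to density and color (random projections
#    standing in for the learned MLP).
W_sigma = rng.normal(size=(FEAT_DIM, 1))
W_rgb = rng.normal(size=(FEAT_DIM, 3))
sigma = np.log1p(np.exp(attended @ W_sigma)).squeeze(-1)   # softplus density
rgb = 1.0 / (1.0 + np.exp(-(attended @ W_rgb)))            # sigmoid colors

# 5) Classic volume rendering: alpha-composite colors front to back.
deltas = np.full(N_SAMPLES, 1.0 / N_SAMPLES)               # sample spacing
alpha = 1.0 - np.exp(-sigma * deltas)
transmittance = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
pixel_color = (transmittance * alpha)[:, None] * rgb
print("rendered pixel:", pixel_color.sum(axis=0))
```

The compositing in the last step is the standard discretization C = sum_i T_i * (1 - exp(-sigma_i * delta_i)) * c_i with T_i = prod_{j<i} exp(-sigma_j * delta_j); because every step is differentiable, the whole pipeline can be trained from posed multi-view images alone, as the abstract states.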
Related papers
- Learning to Render Novel Views from Wide-Baseline Stereo Pairs [26.528667940013598]
We introduce a method for novel view synthesis given only a single wide-baseline stereo image pair.
Existing approaches to novel view synthesis from sparse observations fail because they recover incorrect 3D geometry.
We propose an efficient, image-space epipolar line sampling scheme to assemble image features for a target ray (the projection step behind such epipolar sampling is sketched after this list).
arXiv Detail & Related papers (2023-04-17T17:40:52Z)
- MAIR: Multi-view Attention Inverse Rendering with 3D Spatially-Varying Lighting Estimation [13.325800282424598]
We propose a scene-level inverse rendering framework that uses multi-view images to decompose the scene into geometry, an SVBRDF, and 3D spatially-varying lighting.
Our experiments show that the proposed method not only achieves better performance than single-view-based methods, but also achieves robust performance on unseen real-world scenes.
arXiv Detail & Related papers (2023-03-22T08:07:28Z)
- Multi-Plane Neural Radiance Fields for Novel View Synthesis [5.478764356647437]
Novel view synthesis is a long-standing problem that revolves around rendering frames of scenes from novel camera viewpoints.
In this work, we examine the performance, generalization, and efficiency of single-view multi-plane neural radiance fields.
We propose a new multiplane NeRF architecture that accepts multiple views to improve the synthesis results and expand the viewing range.
arXiv Detail & Related papers (2023-03-03T06:32:55Z)
- Vision Transformer for NeRF-Based View Synthesis from a Single Input Image [49.956005709863355]
We propose to leverage both the global and local features to form an expressive 3D representation.
To synthesize a novel view, we train a multilayer perceptron (MLP) network conditioned on the learned 3D representation to perform volume rendering.
Our method can render novel views from only a single input image and generalize across multiple object categories using a single model.
arXiv Detail & Related papers (2022-07-12T17:52:04Z)
- Scene Representation Transformer: Geometry-Free Novel View Synthesis Through Set-Latent Scene Representations [48.05445941939446]
A classical problem in computer vision is to infer a 3D scene representation from few images that can be used to render novel views at interactive rates.
We propose the Scene Representation Transformer (SRT), a method which processes posed or unposed RGB images of a new area.
We show that this method outperforms recent baselines in terms of PSNR and speed on synthetic datasets.
arXiv Detail & Related papers (2021-11-25T16:18:56Z)
- Extracting Triangular 3D Models, Materials, and Lighting From Images [59.33666140713829]
We present an efficient method for joint optimization of materials and lighting from multi-view image observations.
We leverage meshes with spatially-varying materials and environment lighting that can be deployed in any traditional graphics engine.
arXiv Detail & Related papers (2021-11-24T13:58:20Z)
- Neural Radiance Flow for 4D View Synthesis and Video Processing [59.9116932930108]
We present a method to learn a 4D spatial-temporal representation of a dynamic scene from a set of RGB images.
Key to our approach is the use of a neural implicit representation that learns to capture the 3D occupancy, radiance, and dynamics of the scene.
arXiv Detail & Related papers (2020-12-17T17:54:32Z)
- NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis [78.5281048849446]
We present a method that achieves state-of-the-art results for synthesizing novel views of complex scenes.
Our algorithm represents a scene using a fully-connected (non-convolutional) deep network.
Because volume rendering is naturally differentiable, the only input required to optimize our representation is a set of images with known camera poses.
arXiv Detail & Related papers (2020-03-19T17:57:23Z)
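Both IBRNet's on-the-fly gathering of appearance from source views and the epipolar line sampling mentioned in the wide-baseline stereo entry above rest on the same geometric step: points sampled along a target ray are projected into a source camera, tracing out that ray's epipolar line in the source image. The sketch below illustrates this with a pinhole source camera whose intrinsics, pose, and ray are hypothetical values, not taken from any of the papers listed here.

```python
# Minimal sketch: project samples along a target ray into a source view.
# The camera intrinsics/pose and the ray itself are hypothetical values.
import numpy as np

# Hypothetical pinhole intrinsics of the source camera (fx, fy, cx, cy).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])

# Hypothetical world-to-source-camera pose: rotation R and translation t.
R = np.eye(3)
t = np.array([0.2, 0.0, 0.0])      # source camera shifted 0.2 units along x

# Target ray in world coordinates: origin o, unit direction d.
o = np.array([0.0, 0.0, 0.0])
d = np.array([0.0, 0.0, 1.0])

# Sample depths along the target ray and project each 3D point.
depths = np.linspace(0.5, 4.0, 16)
points = o + depths[:, None] * d                 # (16, 3) world-space samples
cam_pts = (R @ points.T).T + t                   # into the source camera frame
uv_hom = (K @ cam_pts.T).T                       # homogeneous pixel coordinates
uv = uv_hom[:, :2] / uv_hom[:, 2:3]              # perspective divide

# uv traces the target ray's epipolar line in the source image; per-sample
# features would be gathered at these pixel locations, e.g. by bilinear
# interpolation of a source-view feature map.
print(uv[:3])
```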
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.