Im4D: High-Fidelity and Real-Time Novel View Synthesis for Dynamic
Scenes
- URL: http://arxiv.org/abs/2310.08585v1
- Date: Thu, 12 Oct 2023 17:59:57 GMT
- Title: Im4D: High-Fidelity and Real-Time Novel View Synthesis for Dynamic
Scenes
- Authors: Haotong Lin, Sida Peng, Zhen Xu, Tao Xie, Xingyi He, Hujun Bao,
Xiaowei Zhou
- Abstract summary: We introduce Im4D, a hybrid representation that consists of a grid-based geometry representation and a multi-view image-based appearance representation.
We represent the scene appearance by the original multi-view videos and a network that learns to predict the color of a 3D point from image features.
We show that Im4D achieves state-of-the-art performance in rendering quality and can be trained efficiently, while realizing real-time rendering at 79.8 FPS for 512x512 images.
- Score: 69.52540205439989
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper aims to tackle the challenge of dynamic view synthesis from
multi-view videos. The key observation is that while previous grid-based
methods offer consistent rendering, they fall short in capturing appearance
details of a complex dynamic scene, a domain where multi-view image-based
rendering methods demonstrate the opposite properties. To combine the best of
two worlds, we introduce Im4D, a hybrid scene representation that consists of a
grid-based geometry representation and a multi-view image-based appearance
representation. Specifically, the dynamic geometry is encoded as a 4D density
function composed of spatiotemporal feature planes and a small MLP network,
which globally models the scene structure and facilitates the rendering
consistency. We represent the scene appearance by the original multi-view
videos and a network that learns to predict the color of a 3D point from image
features, instead of memorizing the detailed appearance entirely in network
weights, which naturally makes the networks easier to learn. Our method is
evaluated on five dynamic view synthesis datasets: DyNeRF, ZJU-MoCap, NHR,
DNA-Rendering, and ENeRF-Outdoor. The results show that Im4D
exhibits state-of-the-art performance in rendering quality and can be trained
efficiently, while realizing real-time rendering with a speed of 79.8 FPS for
512x512 images, on a single RTX 3090 GPU.
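To make the hybrid design concrete, below is a minimal PyTorch sketch written from the abstract alone: spatiotemporal feature planes plus a small MLP give a 4D density, and a second network blends features projected from nearby source views into a color. The class names, plane factorization, and dimensions are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of a grid-based geometry + image-based appearance model (assumed design).
import torch
import torch.nn as nn
import torch.nn.functional as F


def sample_plane(plane, coords_2d):
    """Bilinearly sample a feature plane at normalized 2D coords in [-1, 1].

    plane:     (1, C, H, W) learnable feature grid
    coords_2d: (N, 2) query coordinates
    returns:   (N, C) features
    """
    grid = coords_2d.view(1, -1, 1, 2)                      # (1, N, 1, 2)
    feat = F.grid_sample(plane, grid, align_corners=True)   # (1, C, N, 1)
    return feat.squeeze(-1).squeeze(0).t()                  # (N, C)


class PlaneGeometry(nn.Module):
    """4D density field: spatiotemporal feature planes + a small MLP (assumed factorization)."""

    def __init__(self, n_feat=16, res=128, t_res=64):
        super().__init__()
        # Six planes factorize (x, y, z, t): three spatial, three space-time.
        self.planes = nn.ParameterDict({
            name: nn.Parameter(0.1 * torch.randn(1, n_feat, r1, r2))
            for name, (r1, r2) in {
                "xy": (res, res), "xz": (res, res), "yz": (res, res),
                "xt": (res, t_res), "yt": (res, t_res), "zt": (res, t_res),
            }.items()
        })
        self.mlp = nn.Sequential(nn.Linear(6 * n_feat, 64), nn.ReLU(),
                                 nn.Linear(64, 1 + 16))  # density + geometry feature

    def forward(self, xyzt):
        """xyzt: (N, 4) points in [-1, 1]^4 -> density (N, 1), geometry feature (N, 16)."""
        x, y, z, t = xyzt.unbind(-1)
        pairs = {"xy": (x, y), "xz": (x, z), "yz": (y, z),
                 "xt": (x, t), "yt": (y, t), "zt": (z, t)}
        feats = [sample_plane(self.planes[k], torch.stack(v, -1)) for k, v in pairs.items()]
        out = self.mlp(torch.cat(feats, dim=-1))
        return F.softplus(out[:, :1]), out[:, 1:]


class ImageBasedColor(nn.Module):
    """Predict point color from features sampled in nearby source views."""

    def __init__(self, img_feat_dim=32, geo_feat_dim=16):
        super().__init__()
        self.blend = nn.Sequential(
            nn.Linear(img_feat_dim + geo_feat_dim, 64), nn.ReLU(),
            nn.Linear(64, 1))                     # per-view blending weight
        self.to_rgb = nn.Sequential(
            nn.Linear(img_feat_dim + geo_feat_dim, 64), nn.ReLU(),
            nn.Linear(64, 3), nn.Sigmoid())

    def forward(self, img_feats, geo_feat):
        """img_feats: (N, V, C) features projected from V source views;
        geo_feat: (N, G) geometry feature for the same points."""
        x = torch.cat([img_feats,
                       geo_feat.unsqueeze(1).expand(-1, img_feats.shape[1], -1)], dim=-1)
        w = torch.softmax(self.blend(x), dim=1)   # (N, V, 1) soft view selection
        return (w * self.to_rgb(x)).sum(dim=1)    # (N, 3) blended color
```

The appearance network only has to learn how to blend what the source images already contain, rather than storing the appearance itself, which is the abstract's argument for why learning becomes easier.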
Related papers
- Real-time Photorealistic Dynamic Scene Representation and Rendering with
4D Gaussian Splatting [8.078460597825142]
Reconstructing dynamic 3D scenes from 2D images and generating diverse views over time is challenging due to scene complexity and temporal dynamics.
We propose to approximate the underlying spatio-temporal rendering volume of a dynamic scene by optimizing a collection of 4D primitives, with explicit geometry and appearance modeling.
Our model is conceptually simple, consisting of a 4D Gaussian parameterized by anisotropic ellipses that can rotate arbitrarily in space and time, as well as view-dependent and time-evolved appearance represented by the coefficients of 4D spherindrical harmonics.
arXiv Detail & Related papers (2023-10-16T17:57:43Z)
- NeRFPlayer: A Streamable Dynamic Scene Representation with Decomposed
Neural Radiance Fields [99.57774680640581]
We present an efficient framework capable of fast reconstruction, compact modeling, and streamable rendering.
We propose to decompose the 4D space according to temporal characteristics. Points in the 4D space are associated with probabilities of belonging to three categories: static, deforming, and new areas.
arXiv Detail & Related papers (2022-10-28T07:11:05Z)
- Coordinates Are NOT Lonely -- Codebook Prior Helps Implicit Neural 3D
Representations [29.756718435405983]
Implicit neural 3D representation has achieved impressive results in surface or scene reconstruction and novel view synthesis.
Existing approaches, such as Neural Radiance Field (NeRF) and its variants, usually require dense input views.
We introduce a novel coordinate-based model, CoCo-INR, for implicit neural 3D representation.
arXiv Detail & Related papers (2022-10-20T11:13:50Z)
- Vision Transformer for NeRF-Based View Synthesis from a Single Input
Image [49.956005709863355]
We propose to leverage both the global and local features to form an expressive 3D representation.
To synthesize a novel view, we train a multilayer perceptron (MLP) network conditioned on the learned 3D representation to perform volume rendering.
Our method can render novel views from only a single input image and generalize across multiple object categories using a single model.
arXiv Detail & Related papers (2022-07-12T17:52:04Z)
- IBRNet: Learning Multi-View Image-Based Rendering [67.15887251196894]
We present a method that synthesizes novel views of complex scenes by interpolating a sparse set of nearby views.
By drawing on source views at render time, our method hearkens back to classic work on image-based rendering.
arXiv Detail & Related papers (2021-02-25T18:56:21Z)
- Neural Radiance Flow for 4D View Synthesis and Video Processing [59.9116932930108]
We present a method to learn a 4D spatial-temporal representation of a dynamic scene from a set of RGB images.
Key to our approach is the use of a neural implicit representation that learns to capture the 3D occupancy, radiance, and dynamics of the scene.
arXiv Detail & Related papers (2020-12-17T17:54:32Z)
- NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis [78.5281048849446]
We present a method that achieves state-of-the-art results for synthesizing novel views of complex scenes.
Our algorithm represents a scene using a fully-connected (non-convolutional) deep network.
Because volume rendering is naturally differentiable, the only input required to optimize our representation is a set of images with known camera poses.
arXiv Detail & Related papers (2020-03-19T17:57:23Z)
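Several of the methods above (NeRF, Neural Radiance Flow, and Im4D itself) share the same differentiable volume-rendering step: densities and colors sampled along each camera ray are alpha-composited into a pixel color, which is why a set of posed images is enough to supervise the representation. A minimal sketch of that compositing step; the tensor shapes and the small epsilon are assumptions, not taken from any specific paper.

```python
import torch


def composite(densities, colors, deltas):
    """Alpha-composite per-sample density/color along each ray (NeRF-style).

    densities: (R, S)    non-negative density at S samples on R rays
    colors:    (R, S, 3) color at each sample
    deltas:    (R, S)    distance between consecutive samples
    returns:   (R, 3)    rendered pixel colors
    """
    alpha = 1.0 - torch.exp(-densities * deltas)             # opacity per sample
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alpha[:, :1]), 1.0 - alpha + 1e-10], dim=1),
        dim=1)[:, :-1]                                        # transmittance before each sample
    weights = alpha * trans                                   # contribution of each sample
    return (weights.unsqueeze(-1) * colors).sum(dim=1)
```

Because every operation here is differentiable, gradients flow from a photometric loss on the rendered pixels back into whatever field produced the densities and colors.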
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.