DynPoint: Dynamic Neural Point For View Synthesis
- URL: http://arxiv.org/abs/2310.18999v3
- Date: Fri, 19 Jan 2024 01:57:15 GMT
- Title: DynPoint: Dynamic Neural Point For View Synthesis
- Authors: Kaichen Zhou, Jia-Xing Zhong, Sangyun Shin, Kai Lu, Yiyuan Yang,
Andrew Markham, Niki Trigoni
- Abstract summary: We propose DynPoint, an algorithm designed to facilitate the rapid synthesis of novel views for unconstrained monocular videos.
DynPoint concentrates on predicting the explicit 3D correspondence between neighboring frames to realize information aggregation.
Our method exhibits strong robustness in handling long-duration videos without learning a canonical representation of video content.
- Score: 45.44096876841621
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The introduction of neural radiance fields has greatly improved the
effectiveness of view synthesis for monocular videos. However, existing
algorithms face difficulties when dealing with uncontrolled or lengthy
scenarios, and require extensive training time specific to each new scenario.
To tackle these limitations, we propose DynPoint, an algorithm designed to
facilitate the rapid synthesis of novel views for unconstrained monocular
videos. Rather than encoding the entirety of the scenario information into a
latent representation, DynPoint concentrates on predicting the explicit 3D
correspondence between neighboring frames to realize information aggregation.
Specifically, this correspondence prediction is achieved through the estimation
of consistent depth and scene flow information across frames. Subsequently, the
acquired correspondence is utilized to aggregate information from multiple
reference frames to a target frame, by constructing hierarchical neural point
clouds. The resulting framework enables swift and accurate view synthesis for
desired views of target frames. Experimental results demonstrate that the
proposed method accelerates training, typically by an order of magnitude, while
yielding outcomes comparable to those of prior approaches. Furthermore, our
method exhibits strong robustness in
handling long-duration videos without learning a canonical representation of
video content.
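The abstract describes the core mechanism only at a high level: lift each reference-frame pixel to 3D with its estimated depth, move it with the predicted scene flow, and express the result in the target camera so its features can be aggregated into a point cloud for the target frame. The snippet below is a minimal NumPy illustration of that geometric step; the function names, array shapes, and the assumption that scene flow is expressed in the reference camera's coordinates are illustrative and do not come from the paper, which builds hierarchical neural point clouds and decodes them with a learned renderer.

```python
# Minimal sketch (not the authors' implementation): warp one reference frame's
# per-pixel features into the target frame using estimated depth, 3D scene
# flow, and the relative camera pose, then pool several reference frames.
import numpy as np

def unproject(depth, K):
    """Lift every pixel of a depth map to 3D points in its camera frame."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))       # pixel coordinates
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)     # homogeneous pixels
    rays = pix @ np.linalg.inv(K).T                      # back-projected rays
    return rays * depth[..., None]                       # scale rays by depth

def warp_to_target(depth_ref, feat_ref, scene_flow, K, R, t):
    """
    depth_ref  : (h, w)    consistent depth of a reference frame
    feat_ref   : (h, w, c) per-pixel features of that frame
    scene_flow : (h, w, 3) 3D motion from reference time to target time
                 (assumed expressed in the reference camera's frame)
    K          : (3, 3)    camera intrinsics
    R, t       : relative pose from reference camera to target camera
    """
    pts = unproject(depth_ref, K) + scene_flow           # points at target time
    pts_tgt = pts @ R.T + t                              # in target camera frame
    return pts_tgt.reshape(-1, 3), feat_ref.reshape(-1, feat_ref.shape[-1])

def aggregate(reference_frames):
    """Concatenate warped points/features from several reference frames."""
    pts, feats = zip(*(warp_to_target(*f) for f in reference_frames))
    return np.concatenate(pts), np.concatenate(feats)
```

In the paper, this aggregated point set is organized hierarchically and decoded by a network to render the desired view; the sketch only covers the correspondence-driven aggregation the abstract emphasizes.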
Related papers
- D-NPC: Dynamic Neural Point Clouds for Non-Rigid View Synthesis from Monocular Video [53.83936023443193]
This paper contributes to the field by introducing a new method for dynamic novel view synthesis from monocular video, such as smartphone captures.
Our approach represents the scene as a dynamic neural point cloud, an implicit time-conditioned point cloud that encodes local geometry and appearance in separate hash-encoded neural feature grids.
arXiv Detail & Related papers (2024-06-14T14:35:44Z)
- Multi-Plane Neural Radiance Fields for Novel View Synthesis [5.478764356647437]
Novel view synthesis is a long-standing problem that revolves around rendering frames of scenes from novel camera viewpoints.
In this work, we examine the performance, generalization, and efficiency of single-view multi-plane neural radiance fields.
We propose a new multiplane NeRF architecture that accepts multiple views to improve the synthesis results and expand the viewing range.
arXiv Detail & Related papers (2023-03-03T06:32:55Z)
- Representation Recycling for Streaming Video Analysis [19.068248496174903]
StreamDEQ aims to infer frame-wise representations on videos with minimal per-frame computation.
We show that StreamDEQ is able to recover near-optimal representations in a few frames' time and maintain an up-to-date representation throughout the video duration.
arXiv Detail & Related papers (2022-04-28T13:35:14Z)
- Condensing a Sequence to One Informative Frame for Video Recognition [113.3056598548736]
This paper studies a two-step alternative that first condenses the video sequence to an informative "frame".
A valid question is how to define "useful information" and then distill it from a sequence down to one synthetic frame.
IFS consistently demonstrates evident improvements on image-based 2D networks and clip-based 3D networks.
arXiv Detail & Related papers (2022-01-11T16:13:43Z)
- Video Salient Object Detection via Contrastive Features and Attention Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection.
A co-attention formulation is utilized to combine the low-level and high-level features.
We show that the proposed method requires less computation, and performs favorably against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-03T17:40:32Z)
- CCVS: Context-aware Controllable Video Synthesis [95.22008742695772]
This presentation introduces a self-supervised learning approach to the synthesis of new video clips from old ones.
It conditions the synthesis process on contextual information for temporal continuity and ancillary information for fine control.
arXiv Detail & Related papers (2021-07-16T17:57:44Z)
- Cascaded Deep Video Deblurring Using Temporal Sharpness Prior [88.98348546566675]
The proposed algorithm mainly consists of optical flow estimation from intermediate latent frames and latent frame restoration steps.
A deep CNN model first estimates optical flow from the intermediate latent frames, and the latent frames are then restored based on the estimated optical flow; the flow-warping step this relies on is sketched after this list.
We show that exploring the domain knowledge of video deblurring is able to make the deep CNN model more compact and efficient.
arXiv Detail & Related papers (2020-04-06T09:13:49Z)
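Since the deblurring entry above hinges on flow-guided use of neighbouring latent frames, the following is a minimal, framework-agnostic sketch of the backward-warping operation such a cascade relies on; the function name and nearest-neighbour sampling are illustrative simplifications, not the cited paper's code.

```python
# Minimal sketch (assumed interface, not the cited paper's code): backward-warp
# a neighbouring latent frame onto the current frame's pixel grid with a dense
# optical-flow field, the basic operation flow-guided restoration builds on.
import numpy as np

def backward_warp(neighbour, flow):
    """
    neighbour : (h, w, c) neighbouring latent frame
    flow      : (h, w, 2) flow from current-frame pixels to the neighbour (dx, dy)
    Nearest-neighbour sampling keeps the sketch short; practical pipelines
    typically use differentiable bilinear sampling inside the network.
    """
    h, w, _ = neighbour.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    src_x = np.clip(np.round(u + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.round(v + flow[..., 1]).astype(int), 0, h - 1)
    return neighbour[src_y, src_x]
```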