Revealing Disocclusions in Temporal View Synthesis through Infilling Vector Prediction
- URL: http://arxiv.org/abs/2110.08805v1
- Date: Sun, 17 Oct 2021 12:11:34 GMT
- Title: Revealing Disocclusions in Temporal View Synthesis through Infilling Vector Prediction
- Authors: Vijayalakshmi Kanchana, Nagabhushan Somraj, Suraj Yadwad, Rajiv Soundararajan
- Abstract summary: We study the idea of an infilling vector to infill by pointing to a non-disoccluded region in the synthesized view.
To exploit the structure of disocclusions created by camera motion during their infilling, we rely on two important cues: temporal correlation of infilling directions and depth.
- Score: 6.51882364384472
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of temporal view synthesis, where the goal is to
predict a future video frame from the past frames using knowledge of the depth
and relative camera motion. In contrast to revealing the disoccluded regions
through intensity based infilling, we study the idea of an infilling vector to
infill by pointing to a non-disoccluded region in the synthesized view. To
exploit the structure of disocclusions created by camera motion during their
infilling, we rely on two important cues: temporal correlation of infilling
directions and depth. We design a learning framework to predict the infilling
vector by computing a temporal prior that reflects past infilling directions
and a normalized depth map as input to the network. We conduct extensive
experiments on a large scale dataset we build for evaluating temporal view
synthesis in addition to the SceneNet RGB-D dataset. Our experiments
demonstrate that our infilling vector prediction approach achieves superior
quantitative and qualitative infilling performance compared to other approaches
in the literature.
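For intuition, the consumption of an infilling vector can be sketched as follows: each disoccluded pixel copies the value at the location its vector points to. The function name, shapes, and nearest-pixel rounding below are illustrative assumptions, not the authors' implementation.
```python
import numpy as np

def apply_infilling_vectors(warped, hole_mask, infill_vec):
    """Fill disoccluded pixels by copying from where the infilling
    vector points (a sketch, not the authors' code).

    warped     : (H, W, 3) synthesized view containing holes
    hole_mask  : (H, W) bool, True where the pixel is disoccluded
    infill_vec : (H, W, 2) predicted (dy, dx) pointing to a
                 non-disoccluded source pixel in the same view
    """
    H, W = hole_mask.shape
    out = warped.copy()
    ys, xs = np.nonzero(hole_mask)
    # Source coordinates: hole pixel plus its infilling vector,
    # rounded to the nearest pixel and clamped to the image bounds.
    src_y = np.clip(np.round(ys + infill_vec[ys, xs, 0]).astype(int), 0, H - 1)
    src_x = np.clip(np.round(xs + infill_vec[ys, xs, 1]).astype(int), 0, W - 1)
    out[ys, xs] = warped[src_y, src_x]
    return out
```
Per the abstract, the vector field itself is predicted by a network fed with a temporal prior of past infilling directions and a normalized depth map; the snippet only shows how such a field would be applied.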
Related papers
- Learning Temporally Consistent Video Depth from Video Diffusion Priors [57.929828486615605]
This work addresses the challenge of video depth estimation.
We reformulate the prediction task into a conditional generation problem.
This allows us to leverage the prior knowledge embedded in existing video generation models.
arXiv Detail & Related papers (2024-06-03T16:20:24Z)
- DynPoint: Dynamic Neural Point For View Synthesis [45.44096876841621]
We propose DynPoint, an algorithm designed to facilitate the rapid synthesis of novel views for unconstrained monocular videos.
DynPoint concentrates on predicting the explicit 3D correspondence between neighboring frames to realize information aggregation.
Our method exhibits strong robustness in handling long-duration videos without learning a canonical representation of video content.
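The aggregation step can be pictured with a small sketch: gather each target pixel's matched feature from a neighboring frame via an explicit correspondence map and blend it in. Shapes and the uniform blend are hypothetical, not DynPoint's design.
```python
import numpy as np

def aggregate_from_neighbor(target_feat, neighbor_feat, corr):
    """Information aggregation via explicit correspondences (sketch).

    target_feat   : (H, W, C) features of the target frame
    neighbor_feat : (H, W, C) features of a neighboring frame
    corr          : (H, W, 2) integer (row, col) match for each pixel
    """
    gathered = neighbor_feat[corr[..., 0], corr[..., 1]]  # (H, W, C)
    return 0.5 * (target_feat + gathered)
```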
arXiv Detail & Related papers (2023-10-29T12:55:53Z)
- DualRefine: Self-Supervised Depth and Pose Estimation Through Iterative Epipolar Sampling and Refinement Toward Equilibrium [11.78276690882616]
Self-supervised multi-frame depth estimation achieves high accuracy by computing matching costs of pixel correspondences between adjacent frames.
We propose the DualRefine model, which tightly couples depth and pose estimation through a feedback loop.
Our novel update pipeline uses a deep equilibrium model framework to iteratively refine depth estimates and a hidden state of feature maps.
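The equilibrium idea can be sketched as a generic fixed-point loop, where `update_fn` stands in for the paper's learned depth/pose update; this is a sketch of the concept, not DualRefine's code.
```python
import numpy as np

def equilibrium_refine(update_fn, depth, hidden, max_iter=30, tol=1e-4):
    """Deep-equilibrium-style refinement: apply an update operator
    until the depth estimate reaches a fixed point (generic sketch)."""
    for _ in range(max_iter):
        new_depth, hidden = update_fn(depth, hidden)
        if np.max(np.abs(new_depth - depth)) < tol:  # converged
            depth = new_depth
            break
        depth = new_depth
    return depth, hidden
```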
arXiv Detail & Related papers (2023-04-07T09:46:29Z)
- Uncovering the Missing Pattern: Unified Framework Towards Trajectory Imputation and Prediction [60.60223171143206]
Trajectory prediction is a crucial undertaking in understanding entity movement or human behavior from observed sequences.
Current methods often assume that the observed sequences are complete while ignoring the potential for missing values.
This paper presents a unified framework, the Graph-based Conditional Variational Recurrent Neural Network (GC-VRNN), which can perform trajectory imputation and prediction simultaneously.
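For intuition about the joint task, a naive non-learned baseline might interpolate missing observations and extrapolate the future at constant velocity; GC-VRNN replaces both steps with a single learned model. The sketch below illustrates the problem setup only, with hypothetical names.
```python
import numpy as np

def impute_and_predict(traj, mask, horizon):
    """Naive baseline for joint trajectory imputation and prediction.

    traj    : (T, D) observed positions, arbitrary values where missing
    mask    : (T,) bool, True where the observation is valid
    horizon : number of future steps to predict
    """
    t = np.arange(len(traj))
    # Impute: linear interpolation over the valid time stamps, per dimension.
    filled = np.stack(
        [np.interp(t, t[mask], traj[mask, d]) for d in range(traj.shape[1])],
        axis=1,
    )
    # Predict: constant-velocity extrapolation from the last two points.
    vel = filled[-1] - filled[-2]
    future = filled[-1] + vel * np.arange(1, horizon + 1)[:, None]
    return filled, future
```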
arXiv Detail & Related papers (2023-03-28T14:27:27Z)
- STDepthFormer: Predicting Spatio-temporal Depth from Video with a Self-supervised Transformer Model [0.0]
A self-supervised model that simultaneously predicts a sequence of future frames from video input with a spatio-temporal attention network is proposed.
The proposed model leverages prior scene knowledge such as object shape and texture, similar to single-image depth inference methods.
It is implicitly capable of forecasting the motion of objects in the scene, rather than requiring complex models involving multi-object detection, segmentation and tracking.
arXiv Detail & Related papers (2023-03-02T12:22:51Z)
- TempSAL -- Uncovering Temporal Information for Deep Saliency Prediction [64.63645677568384]
We introduce a novel saliency prediction model that learns to output saliency maps in sequential time intervals.
Our approach locally modulates the saliency predictions by combining the learned temporal maps.
Our code will be publicly available on GitHub.
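One way to picture the combination step: mix the per-interval temporal maps with learned weights and blend the result with the image-level prediction. This is a hedged sketch with assumed shapes, not TempSAL's exact module.
```python
import numpy as np

def modulate_saliency(image_saliency, temporal_maps, weights):
    """Blend an image-level saliency map with learned temporal maps (sketch).

    image_saliency : (H, W) overall prediction
    temporal_maps  : (T, H, W) saliency per time interval
    weights        : (T,) learned mixing logits
    """
    w = np.exp(weights - weights.max())
    w /= w.sum()                                   # softmax over intervals
    temporal = np.tensordot(w, temporal_maps, axes=1)  # (H, W)
    return 0.5 * (image_saliency + temporal)
```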
arXiv Detail & Related papers (2023-01-05T22:10:16Z)
- Learnable Patchmatch and Self-Teaching for Multi-Frame Depth Estimation in Monocular Endoscopy [16.233423010425355]
We propose a novel unsupervised multi-frame monocular depth estimation model.
The proposed model integrates a learnable patchmatch module to adaptively increase the discriminative ability in regions with low and homogeneous textures.
As a byproduct of the self-teaching paradigm, the proposed model is able to improve the depth predictions when more frames are input at test time.
arXiv Detail & Related papers (2022-05-30T12:11:03Z)
- On the Sins of Image Synthesis Loss for Self-supervised Depth Estimation [60.780823530087446]
We show that improvements in image synthesis do not necessitate improvement in depth estimation.
We attribute this diverging phenomenon to aleatoric uncertainties, which originate from data.
This observed divergence has not been previously reported or studied in depth.
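For context, the image synthesis loss in question is typically a blend of SSIM and L1 photometric terms. The sketch below uses a simplified global (unwindowed) SSIM for brevity; it illustrates the common form of the loss, not this paper's exact objective.
```python
import numpy as np

def photometric_loss(target, synthesized, alpha=0.85, c1=0.01**2, c2=0.03**2):
    """SSIM + L1 image synthesis loss, with a global (unwindowed) SSIM."""
    mx, my = target.mean(), synthesized.mean()
    vx, vy = target.var(), synthesized.var()
    cov = ((target - mx) * (synthesized - my)).mean()
    ssim = ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx**2 + my**2 + c1) * (vx + vy + c2)
    )
    l1 = np.abs(target - synthesized).mean()
    return alpha * (1.0 - ssim) / 2.0 + (1.0 - alpha) * l1
```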
arXiv Detail & Related papers (2021-09-13T17:57:24Z)
- Unsupervised Scale-consistent Depth Learning from Video [131.3074342883371]
We propose a monocular depth estimator SC-Depth, which requires only unlabelled videos for training.
Thanks to the capability of scale-consistent prediction, we show that our monocular-trained deep networks are readily integrated into the ORB-SLAM2 system.
The proposed hybrid Pseudo-RGBD SLAM shows compelling results in KITTI, and it generalizes well to the KAIST dataset without additional training.
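The scale-consistency cue can be written, in a generic form, as a normalized difference between one frame's depth warped into another and that frame's own prediction; this is a sketch of the cue, not SC-Depth's exact training loss.
```python
import numpy as np

def geometry_consistency(depth_a_warped, depth_b, eps=1e-8):
    """Normalized depth discrepancy between a warped depth map and the
    target frame's own prediction (generic sketch of the cue)."""
    diff = np.abs(depth_a_warped - depth_b) / (depth_a_warped + depth_b + eps)
    return diff.mean()
```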
arXiv Detail & Related papers (2021-05-25T02:17:56Z)
- Organization of a Latent Space structure in VAE/GAN trained by navigation data [0.0]
We present a novel artificial cognitive mapping system using generative deep neural networks (VAE/GAN).
We show that the distance of the predicted image is reflected in the distance of the corresponding latent vector after training.
The present study allows the network to internally generate temporal sequences analogous to hippocampal replay/pre-play.
arXiv Detail & Related papers (2021-02-03T03:13:26Z)
- Representation Learning for Sequence Data with Deep Autoencoding Predictive Components [96.42805872177067]
We propose a self-supervised representation learning method for sequence data, based on the intuition that useful representations of sequence data should exhibit a simple structure in the latent space.
We encourage this latent structure by maximizing an estimate of predictive information of latent feature sequences, which is the mutual information between past and future windows at each time step.
We demonstrate that our method recovers the latent space of noisy dynamical systems, extracts predictive features for forecasting tasks, and improves automatic speech recognition when used to pretrain the encoder on large amounts of unlabeled data.
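Under a Gaussian assumption, the mutual information between fixed-length past and future windows has a closed form in log-determinants of covariances. The sketch below illustrates the quantity being maximized; it is not the paper's variational estimator.
```python
import numpy as np

def gaussian_predictive_information(z, window=4):
    """Gaussian estimate of I(past; future) over a latent sequence.

    z : (T, d) latent feature sequence, T > 2 * window
    """
    n = len(z) - 2 * window
    past = np.stack([z[t:t + window].ravel() for t in range(n)])
    future = np.stack([z[t + window:t + 2 * window].ravel() for t in range(n)])
    joint = np.concatenate([past, future], axis=1)

    def logdet_cov(x):
        # Small ridge keeps the covariance well-conditioned.
        cov = np.cov(x, rowvar=False) + 1e-6 * np.eye(x.shape[1])
        return np.linalg.slogdet(cov)[1]

    return 0.5 * (logdet_cov(past) + logdet_cov(future) - logdet_cov(joint))
```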
arXiv Detail & Related papers (2020-10-07T03:34:01Z)
- Consistency Guided Scene Flow Estimation [159.24395181068218]
CGSF is a self-supervised framework for the joint reconstruction of 3D scene structure and motion from stereo video.
We show that the proposed model can reliably predict disparity and scene flow in challenging imagery.
It achieves better generalization than the state-of-the-art, and adapts quickly and robustly to unseen domains.
arXiv Detail & Related papers (2020-06-19T17:28:07Z)
- Interpretation of Deep Temporal Representations by Selective Visualization of Internally Activated Nodes [24.228613156037532]
We propose two new frameworks to visualize temporal representations learned from deep neural networks.
Our algorithm interprets the decision of a temporal neural network by extracting highly activated periods.
We characterize such sub-sequences with clustering and calculate the uncertainty of the suggested type and actual data.
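The extraction step can be sketched as finding contiguous spans where a node's activation trace exceeds a threshold; the thresholding rule here is an assumption for illustration.
```python
import numpy as np

def highly_activated_periods(activation, threshold):
    """Return half-open (start, end) index pairs of contiguous spans
    where activation exceeds the threshold (sketch of the extraction)."""
    above = np.concatenate(([False], activation > threshold, [False]))
    edges = np.flatnonzero(np.diff(above.astype(int)))
    return [(s, e) for s, e in zip(edges[::2], edges[1::2])]
```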
arXiv Detail & Related papers (2020-04-27T01:45:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.