Revealing Disocclusions in Temporal View Synthesis through Infilling Vector Prediction
- URL: http://arxiv.org/abs/2110.08805v1
- Date: Sun, 17 Oct 2021 12:11:34 GMT
- Title: Revealing Disocclusions in Temporal View Synthesis through Infilling Vector Prediction
- Authors: Vijayalakshmi Kanchana, Nagabhushan Somraj, Suraj Yadwad, Rajiv Soundararajan
- Abstract summary: We study the idea of an infilling vector to infill by pointing to a non-disoccluded region in the synthesized view.
To exploit the structure of disocclusions created by camera motion during their infilling, we rely on two important cues: temporal correlation of infilling directions and depth.
- Score: 6.51882364384472
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of temporal view synthesis, where the goal is to
predict a future video frame from the past frames using knowledge of the depth
and relative camera motion. In contrast to revealing the disoccluded regions
through intensity based infilling, we study the idea of an infilling vector to
infill by pointing to a non-disoccluded region in the synthesized view. To
exploit the structure of disocclusions created by camera motion during their
infilling, we rely on two important cues: temporal correlation of infilling
directions and depth. We design a learning framework to predict the infilling
vector by computing a temporal prior that reflects past infilling directions
and a normalized depth map as input to the network. We conduct extensive
experiments on a large scale dataset we build for evaluating temporal view
synthesis in addition to the SceneNet RGB-D dataset. Our experiments
demonstrate that our infilling vector prediction approach achieves superior
quantitative and qualitative infilling performance compared to other approaches
in the literature.
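For intuition, the consumption of an infilling vector can be sketched as follows: each disoccluded pixel copies the value at the location its vector points to. The function name, shapes, and nearest-pixel rounding below are illustrative assumptions, not the authors' implementation.
```python
import numpy as np

def apply_infilling_vectors(warped, hole_mask, infill_vec):
    """Fill disoccluded pixels by copying from where the infilling
    vector points (a sketch, not the authors' code).

    warped     : (H, W, 3) synthesized view containing holes
    hole_mask  : (H, W) bool, True where the pixel is disoccluded
    infill_vec : (H, W, 2) predicted (dy, dx) pointing to a
                 non-disoccluded source pixel in the same view
    """
    H, W = hole_mask.shape
    out = warped.copy()
    ys, xs = np.nonzero(hole_mask)
    # Source coordinates: hole pixel plus its infilling vector,
    # rounded to the nearest pixel and clamped to the image bounds.
    src_y = np.clip(np.round(ys + infill_vec[ys, xs, 0]).astype(int), 0, H - 1)
    src_x = np.clip(np.round(xs + infill_vec[ys, xs, 1]).astype(int), 0, W - 1)
    out[ys, xs] = warped[src_y, src_x]
    return out
```
Per the abstract, the vector field itself is predicted by a network fed with a temporal prior of past infilling directions and a normalized depth map; the snippet only shows how such a field would be applied.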
Related papers
- Learning Temporally Consistent Video Depth from Video Diffusion Priors [57.929828486615605]
This work addresses the challenge of video depth estimation.
We reformulate the prediction task into a conditional generation problem.
This allows us to leverage the prior knowledge embedded in existing video generation models.
arXiv Detail & Related papers (2024-06-03T16:20:24Z)
- DynPoint: Dynamic Neural Point For View Synthesis [45.44096876841621]
We propose DynPoint, an algorithm designed to facilitate the rapid synthesis of novel views for unconstrained monocular videos.
DynPoint concentrates on predicting the explicit 3D correspondence between neighboring frames to realize information aggregation.
Our method exhibits strong robustness in handling long-duration videos without learning a canonical representation of video content.
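The aggregation step can be pictured with a small sketch: gather each target pixel's matched feature from a neighboring frame via an explicit correspondence map and blend it in. Shapes and the uniform blend are hypothetical, not DynPoint's design.
```python
import numpy as np

def aggregate_from_neighbor(target_feat, neighbor_feat, corr):
    """Information aggregation via explicit correspondences (sketch).

    target_feat   : (H, W, C) features of the target frame
    neighbor_feat : (H, W, C) features of a neighboring frame
    corr          : (H, W, 2) integer (row, col) match for each pixel
    """
    gathered = neighbor_feat[corr[..., 0], corr[..., 1]]  # (H, W, C)
    return 0.5 * (target_feat + gathered)
```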
arXiv Detail & Related papers (2023-10-29T12:55:53Z)
- DualRefine: Self-Supervised Depth and Pose Estimation Through Iterative Epipolar Sampling and Refinement Toward Equilibrium [11.78276690882616]
Self-supervised multi-frame depth estimation achieves high accuracy by computing matching costs of pixel correspondences between adjacent frames.
We propose the DualRefine model, which tightly couples depth and pose estimation through a feedback loop.
Our novel update pipeline uses a deep equilibrium model framework to iteratively refine depth estimates and a hidden state of feature maps.
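The equilibrium idea can be sketched as a generic fixed-point loop, where `update_fn` stands in for the paper's learned depth/pose update; this is a sketch of the concept, not DualRefine's code.
```python
import numpy as np

def equilibrium_refine(update_fn, depth, hidden, max_iter=30, tol=1e-4):
    """Deep-equilibrium-style refinement: apply an update operator
    until the depth estimate reaches a fixed point (generic sketch)."""
    for _ in range(max_iter):
        new_depth, hidden = update_fn(depth, hidden)
        if np.max(np.abs(new_depth - depth)) < tol:  # converged
            depth = new_depth
            break
        depth = new_depth
    return depth, hidden
```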
arXiv Detail & Related papers (2023-04-07T09:46:29Z)
- Uncovering the Missing Pattern: Unified Framework Towards Trajectory Imputation and Prediction [60.60223171143206]
Trajectory prediction is a crucial undertaking in understanding entity movement or human behavior from observed sequences.
Current methods often assume that the observed sequences are complete while ignoring the potential for missing values.
This paper presents a unified framework, the Graph-based Conditional Variational Recurrent Neural Network (GC-VRNN), which can perform trajectory imputation and prediction simultaneously.
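For intuition about the joint task, a naive non-learned baseline might interpolate missing observations and extrapolate the future at constant velocity; GC-VRNN replaces both steps with a single learned model. The sketch below illustrates the problem setup only, with hypothetical names.
```python
import numpy as np

def impute_and_predict(traj, mask, horizon):
    """Naive baseline for joint trajectory imputation and prediction.

    traj    : (T, D) observed positions, arbitrary values where missing
    mask    : (T,) bool, True where the observation is valid
    horizon : number of future steps to predict
    """
    t = np.arange(len(traj))
    # Impute: linear interpolation over the valid time stamps, per dimension.
    filled = np.stack(
        [np.interp(t, t[mask], traj[mask, d]) for d in range(traj.shape[1])],
        axis=1,
    )
    # Predict: constant-velocity extrapolation from the last two points.
    vel = filled[-1] - filled[-2]
    future = filled[-1] + vel * np.arange(1, horizon + 1)[:, None]
    return filled, future
```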
arXiv Detail & Related papers (2023-03-28T14:27:27Z)
- STDepthFormer: Predicting Spatio-temporal Depth from Video with a Self-supervised Transformer Model [0.0]
A self-supervised model that simultaneously predicts a sequence of future frames from video input with a spatio-temporal attention network is proposed.
The proposed model leverages prior scene knowledge such as object shape and texture, similar to single-image depth inference methods.
It is implicitly capable of forecasting the motion of objects in the scene, rather than requiring complex models involving multi-object detection, segmentation and tracking.
arXiv Detail & Related papers (2023-03-02T12:22:51Z)
- TempSAL -- Uncovering Temporal Information for Deep Saliency Prediction [64.63645677568384]
We introduce a novel saliency prediction model that learns to output saliency maps in sequential time intervals.
Our approach locally modulates the saliency predictions by combining the learned temporal maps.
Our code will be publicly available on GitHub.
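One way to picture the combination step: mix the per-interval temporal maps with learned weights and blend the result with the image-level prediction. This is a hedged sketch with assumed shapes, not TempSAL's exact module.
```python
import numpy as np

def modulate_saliency(image_saliency, temporal_maps, weights):
    """Blend an image-level saliency map with learned temporal maps (sketch).

    image_saliency : (H, W) overall prediction
    temporal_maps  : (T, H, W) saliency per time interval
    weights        : (T,) learned mixing logits
    """
    w = np.exp(weights - weights.max())
    w /= w.sum()                                   # softmax over intervals
    temporal = np.tensordot(w, temporal_maps, axes=1)  # (H, W)
    return 0.5 * (image_saliency + temporal)
```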
arXiv Detail & Related papers (2023-01-05T22:10:16Z)
- Learnable Patchmatch and Self-Teaching for Multi-Frame Depth Estimation in Monocular Endoscopy [16.233423010425355]
We propose a novel unsupervised multi-frame monocular depth estimation model.
The proposed model integrates a learnable patchmatch module to adaptively increase the discriminative ability in regions with low and homogeneous textures.
As a byproduct of the self-teaching paradigm, the proposed model is able to improve the depth predictions when more frames are input at test time.
arXiv Detail & Related papers (2022-05-30T12:11:03Z)
- On the Sins of Image Synthesis Loss for Self-supervised Depth Estimation [60.780823530087446]
We show that improvements in image synthesis do not necessitate improvement in depth estimation.
We attribute this diverging phenomenon to aleatoric uncertainties, which originate from data.
This observed divergence has not been previously reported or studied in depth.
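For context, the image synthesis loss in question is typically a blend of SSIM and L1 photometric terms. The sketch below uses a simplified global (unwindowed) SSIM for brevity; it illustrates the common form of the loss, not this paper's exact objective.
```python
import numpy as np

def photometric_loss(target, synthesized, alpha=0.85, c1=0.01**2, c2=0.03**2):
    """SSIM + L1 image synthesis loss, with a global (unwindowed) SSIM."""
    mx, my = target.mean(), synthesized.mean()
    vx, vy = target.var(), synthesized.var()
    cov = ((target - mx) * (synthesized - my)).mean()
    ssim = ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx**2 + my**2 + c1) * (vx + vy + c2)
    )
    l1 = np.abs(target - synthesized).mean()
    return alpha * (1.0 - ssim) / 2.0 + (1.0 - alpha) * l1
```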
arXiv Detail & Related papers (2021-09-13T17:57:24Z)
- Unsupervised Scale-consistent Depth Learning from Video [131.3074342883371]
We propose a monocular depth estimator SC-Depth, which requires only unlabelled videos for training.
Thanks to the capability of scale-consistent prediction, we show that our monocular-trained deep networks are readily integrated into the ORB-SLAM2 system.
The proposed hybrid Pseudo-RGBD SLAM shows compelling results in KITTI, and it generalizes well to the KAIST dataset without additional training.
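The scale-consistency cue can be written, in a generic form, as a normalized difference between one frame's depth warped into another and that frame's own prediction; this is a sketch of the cue, not SC-Depth's exact training loss.
```python
import numpy as np

def geometry_consistency(depth_a_warped, depth_b, eps=1e-8):
    """Normalized depth discrepancy between a warped depth map and the
    target frame's own prediction (generic sketch of the cue)."""
    diff = np.abs(depth_a_warped - depth_b) / (depth_a_warped + depth_b + eps)
    return diff.mean()
```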
arXiv Detail & Related papers (2021-05-25T02:17:56Z)
- Organization of a Latent Space structure in VAE/GAN trained by navigation data [0.0]
We present a novel artificial cognitive mapping system using generative deep neural networks (VAE/GAN).
We show that the distance of the predicted image is reflected in the distance of the corresponding latent vector after training.
The present study allows the network to internally generate temporal sequences analogous to hippocampal replay/pre-play.
arXiv Detail & Related papers (2021-02-03T03:13:26Z)
- Representation Learning for Sequence Data with Deep Autoencoding Predictive Components [96.42805872177067]
We propose a self-supervised representation learning method for sequence data, based on the intuition that useful representations of sequence data should exhibit a simple structure in the latent space.
We encourage this latent structure by maximizing an estimate of predictive information of latent feature sequences, which is the mutual information between past and future windows at each time step.
We demonstrate that our method recovers the latent space of noisy dynamical systems, extracts predictive features for forecasting tasks, and improves automatic speech recognition when used to pretrain the encoder on large amounts of unlabeled data.
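Under a Gaussian assumption, the mutual information between fixed-length past and future windows has a closed form in log-determinants of covariances. The sketch below illustrates the quantity being maximized; it is not the paper's variational estimator.
```python
import numpy as np

def gaussian_predictive_information(z, window=4):
    """Gaussian estimate of I(past; future) over a latent sequence.

    z : (T, d) latent feature sequence, T > 2 * window
    """
    n = len(z) - 2 * window
    past = np.stack([z[t:t + window].ravel() for t in range(n)])
    future = np.stack([z[t + window:t + 2 * window].ravel() for t in range(n)])
    joint = np.concatenate([past, future], axis=1)

    def logdet_cov(x):
        # Small ridge keeps the covariance well-conditioned.
        cov = np.cov(x, rowvar=False) + 1e-6 * np.eye(x.shape[1])
        return np.linalg.slogdet(cov)[1]

    return 0.5 * (logdet_cov(past) + logdet_cov(future) - logdet_cov(joint))
```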
arXiv Detail & Related papers (2020-10-07T03:34:01Z)
- Consistency Guided Scene Flow Estimation [159.24395181068218]
CGSF is a self-supervised framework for the joint reconstruction of 3D scene structure and motion from stereo video.
We show that the proposed model can reliably predict disparity and scene flow in challenging imagery.
It achieves better generalization than the state-of-the-art, and adapts quickly and robustly to unseen domains.
arXiv Detail & Related papers (2020-06-19T17:28:07Z)
- Interpretation of Deep Temporal Representations by Selective Visualization of Internally Activated Nodes [24.228613156037532]
We propose two new frameworks to visualize temporal representations learned from deep neural networks.
Our algorithm interprets the decision of a temporal neural network by extracting highly activated periods.
We characterize such sub-sequences with clustering and calculate the uncertainty of the suggested type and actual data.
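The extraction step can be sketched as finding contiguous spans where a node's activation trace exceeds a threshold; the thresholding rule here is an assumption for illustration.
```python
import numpy as np

def highly_activated_periods(activation, threshold):
    """Return half-open (start, end) index pairs of contiguous spans
    where activation exceeds the threshold (sketch of the extraction)."""
    above = np.concatenate(([False], activation > threshold, [False]))
    edges = np.flatnonzero(np.diff(above.astype(int)))
    return [(s, e) for s, e in zip(edges[::2], edges[1::2])]
```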
arXiv Detail & Related papers (2020-04-27T01:45:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.