Implicit View-Time Interpolation of Stereo Videos using Multi-Plane Disparities and Non-Uniform Coordinates
- URL: http://arxiv.org/abs/2303.17181v1
- Date: Thu, 30 Mar 2023 06:32:55 GMT
- Title: Implicit View-Time Interpolation of Stereo Videos using Multi-Plane Disparities and Non-Uniform Coordinates
- Authors: Avinash Paliwal, Andrii Tsarov and Nima Khademi Kalantari
- Abstract summary: We build upon X-Fields, which approximates an interpolatable mapping between the input coordinates and 2D RGB images.
We propose multi-plane disparities to reduce the spatial distance of the objects in the stereo views.
We additionally introduce several simple, but important, improvements over X-Fields.
- Score: 10.445563506186307
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose an approach for view-time interpolation of stereo
videos. Specifically, we build upon X-Fields, which approximates an
interpolatable mapping between the input coordinates and 2D RGB images using a
convolutional decoder. Our main contribution is to analyze and identify the
sources of the problems with using X-Fields in our application and propose
novel techniques to overcome these challenges. Specifically, we observe that
X-Fields struggles to implicitly interpolate the disparities for large baseline
cameras. Therefore, we propose multi-plane disparities to reduce the spatial
distance of the objects in the stereo views. Moreover, we propose non-uniform
time coordinates to handle the non-linear and sudden motion spikes in videos.
We additionally introduce several simple, but important, improvements over
X-Fields. We demonstrate that our approach is able to produce better results
than the state of the art, while running at near real-time rates and having low
memory and storage costs.
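As a concrete illustration of the two main ideas above, the sketch below shows one way non-uniform time coordinates and multi-plane disparities could be realized. This is a minimal sketch under stated assumptions, not the authors' implementation: the NumPy formulation, the function names, the soft-plane composition, and the use of a per-frame motion-magnitude estimate as input are all illustrative choices.

```python
# Minimal sketch, NOT the paper's code: illustrative formulations of
# non-uniform time coordinates and multi-plane disparities. Function
# names, the soft-plane composition, and the motion input are assumptions.
import numpy as np

def non_uniform_time_coords(frame_motion: np.ndarray) -> np.ndarray:
    """Map N frames to N time coordinates in [0, 1].

    Instead of the uniform coordinate i / (N - 1), each frame pair is
    allotted coordinate space proportional to the motion between them,
    so a sudden motion spike is stretched over a wider range that is
    easier to interpolate.

    frame_motion: (N-1,) non-negative motion magnitudes, e.g. the mean
    optical-flow magnitude between consecutive frames (an assumption).
    """
    steps = np.maximum(frame_motion, 1e-6)           # avoid zero-length steps
    cum = np.concatenate([[0.0], np.cumsum(steps)])  # cumulative motion
    return cum / cum[-1]                             # normalize to [0, 1]

def compose_disparity(plane_weights: np.ndarray,
                      plane_offsets: np.ndarray) -> np.ndarray:
    """One plausible reading of multi-plane disparities: express a large
    disparity map as a soft combination of K fixed per-plane offsets, so
    the network predicts per-pixel plane weights instead of large
    disparity values directly.

    plane_weights: (K, H, W) soft plane assignments, summing to 1 over K.
    plane_offsets: (K,) fixed disparity value of each plane.
    """
    return np.einsum('khw,k->hw', plane_weights, plane_offsets)

# A motion spike between frames 3 and 4 receives most of the time axis:
print(non_uniform_time_coords(np.array([1.0, 1.0, 8.0, 1.0])))
# approx. [0.0, 0.091, 0.182, 0.909, 1.0]
```

With the remapped coordinates, a motion spike occupies a wider stretch of the time axis, so the implicit mapping the decoder must learn varies more smoothly per unit coordinate; likewise, composing disparity from fixed plane offsets keeps each predicted quantity small.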
Related papers
- MumPy: Multilateral Temporal-view Pyramid Transformer for Video Inpainting Detection [41.4800103693756]
We introduce a novel Multilateral Temporal-view Pyramid Transformer (MumPy) that flexibly combines spatial-temporal clues.
Our method utilizes a newly designed multilateral temporal-view encoder to extract various collaborations of spatial-temporal clues and introduces a deformable window-based temporal-view interaction module.
By adjusting the contribution strength of spatial and temporal clues, our method can effectively identify inpainted regions.
arXiv Detail & Related papers (2024-04-17T03:56:28Z)
- OmniLocalRF: Omnidirectional Local Radiance Fields from Dynamic Videos [14.965321452764355]
We introduce a new approach called Omnidirectional Local Radiance Fields (OmniLocalRF) that can render static-only scene views.
Our approach combines the principles of local radiance fields with the bidirectional optimization of omnidirectional rays.
Our experiments validate that OmniLocalRF outperforms existing methods in both qualitative and quantitative metrics.
arXiv Detail & Related papers (2024-03-31T12:55:05Z)
- Multi-Spectral Image Stitching via Spatial Graph Reasoning [52.27796682972484]
We propose a spatial graph reasoning based multi-spectral image stitching method.
We embed multi-scale complementary features from the same view position into a set of nodes.
By introducing long-range coherence along spatial and channel dimensions, the complementarity of pixel relations and channel interdependencies aids in the reconstruction of aligned multi-view features.
arXiv Detail & Related papers (2023-07-31T15:04:52Z)
- Video Frame Interpolation with Stereo Event and Intensity Camera [40.07341828127157]
We propose a novel Stereo Event-based VFI network (SE-VFI-Net) to generate high-quality intermediate frames.
We exploit the fused features to accomplish accurate optical flow and disparity estimation.
Our proposed SE-VFI-Net outperforms state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2023-07-17T04:02:00Z)
- Multiscale Representation for Real-Time Anti-Aliasing Neural Rendering [84.37776381343662]
Mip-NeRF proposes a multiscale representation as a conical frustum to encode scale information.
We propose mip voxel grids (Mip-VoG), an explicit multiscale representation for real-time anti-aliasing rendering.
Our approach is the first to offer multiscale training and real-time anti-aliasing rendering simultaneously.
arXiv Detail & Related papers (2023-04-20T04:05:22Z)
- Adaptive Human Matting for Dynamic Videos [62.026375402656754]
Adaptive Matting for Dynamic Videos, termed AdaM, is a framework for differentiating foregrounds from backgrounds in dynamic videos.
Two interconnected network designs are employed to achieve this goal.
We benchmark and study our methods on recently introduced datasets, showing that our matting achieves new best-in-class generalizability.
arXiv Detail & Related papers (2023-04-12T17:55:59Z)
- Towards Nonlinear-Motion-Aware and Occlusion-Robust Rolling Shutter Correction [54.00007868515432]
Existing methods face challenges in estimating the accurate correction field due to the uniform velocity assumption.
We propose a geometry-based Quadratic Rolling Shutter (QRS) motion solver, which precisely estimates the high-order correction field of individual pixels.
Our method surpasses the state-of-the-art by +4.98, +0.77, and +4.33 dB in PSNR on the Carla-RS, Fastec-RS, and BS-RSC datasets, respectively.
arXiv Detail & Related papers (2023-03-31T15:09:18Z)
- Event-Based Frame Interpolation with Ad-hoc Deblurring [68.97825675372354]
We propose a general method for event-based frame interpolation that performs ad-hoc deblurring on input videos.
Our network consistently outperforms state-of-the-art methods on frame interpolation, single-image deblurring, and the joint task of interpolation and deblurring.
Our code and dataset will be made publicly available.
arXiv Detail & Related papers (2023-01-12T18:19:00Z)
- Video Shadow Detection via Spatio-Temporal Interpolation Consistency Training [31.115226660100294]
We propose a framework that feeds unlabeled video frames together with labeled images into an image shadow detection network for training.
We then derive the spatial and temporal consistency constraints accordingly for enhancing generalization in the pixel-wise classification.
In addition, we design a Scale-Aware Network for multi-scale shadow knowledge learning in images.
arXiv Detail & Related papers (2022-06-17T14:29:51Z)
- Video Frame Interpolation Transformer [86.20646863821908]
We propose a Transformer-based video frame interpolation framework that allows content-aware aggregation weights and considers long-range dependencies with the self-attention operations.
To avoid the high computational cost of global self-attention, we introduce the concept of local attention into video interpolation.
In addition, we develop a multi-scale frame synthesis scheme to fully realize the potential of Transformers.
arXiv Detail & Related papers (2021-11-27T05:35:10Z)
- Heuristics2Annotate: Efficient Annotation of Large-Scale Marathon Dataset For Bounding Box Regression [8.078491757252692]
We collect a novel large-scale in-the-wild video dataset of marathon runners.
The dataset consists of hours of recording of thousands of runners captured using 42 hand-held smartphone cameras.
We propose a new scheme for tackling the challenges in the annotation of such a large dataset.
arXiv Detail & Related papers (2021-04-06T19:08:31Z)