Revisiting Optical Flow Estimation in 360 Videos
- URL: http://arxiv.org/abs/2010.08045v1
- Date: Thu, 15 Oct 2020 22:22:21 GMT
- Title: Revisiting Optical Flow Estimation in 360 Videos
- Authors: Keshav Bhandari, Ziliang Zong, Yan Yan
- Abstract summary: We design LiteFlowNet360 as a domain adaptation framework from the perspective video domain to the 360 video domain.
We adapt it using simple kernel transformation techniques inspired by the Kernel Transformer Network (KTN) to cope with the inherent distortion in 360 videos.
Experimental results show promising 360 video optical flow estimation with the proposed architecture.
- Score: 9.997208301312956
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Nowadays 360 video analysis has become a significant research topic in the
field since the appearance of high-quality and low-cost 360 wearable devices.
In this paper, we propose a novel LiteFlowNet360 architecture for 360 video
optical flow estimation. We design LiteFlowNet360 as a domain adaptation
framework from the perspective video domain to the 360 video domain. We adapt
it using simple kernel transformation techniques inspired by the Kernel
Transformer Network (KTN) to cope with the inherent distortion in 360 videos
caused by the
sphere-to-plane projection. First, we apply an incremental transformation of
convolution layers in the feature pyramid network and show that further
transformations of the inference and regularization layers are not important, hence
reducing the network growth in terms of size and computation cost. Second, we
refine the network by training with augmented data in a supervised manner. We
perform data augmentation by projecting the images onto a sphere and
re-projecting them to a plane. Third, we train LiteFlowNet360 in a self-supervised
manner using target-domain 360 videos. Experimental results show promising
360 video optical flow estimation with the proposed architecture.
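
The second step above augments training data by projecting frames onto a sphere and re-projecting them to a plane. Below is a minimal sketch of that kind of sphere-rotation augmentation for equirectangular frames, assuming NumPy/SciPy; the function name rotate_equirectangular, its parameters, and the rotation convention are illustrative assumptions, not the paper's code.

```python
# Minimal sketch (assumptions, not the authors' implementation): lift an
# equirectangular frame onto the unit sphere, rotate it, and re-project to
# the plane by inverse-mapping each output pixel to a source pixel.
import numpy as np
from scipy.ndimage import map_coordinates

def rotate_equirectangular(frame, yaw=0.0, pitch=0.0, roll=0.0):
    """Re-project an equirectangular frame (H x W x C) after a sphere rotation."""
    h, w = frame.shape[:2]

    # Pixel grid -> spherical angles (longitude in [-pi, pi], latitude in [-pi/2, pi/2]).
    lon = (np.arange(w) + 0.5) / w * 2 * np.pi - np.pi
    lat = np.pi / 2 - (np.arange(h) + 0.5) / h * np.pi
    lon, lat = np.meshgrid(lon, lat)

    # Spherical angles -> unit vectors on the sphere.
    x = np.cos(lat) * np.cos(lon)
    y = np.cos(lat) * np.sin(lon)
    z = np.sin(lat)
    xyz = np.stack([x, y, z], axis=-1).reshape(-1, 3)

    # Build a rotation from yaw (z), pitch (y), roll (x); applying it to row
    # vectors via `@ R` effectively applies the inverse rotation, which is what
    # inverse-mapping needs to pull colors from the source frame.
    def rot(axis, a):
        c, s = np.cos(a), np.sin(a)
        m = np.eye(3)
        i, j = [(1, 2), (0, 2), (0, 1)][axis]
        m[i, i], m[i, j], m[j, i], m[j, j] = c, -s, s, c
        return m
    R = rot(2, yaw) @ rot(1, pitch) @ rot(0, roll)
    xyz = xyz @ R

    # Rotated directions -> source pixel coordinates in the original frame.
    src_lon = np.arctan2(xyz[:, 1], xyz[:, 0])
    src_lat = np.arcsin(np.clip(xyz[:, 2], -1.0, 1.0))
    u = (src_lon + np.pi) / (2 * np.pi) * w - 0.5
    v = (np.pi / 2 - src_lat) / np.pi * h - 0.5

    # Bilinear sampling per channel, wrapping across the 360 longitude seam.
    out = np.stack([
        map_coordinates(frame[..., c], [v, u], order=1, mode="wrap")
        for c in range(frame.shape[-1])
    ], axis=-1)
    return out.reshape(h, w, -1)
```

For flow training, both frames of a pair would be rotated with the same (yaw, pitch, roll) so that the augmented pair remains geometrically consistent.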
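The third step trains LiteFlowNet360 in a self-supervised manner on unlabeled target-domain 360 videos. The abstract does not specify the objective; a common choice for self-supervised flow, shown here only as an assumed sketch in PyTorch, is a photometric warping loss. Note that grid_sample does not wrap across the 360 longitude seam, so a fully spherical formulation would need extra handling.

```python
# Assumed sketch of a self-supervised photometric loss for flow training.
import torch
import torch.nn.functional as F

def photometric_loss(frame1, frame2, flow):
    """frame1, frame2: (B, C, H, W); flow: (B, 2, H, W) in pixels."""
    b, _, h, w = frame1.shape
    # Base sampling grid in pixel coordinates.
    ys, xs = torch.meshgrid(
        torch.arange(h, device=flow.device, dtype=flow.dtype),
        torch.arange(w, device=flow.device, dtype=flow.dtype),
        indexing="ij",
    )
    grid_x = xs.unsqueeze(0) + flow[:, 0]   # x + u
    grid_y = ys.unsqueeze(0) + flow[:, 1]   # y + v
    # Normalize to [-1, 1] as expected by grid_sample.
    grid = torch.stack(
        [2.0 * grid_x / (w - 1) - 1.0, 2.0 * grid_y / (h - 1) - 1.0], dim=-1
    )
    warped = F.grid_sample(frame2, grid, align_corners=True)
    # Robust (Charbonnier-style) photometric penalty.
    return torch.sqrt((frame1 - warped) ** 2 + 1e-6).mean()

# Usage (hypothetical): flow = liteflownet360(frame1, frame2)
#                       loss = photometric_loss(frame1, frame2, flow)
```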
Related papers
- 360VFI: A Dataset and Benchmark for Omnidirectional Video Frame Interpolation [13.122586587748218]
This paper introduces the benchmark dataset, 360VFI, for Omnidirectional Video Frame Interpolation.
We present a practical implementation that introduces a distortion prior from omnidirectional video into the network to modulate distortions.
arXiv Detail & Related papers (2024-07-19T06:50:24Z) - Sp2360: Sparse-view 360 Scene Reconstruction using Cascaded 2D Diffusion Priors [51.36238367193988]
We tackle sparse-view reconstruction of a 360 3D scene using priors from latent diffusion models (LDMs).
We present SparseSplat360, a method that employs a cascade of in-painting and artifact removal models to fill in missing details and clean novel views.
Our method generates entire 360 scenes from as few as 9 input views, with a high degree of foreground and background detail.
arXiv Detail & Related papers (2024-05-26T11:01:39Z) - Spherical Vision Transformer for 360-degree Video Saliency Prediction [17.948179628551376]
We propose a vision-transformer-based model for omnidirectional videos named SalViT360.
We introduce a spherical geometry-aware self-attention mechanism that is capable of effective omnidirectional video understanding.
Our approach is the first to employ tangent images for omnidirectional saliency prediction, and our experimental results on three ODV saliency datasets demonstrate its effectiveness compared to the state-of-the-art.
arXiv Detail & Related papers (2023-08-24T18:07:37Z) - Optical Flow Estimation in 360$^\circ$ Videos: Dataset, Model and
Application [9.99133340779672]
We propose the first perceptually realistic 360$^\circ$ field-of-view video benchmark dataset, namely FLOW360.
We present a novel Siamese representation Learning framework for Omnidirectional Flow (SLOF) estimation, which is trained in a contrastive manner.
The learning scheme is further shown to be effective by extending our Siamese learning scheme and omnidirectional optical flow estimation to the egocentric activity recognition task.
arXiv Detail & Related papers (2023-01-27T17:50:09Z) - Panoramic Vision Transformer for Saliency Detection in 360{\deg} Videos [48.54829780502176]
We present a new framework named Panoramic Vision Transformer (PAVER).
We design the encoder using Vision Transformer with deformable convolution, which enables us to plug pretrained models from normal videos into our architecture without additional modules or finetuning.
We demonstrate the utility of our saliency prediction model with the omnidirectional video quality assessment task in VQA-ODV, where we consistently improve performance without any form of supervision.
arXiv Detail & Related papers (2022-09-19T12:23:34Z) - Deep 360$^\circ$ Optical Flow Estimation Based on Multi-Projection
Fusion [10.603670927163002]
This paper focuses on 360$^\circ$ optical flow estimation using deep neural networks to support increasingly popular VR applications.
We propose a novel multi-projection fusion framework that fuses the optical flow predicted by the models trained using different projection methods.
We also build the first large-scale panoramic optical flow dataset to support the training of neural networks and the evaluation of panoramic optical flow estimation methods.
arXiv Detail & Related papers (2022-07-27T16:48:32Z) - Distortion-Aware Loop Filtering of Intra 360^o Video Coding with
Equirectangular Projection [81.63407194858854]
We propose a distortion-aware loop filtering model to improve the performance of intra coding for 360$^\circ$ videos projected via equirectangular projection (ERP) format.
Our proposed module analyzes content characteristics based on a coding unit (CU) partition mask and processes them through partial convolution to activate the specified area.
arXiv Detail & Related papers (2022-02-20T12:00:18Z) - Blind VQA on 360{\deg} Video via Progressively Learning from Pixels,
Frames and Video [66.57045901742922]
Blind visual quality assessment (BVQA) on 360$^\circ$ video plays a key role in optimizing immersive multimedia systems.
In this paper, we take into account the progressive paradigm of human perception towards spherical video quality.
We propose a novel BVQA approach (namely ProVQA) for 360$^\circ$ video via progressively learning from pixels, frames and video.
arXiv Detail & Related papers (2021-11-18T03:45:13Z) - Unsupervised Depth Completion with Calibrated Backprojection Layers [79.35651668390496]
We propose a deep neural network architecture to infer dense depth from an image and a sparse point cloud.
It is trained using a video stream and corresponding synchronized sparse point cloud, as obtained from a LIDAR or other range sensor, along with the intrinsic calibration parameters of the camera.
At inference time, the calibration of the camera, which can be different from the one used for training, is fed as an input to the network along with the sparse point cloud and a single image.
arXiv Detail & Related papers (2021-08-24T05:41:59Z) - Visual Question Answering on 360{\deg} Images [96.00046925811515]
VQA 360 is a novel task of visual question answering on 360 images.
We collect the first VQA 360 dataset, containing around 17,000 real-world image-question-answer triplets for a variety of question types.
arXiv Detail & Related papers (2020-01-10T08:18:21Z)