360VFI: A Dataset and Benchmark for Omnidirectional Video Frame Interpolation
- URL: http://arxiv.org/abs/2407.14066v2
- Date: Mon, 22 Jul 2024 13:50:55 GMT
- Title: 360VFI: A Dataset and Benchmark for Omnidirectional Video Frame Interpolation
- Authors: Wenxuan Lu, Mengshun Hu, Yansheng Qiu, Liang Liao, Zheng Wang,
- Abstract summary: We introduce the benchmark dataset, 360VFI, for Omnidirectional Video Frame Interpolation.
We present a practical implementation that introduces a distortion prior from omnidirectional video into the network to modulate distortions.
- Score: 13.122586587748218
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: With the development of VR-related techniques, viewers can enjoy a realistic and immersive experience through a head-mounted display, while omnidirectional video with a low frame rate can lead to user dizziness. However, the prevailing plane frame interpolation methodologies are unsuitable for Omnidirectional Video Interpolation, chiefly due to the lack of models tailored to such videos with strong distortion, compounded by the scarcity of valuable datasets for Omnidirectional Video Frame Interpolation. In this paper, we introduce the benchmark dataset, 360VFI, for Omnidirectional Video Frame Interpolation. We present a practical implementation that introduces a distortion prior from omnidirectional video into the network to modulate distortions. We especially propose a pyramid distortion-sensitive feature extractor that uses the unique characteristics of equirectangular projection (ERP) format as prior information. Moreover, we devise a decoder that uses an affine transformation to facilitate the synthesis of intermediate frames further. 360VFI is the first dataset and benchmark that explores the challenge of Omnidirectional Video Frame Interpolation. Through our benchmark analysis, we presented four different distortion conditions scenes in the proposed 360VFI dataset to evaluate the challenge triggered by distortion during interpolation. Besides, experimental results demonstrate that Omnidirectional Video Interpolation can be effectively improved by modeling for omnidirectional distortion.
Related papers
- Video Dynamics Prior: An Internal Learning Approach for Robust Video
Enhancements [83.5820690348833]
We present a framework for low-level vision tasks that does not require any external training data corpus.
Our approach learns neural modules by optimizing over a corrupted sequence, leveraging the weights of the coherence-temporal test and statistics internal statistics.
arXiv Detail & Related papers (2023-12-13T01:57:11Z) - GenDeF: Learning Generative Deformation Field for Video Generation [89.49567113452396]
We propose to render a video by warping one static image with a generative deformation field (GenDeF)
Such a pipeline enjoys three appealing advantages.
arXiv Detail & Related papers (2023-12-07T18:59:41Z) - Three-Stage Cascade Framework for Blurry Video Frame Interpolation [23.38547327916875]
Blurry video frame (BVFI) aims to generate high-frame-rate clear videos from low-frame-rate blurry videos.
BVFI methods usually fail to fully leverage all valuable information, which ultimately hinders their performance.
We propose a simple end-to-end three-stage framework to fully explore useful information from blurry videos.
arXiv Detail & Related papers (2023-10-09T03:37:30Z) - Spherical Vision Transformer for 360-degree Video Saliency Prediction [17.948179628551376]
We propose a vision-transformer-based model for omnidirectional videos named SalViT360.
We introduce a spherical geometry-aware self-attention mechanism that is capable of effective omnidirectional video understanding.
Our approach is the first to employ tangent images for omnidirectional saliency prediction prediction, and our experimental results on three ODV saliency datasets demonstrate its effectiveness compared to the state-of-the-art.
arXiv Detail & Related papers (2023-08-24T18:07:37Z) - Event-Based Frame Interpolation with Ad-hoc Deblurring [68.97825675372354]
We propose a general method for event-based frame that performs deblurring ad-hoc on input videos.
Our network consistently outperforms state-of-the-art methods on frame, single image deblurring and the joint task of deblurring.
Our code and dataset will be made publicly available.
arXiv Detail & Related papers (2023-01-12T18:19:00Z) - Panoramic Vision Transformer for Saliency Detection in 360{\deg} Videos [48.54829780502176]
We present a new framework named Panoramic Vision Transformer (PAVER)
We design the encoder using Vision Transformer with deformable convolution, which enables us to plug pretrained models from normal videos into our architecture without additional modules or finetuning.
We demonstrate the utility of our saliency prediction model with the omnidirectional video quality assessment task in VQA-ODV, where we consistently improve performance without any form of supervision.
arXiv Detail & Related papers (2022-09-19T12:23:34Z) - Learning Omnidirectional Flow in 360-degree Video via Siamese
Representation [11.421244426346389]
This paper proposes the first perceptually natural-synthetic omnidirectional benchmark dataset with a 360-degree field of view, FLOW360.
We present a novel Siamese representation Learning framework for Omnidirectional Flow (SLOF)
Experiments verify the proposed framework's effectiveness and show up to 40% performance improvement over the state-of-the-art approaches.
arXiv Detail & Related papers (2022-08-07T02:24:30Z) - Learned Video Compression via Heterogeneous Deformable Compensation
Network [78.72508633457392]
We propose a learned video compression framework via heterogeneous deformable compensation strategy (HDCVC) to tackle the problems of unstable compression performance.
More specifically, the proposed algorithm extracts features from the two adjacent frames to estimate content-Neighborhood heterogeneous deformable (HetDeform) kernel offsets.
Experimental results indicate that HDCVC achieves superior performance than the recent state-of-the-art learned video compression approaches.
arXiv Detail & Related papers (2022-07-11T02:31:31Z) - Deformable Video Transformer [44.71254375663616]
We introduce the Deformable Video Transformer (DVT), which predicts a small subset of video patches to attend for each query location based on motion information.
Our model achieves higher accuracy at the same or lower computational cost, and it attains state-of-the-art results on four datasets.
arXiv Detail & Related papers (2022-03-31T04:52:27Z) - FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation [97.99012124785177]
FLAVR is a flexible and efficient architecture that uses 3D space-time convolutions to enable end-to-end learning and inference for video framesupervised.
We demonstrate that FLAVR can serve as a useful self- pretext task for action recognition, optical flow estimation, and motion magnification.
arXiv Detail & Related papers (2020-12-15T18:59:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.