Related papers: 360VFI: A Dataset and Benchmark for Omnidirectional Video Frame Interpolation

URL: http://arxiv.org/abs/2407.14066v2
Date: Mon, 22 Jul 2024 13:50:55 GMT
Title: 360VFI: A Dataset and Benchmark for Omnidirectional Video Frame Interpolation
Authors: Wenxuan Lu, Mengshun Hu, Yansheng Qiu, Liang Liao, Zheng Wang,
Abstract summary: We introduce the benchmark dataset, 360VFI, for Omnidirectional Video Frame Interpolation. We present a practical implementation that introduces a distortion prior from omnidirectional video into the network to modulate distortions.
Score: 13.122586587748218
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: With the development of VR-related techniques, viewers can enjoy a realistic and immersive experience through a head-mounted display, but omnidirectional video with a low frame rate can cause user dizziness. However, prevailing planar frame interpolation methods are unsuitable for Omnidirectional Video Interpolation, chiefly due to the lack of models tailored to such strongly distorted videos, compounded by the scarcity of valuable datasets for Omnidirectional Video Frame Interpolation. In this paper, we introduce the benchmark dataset, 360VFI, for Omnidirectional Video Frame Interpolation. We present a practical implementation that introduces a distortion prior from omnidirectional video into the network to modulate distortions. In particular, we propose a pyramid distortion-sensitive feature extractor that uses the unique characteristics of the equirectangular projection (ERP) format as prior information. Moreover, we devise a decoder that uses an affine transformation to further facilitate the synthesis of intermediate frames. 360VFI is the first dataset and benchmark that explores the challenge of Omnidirectional Video Frame Interpolation. In our benchmark analysis, we present scenes under four different distortion conditions in the proposed 360VFI dataset to evaluate the challenges that distortion poses during interpolation. Experimental results also demonstrate that Omnidirectional Video Interpolation can be effectively improved by modeling omnidirectional distortion.
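
Illustrative note: the abstract refers to the characteristic distortion of the ERP format without giving a formula. The sketch below is not the paper's feature extractor or its distortion prior; it only illustrates the standard geometric fact that an equirectangular image stretches a row at latitude phi horizontally by 1 / cos(phi). The function name erp_stretch_map and the max_stretch clipping value are hypothetical choices made for this example.

```python
import math


def erp_stretch_map(height: int, max_stretch: float = 20.0) -> list[float]:
    """Per-row horizontal stretch factor of an equirectangular (ERP) image.

    Row centers are mapped to latitudes strictly inside (-pi/2, pi/2).
    A row at latitude phi is stretched horizontally by 1 / cos(phi).
    The factor diverges toward the poles, so it is clipped to max_stretch
    (an arbitrary value assumed for this sketch).
    """
    factors = []
    for row in range(height):
        # Latitude of the row center: near +pi/2 at the top, -pi/2 at the bottom.
        lat = (0.5 - (row + 0.5) / height) * math.pi
        factors.append(min(1.0 / math.cos(lat), max_stretch))
    return factors


if __name__ == "__main__":
    # Rows near the equator are close to 1; rows near the poles are much larger.
    for row, factor in enumerate(erp_stretch_map(8)):
        print(f"row {row}: stretch {factor:.3f}")
```

A map like this could, under these assumptions, be fed to a network as an extra input channel or used to weight a loss by latitude; whether 360VFI does either is not stated in the abstract.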

Related papers

In-2-4D: Inbetweening from Two Single-View Images to 4D Generation [54.62824686338408]
We propose a new problem, In-2-4D, for generative 4D (i.e., 3D + motion) inbetweening from a minimalistic input setting. Given two images representing the start and end states of an object in motion, our goal is to generate and reconstruct the motion in 4D.
arXiv Detail & Related papers (2025-04-11T09:01:09Z)
SVG: 3D Stereoscopic Video Generation via Denoising Frame Matrix [60.48666051245761]
We propose a pose-free and training-free approach for generating 3D stereoscopic videos. Our method warps a generated monocular video into camera views on a stereoscopic baseline using estimated video depth. We develop a disocclusion boundary re-injection scheme that further improves the quality of video inpainting.
arXiv Detail & Related papers (2024-06-29T08:33:55Z)
GenDeF: Learning Generative Deformation Field for Video Generation [89.49567113452396]
We propose to render a video by warping one static image with a generative deformation field (GenDeF). Such a pipeline enjoys three appealing advantages.
arXiv Detail & Related papers (2023-12-07T18:59:41Z)
Three-Stage Cascade Framework for Blurry Video Frame Interpolation [23.38547327916875]
Blurry video frame interpolation (BVFI) aims to generate high-frame-rate clear videos from low-frame-rate blurry videos. BVFI methods usually fail to fully leverage all valuable information, which ultimately hinders their performance. We propose a simple end-to-end three-stage framework to fully explore useful information from blurry videos.
arXiv Detail & Related papers (2023-10-09T03:37:30Z)
Spherical Vision Transformer for 360-degree Video Saliency Prediction [17.948179628551376]
We propose a vision-transformer-based model for omnidirectional videos named SalViT360. We introduce a spherical geometry-aware self-attention mechanism that is capable of effective omnidirectional video understanding. Our approach is the first to employ tangent images for omnidirectional saliency prediction, and our experimental results on three ODV saliency datasets demonstrate its effectiveness compared to the state-of-the-art.
arXiv Detail & Related papers (2023-08-24T18:07:37Z)
Panoramic Vision Transformer for Saliency Detection in 360° Videos [48.54829780502176]
We present a new framework named Panoramic Vision Transformer (PAVER). We design the encoder using Vision Transformer with deformable convolution, which enables us to plug pretrained models from normal videos into our architecture without additional modules or finetuning. We demonstrate the utility of our saliency prediction model with the omnidirectional video quality assessment task in VQA-ODV, where we consistently improve performance without any form of supervision.
arXiv Detail & Related papers (2022-09-19T12:23:34Z)
Learning Omnidirectional Flow in 360-degree Video via Siamese Representation [11.421244426346389]
This paper proposes the first perceptually natural-synthetic omnidirectional benchmark dataset with a 360-degree field of view, FLOW360. We present a novel Siamese representation Learning framework for Omnidirectional Flow (SLOF). Experiments verify the proposed framework's effectiveness and show up to 40% performance improvement over the state-of-the-art approaches.
arXiv Detail & Related papers (2022-08-07T02:24:30Z)
Deformable Video Transformer [44.71254375663616]
We introduce the Deformable Video Transformer (DVT), which predicts a small subset of video patches to attend for each query location based on motion information. Our model achieves higher accuracy at the same or lower computational cost, and it attains state-of-the-art results on four datasets.
arXiv Detail & Related papers (2022-03-31T04:52:27Z)
StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pretrained StyleGAN [49.917296433657484]
One-shot talking face generation aims at synthesizing a high-quality talking face video from an arbitrary portrait image. In this work, we investigate the latent feature space of a pre-trained StyleGAN and discover some excellent spatial transformation properties. We propose a novel unified framework based on a pre-trained StyleGAN that enables a set of powerful functionalities.
arXiv Detail & Related papers (2022-03-08T12:06:12Z)
A Video Is Worth Three Views: Trigeminal Transformers for Video-based Person Re-identification [77.08204941207985]
Video-based person re-identification (Re-ID) aims to retrieve video sequences of the same person under non-overlapping cameras. We propose a novel framework named Trigeminal Transformers (TMT) for video-based person Re-ID.
arXiv Detail & Related papers (2021-04-05T02:50:16Z)
FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation [97.99012124785177]
FLAVR is a flexible and efficient architecture that uses 3D space-time convolutions to enable end-to-end learning and inference for video frame interpolation. We demonstrate that FLAVR can serve as a useful self-supervised pretext task for action recognition, optical flow estimation, and motion magnification.
arXiv Detail & Related papers (2020-12-15T18:59:30Z)

This list is automatically generated from the titles and abstracts of the papers on this site.