A Single Frame and Multi-Frame Joint Network for 360-degree Panorama
Video Super-Resolution
- URL: http://arxiv.org/abs/2008.10320v1
- Date: Mon, 24 Aug 2020 11:09:54 GMT
- Title: A Single Frame and Multi-Frame Joint Network for 360-degree Panorama
Video Super-Resolution
- Authors: Hongying Liu, Zhubo Ruan, Chaowei Fang, Peng Zhao, Fanhua Shang,
Yuanyuan Liu, Lijun Wang
- Abstract summary: Spherical videos, also known as 360° (panorama) videos, can be viewed with various virtual reality devices such as computers and head-mounted displays.
We propose a novel single frame and multi-frame joint network (SMFN) for recovering high-resolution spherical videos from low-resolution inputs.
- Score: 34.35942412092329
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spherical videos, also known as 360° (panorama) videos, can be viewed
with various virtual reality devices such as computers and head-mounted
displays. They attract great interest because of the strong sense of immersion
they provide. However, capturing, storing and
transmitting high-resolution spherical videos are extremely expensive. In this
paper, we propose a novel single frame and multi-frame joint network (SMFN) for
recovering high-resolution spherical videos from low-resolution inputs. To take
advantage of pixel-level inter-frame consistency, deformable convolutions are
used to eliminate the motion difference between feature maps of the target
frame and its neighboring frames. A mixed attention mechanism is devised to
enhance the feature representation capability. The dual learning strategy is
exerted to constrain the space of solution so that a better solution can be
found. A novel loss function based on the weighted mean square error is
proposed to emphasize the super-resolution of the equatorial regions
(illustrative sketches of these components follow this entry). This is the
first attempt to address the super-resolution of spherical videos, and we
collect a novel dataset from the Internet, MiG Panorama Video, which includes
204 videos. Experimental results on 4 representative video clips demonstrate
the efficacy of the proposed method. The dataset and code are available at
https://github.com/lovepiano/SMFN_For_360VSR.
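To make the alignment step concrete, here is a minimal PyTorch sketch (not the authors' code) of deformable-convolution feature alignment: offsets are predicted from the concatenated target and neighbor features, and the neighbor features are then sampled at those offset positions so they line up with the target frame. Module names and hyperparameters are illustrative assumptions.

```python
# Hedged sketch of deformable-convolution alignment; not SMFN's actual code.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformAlign(nn.Module):
    def __init__(self, channels=64, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        # Predict 2 offsets (dx, dy) per kernel sampling location.
        self.offset_conv = nn.Conv2d(2 * channels,
                                     2 * kernel_size * kernel_size,
                                     kernel_size, padding=pad)
        self.deform_conv = DeformConv2d(channels, channels,
                                        kernel_size, padding=pad)

    def forward(self, target_feat, neighbor_feat):
        # Offsets encode the motion between the two feature maps.
        offsets = self.offset_conv(
            torch.cat([target_feat, neighbor_feat], dim=1))
        # Sample neighbor features at the offset positions, yielding
        # features aligned to the target frame.
        return self.deform_conv(neighbor_feat, offsets)

feat_t = torch.randn(1, 64, 32, 64)      # target-frame features
feat_n = torch.randn(1, 64, 32, 64)      # neighboring-frame features
aligned = DeformAlign()(feat_t, feat_n)  # -> (1, 64, 32, 64)
```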
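The abstract does not detail the mixed attention design; a common reading is channel attention (squeeze-and-excitation style) combined with spatial attention. The sketch below illustrates that combination only and may differ from the paper's block.

```python
# Hedged sketch of a "mixed" (channel + spatial) attention block.
import torch
import torch.nn as nn

class MixedAttention(nn.Module):
    def __init__(self, channels=64, reduction=16):
        super().__init__()
        self.channel_att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                       # global pooling
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        self.spatial_att = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x):
        x = x * self.channel_att(x)   # reweight feature channels
        x = x * self.spatial_att(x)   # reweight spatial positions
        return x
```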
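Dual learning is typically read as a cycle constraint: downscaling the super-resolved frame should reproduce the LR input, which shrinks the space of admissible solutions. A sketch under that assumption follows; the paper's dual mapping may be a learned network rather than bicubic resizing.

```python
# Hedged sketch of a dual-learning consistency loss.
import torch.nn.functional as F

def dual_loss(sr, lr, scale=4):
    # Hypothetical downscaling operator standing in for the dual mapping.
    sr_down = F.interpolate(sr, scale_factor=1 / scale, mode='bicubic',
                            align_corners=False)
    # The downscaled SR frame should match the original LR input.
    return F.l1_loss(sr_down, lr)
```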
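For the weighted mean square error, a natural assumption for equirectangular frames is a cosine-of-latitude weight, as in WS-PSNR, so that equatorial rows dominate the loss while polar rows (which are heavily oversampled by the projection) contribute little. The paper's exact weights may differ.

```python
# Hedged sketch of a latitude-weighted MSE for equirectangular frames.
import math
import torch

def weighted_mse(sr, hr):
    # sr, hr: (N, C, H, W) equirectangular frames.
    n, c, h, w_cols = sr.shape
    rows = torch.arange(h, dtype=sr.dtype, device=sr.device)
    # Weight ~1 at the equator (middle rows), ~0 at the poles.
    w = torch.cos((rows + 0.5 - h / 2) * math.pi / h).view(1, 1, h, 1)
    # Weighted mean of squared errors over all pixels.
    return ((sr - hr) ** 2 * w).sum() / (w.sum() * n * c * w_cols)
```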
Related papers
- Self-Adaptive Sampling for Efficient Video Question-Answering on Image-Text Models [41.12711820047315]
Video understanding models usually sample a set of frames or clips at random, regardless of the internal correlations between their visual contents or their relevance to the question.
We propose two frame sampling strategies, namely most dominant frames (MDF) and most implied frames (MIF), to maximally preserve the frames most likely to be vital to the given questions.
arXiv Detail & Related papers (2023-07-09T14:54:30Z) - MagicVideo: Efficient Video Generation With Latent Diffusion Models [76.95903791630624]
We present an efficient text-to-video generation framework based on latent diffusion models, termed MagicVideo.
Due to a novel and efficient 3D U-Net design and modeling video distributions in a low-dimensional space, MagicVideo can synthesize video clips with 256x256 spatial resolution on a single GPU card.
We conduct extensive experiments and demonstrate that MagicVideo can generate high-quality video clips with either realistic or imaginary content.
arXiv Detail & Related papers (2022-11-20T16:40:31Z) - VideoINR: Learning Video Implicit Neural Representation for Continuous
Space-Time Super-Resolution [75.79379734567604]
We show that Video Implicit Neural Representation (VideoINR) can be decoded to videos of arbitrary spatial resolution and frame rate.
We show that VideoINR achieves competitive performance with state-of-the-art STVSR methods on common up-sampling scales.
arXiv Detail & Related papers (2022-06-09T17:45:49Z) - Learning Trajectory-Aware Transformer for Video Super-Resolution [50.49396123016185]
Video super-resolution aims to restore a sequence of high-resolution (HR) frames from their low-resolution (LR) counterparts.
Existing approaches usually align and aggregate video frames from limited adjacent frames.
We propose a novel Trajectory-aware Transformer for Video Super-Resolution (TTVSR).
arXiv Detail & Related papers (2022-04-08T03:37:39Z) - Memory-Augmented Non-Local Attention for Video Super-Resolution [61.55700315062226]
We propose a novel video super-resolution method that aims at generating high-fidelity high-resolution (HR) videos from low-resolution (LR) ones.
Previous methods predominantly leverage temporal neighbor frames to assist the super-resolution of the current frame.
In contrast, we devise a cross-frame non-local attention mechanism that allows video super-resolution without frame alignment.
arXiv Detail & Related papers (2021-08-25T05:12:14Z) - One Ring to Rule Them All: a simple solution to multi-view
3D-Reconstruction of shapes with unknown BRDF via a small Recurrent ResNet [96.11203962525443]
This paper proposes a simple method which solves an open problem of multi-view 3D-reconstruction for objects with unknown surface materials.
The object can have arbitrary (e.g. non-Lambertian), spatially-varying (or everywhere different) surface reflectances (svBRDF).
Our solution consists of novel-view-synthesis, relighting, material relighting, and shape exchange without additional coding effort.
arXiv Detail & Related papers (2021-04-11T13:39:31Z) - Video Deblurring by Fitting to Test Data [39.41334067434719]
Motion blur in videos captured by autonomous vehicles and robots can degrade their perception capability.
We present a novel approach to video deblurring by fitting a deep network to the test video.
Our approach selects sharp frames from a video and then trains a convolutional neural network on these sharp frames.
arXiv Detail & Related papers (2020-12-09T18:49:24Z)