Omnidirectional Video Super-Resolution using Deep Learning
- URL: http://arxiv.org/abs/2506.14803v1
- Date: Tue, 03 Jun 2025 05:59:21 GMT
- Title: Omnidirectional Video Super-Resolution using Deep Learning
- Authors: Arbind Agrahari Baniya, Tsz-Kwan Lee, Peter W. Eklund, Sunil Aryal
- Abstract summary: The limited spatial resolution in 360° videos does not allow each degree of view to be represented with adequate pixels. This paper proposes a novel deep learning model for 360° Video Super-Resolution (360° VSR) called Spherical Signal Super-resolution with a Proportioned Optimisation (S3PO). S3PO adopts recurrent modelling with an attention mechanism, unbound from conventional VSR techniques like alignment.
- Score: 3.281128493853064
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Omnidirectional Videos (or 360° videos) are widely used in Virtual Reality (VR) to facilitate immersive and interactive viewing experiences. However, the limited spatial resolution in 360° videos does not allow each degree of view to be represented with adequate pixels, limiting the visual quality offered in the immersive experience. Deep learning Video Super-Resolution (VSR) techniques used for conventional videos could provide a promising software-based solution; however, these techniques do not tackle the distortion present in equirectangular projections of 360° video signals. An additional obstacle is the limited availability of 360° video datasets for study. To address these issues, this paper creates a novel 360° Video Dataset (360VDS) and studies the extensibility of conventional VSR models to 360° videos. This paper further proposes a novel deep learning model for 360° Video Super-Resolution (360° VSR), called Spherical Signal Super-resolution with a Proportioned Optimisation (S3PO). S3PO adopts recurrent modelling with an attention mechanism, unbound from conventional VSR techniques like alignment. With a purpose-built feature extractor and a novel loss function addressing spherical distortion, S3PO outperforms most state-of-the-art conventional VSR models and 360°-specific super-resolution models on 360° video datasets. A step-wise ablation study is presented to understand and demonstrate the impact of the chosen architectural sub-components, targeted training and optimisation.
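The abstract specifies a loss function that addresses spherical distortion but does not give its form. A standard way to account for the oversampling of equirectangular frames towards the poles is to weight each pixel row by the cosine of its latitude, as the WS-PSNR metric does for evaluation. The PyTorch sketch below illustrates that idea only; it is an assumption, not the actual S3PO loss, and the function names are hypothetical.

```python
import torch

def latitude_weights(height: int) -> torch.Tensor:
    """Per-row weights for an equirectangular frame.

    Rows near the poles are oversampled by the projection, so they are
    down-weighted by cos(latitude), as in the WS-PSNR metric.
    """
    # Latitude of each pixel-row centre, in (-pi/2, pi/2).
    lat = (torch.arange(height) + 0.5) / height * torch.pi - torch.pi / 2
    return torch.cos(lat)

def spherical_l1_loss(sr: torch.Tensor, hr: torch.Tensor) -> torch.Tensor:
    """Latitude-weighted L1 loss between super-resolved and ground-truth
    equirectangular frames, both shaped (batch, channels, height, width)."""
    w = latitude_weights(sr.shape[-2]).to(sr).view(1, 1, -1, 1)
    diff = (sr - hr).abs()
    # Weighted mean over all pixels: equatorial rows count most.
    return (w * diff).sum() / w.expand_as(diff).sum()
```

The same row weights can be reused at evaluation time to compute WS-PSNR-style quality scores.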
Related papers
- VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction [86.82819259860186]
We introduce VLM-3R, a unified framework for Vision-Language Models (VLMs) that incorporates 3D reconstructive instruction tuning. VLM-3R processes monocular video frames by employing a geometry encoder to derive implicit 3D tokens that represent spatial understanding.
arXiv Detail & Related papers (2025-05-26T17:56:30Z)
- Beyond the Frame: Generating 360° Panoramic Videos from Perspective Videos [64.10180665546237]
360° videos offer a more complete perspective of our surroundings. Existing video models excel at producing standard videos, but their ability to generate full panoramic videos remains elusive. We develop a high-quality data filtering pipeline to curate pairwise training data and improve the quality of 360° video generation. Experimental results demonstrate that our model can generate realistic and coherent 360° videos from in-the-wild perspective video.
arXiv Detail & Related papers (2025-04-10T17:51:38Z)
- MVSplat360: Feed-Forward 360 Scene Synthesis from Sparse Views [90.26609689682876]
We introduce MVSplat360, a feed-forward approach for 360° novel view synthesis (NVS) of diverse real-world scenes, using only sparse observations.
This setting is inherently ill-posed due to the minimal overlap among input views and the insufficient visual information they provide.
Our model is end-to-end trainable and supports rendering arbitrary views with as few as 5 sparse input views.
arXiv Detail & Related papers (2024-11-07T17:59:31Z)
- 360VFI: A Dataset and Benchmark for Omnidirectional Video Frame Interpolation [13.122586587748218]
This paper introduces 360VFI, a benchmark dataset for omnidirectional video frame interpolation.
We present a practical implementation that introduces a distortion prior from omnidirectional video into the network to modulate distortions; a hedged sketch of one possible form of such a prior follows this entry.
arXiv Detail & Related papers (2024-07-19T06:50:24Z)
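The 360VFI entry above feeds a distortion prior into the interpolation network without specifying its construction. Below is a minimal Python sketch of one plausible choice, assuming a per-row 1/cos(latitude) stretch map of the equirectangular projection concatenated to the input frames as an extra channel; the paper's actual prior may differ, and all names here are hypothetical.

```python
import torch

def distortion_prior(height: int, width: int, clamp: float = 10.0) -> torch.Tensor:
    """Equirectangular stretch map of shape (1, height, width).

    Horizontal sampling density grows as 1/cos(latitude) towards the
    poles; clamping avoids the singularity at the poles themselves.
    """
    # Latitude of each pixel-row centre, in (-pi/2, pi/2).
    lat = (torch.arange(height) + 0.5) / height * torch.pi - torch.pi / 2
    stretch = (1.0 / torch.cos(lat)).clamp(max=clamp)  # (height,)
    return stretch.view(1, -1, 1).expand(1, height, width)

# Hypothetical usage: condition an interpolation network on the prior.
frames = torch.rand(1, 6, 256, 512)              # two RGB frames, channel-stacked
prior = distortion_prior(256, 512).unsqueeze(0)  # (1, 1, 256, 512)
net_input = torch.cat([frames, prior], dim=1)    # (1, 7, 256, 512)
```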
- Heterogeneous 360 Degree Videos in Metaverse: Differentiated Reinforcement Learning Approaches [10.0580903923777]
This paper presents a novel Quality of Service model for heterogeneous 360-degree videos with different requirements for frame rates and cybersickness.
We propose a frame-slotted structure and conduct frame-wise optimization using self-designed differentiated deep reinforcement learning algorithms.
arXiv Detail & Related papers (2023-08-08T06:47:16Z)
- OPDN: Omnidirectional Position-aware Deformable Network for Omnidirectional Image Super-Resolution [18.138867445188293]
We propose a two-stage framework for 360° omnidirectional image super-resolution.
Our proposed method achieves superior performance and won the NTIRE 2023 challenge on 360° omnidirectional image super-resolution.
arXiv Detail & Related papers (2023-04-26T11:47:40Z)
- Panoramic Vision Transformer for Saliency Detection in 360° Videos [48.54829780502176]
We present a new framework named Panoramic Vision Transformer (PAVER).
We design the encoder using Vision Transformer with deformable convolution, which enables us to plug pretrained models from normal videos into our architecture without additional modules or finetuning.
We demonstrate the utility of our saliency prediction model with the omnidirectional video quality assessment task in VQA-ODV, where we consistently improve performance without any form of supervision.
arXiv Detail & Related papers (2022-09-19T12:23:34Z)
- Blind VQA on 360° Video via Progressively Learning from Pixels, Frames and Video [66.57045901742922]
Blind visual quality assessment (BVQA) on 360° video plays a key role in optimizing immersive multimedia systems.
In this paper, we take into account the progressive paradigm of human perception towards spherical video quality.
We propose a novel BVQA approach (namely ProVQA) for 360° video via progressively learning from pixels, frames and video.
arXiv Detail & Related papers (2021-11-18T03:45:13Z)
- 3D Human Pose, Shape and Texture from Low-Resolution Images and Videos [107.36352212367179]
We propose RSC-Net, which consists of a Resolution-aware network, a Self-supervision loss, and a Contrastive learning scheme.
The proposed method is able to learn 3D body pose and shape across different resolutions with one single model.
We extend the RSC-Net to handle low-resolution videos and apply it to reconstruct textured 3D pedestrians from low-resolution input.
arXiv Detail & Related papers (2021-03-11T06:52:12Z)
- ATSal: An Attention Based Architecture for Saliency Prediction in 360° Videos [5.831115928056554]
This paper proposes ATSal, a novel attention-based (head-eye) saliency model for 360° videos.
We compare the proposed approach to other state-of-the-art saliency models on two datasets: Salient360! and VR-EyeTracking.
Experimental results on over 80 ODV videos (75K+ frames) show that the proposed method outperforms the existing state-of-the-art.
arXiv Detail & Related papers (2020-11-20T19:19:48Z)