Spherical Convolution empowered FoV Prediction in 360-degree Video
Multicast with Limited FoV Feedback
- URL: http://arxiv.org/abs/2201.12525v1
- Date: Sat, 29 Jan 2022 08:32:19 GMT
- Title: Spherical Convolution empowered FoV Prediction in 360-degree Video
Multicast with Limited FoV Feedback
- Authors: Jie Li, Ling Han, Cong Zhang, Qiyue Li, Zhi Liu
- Abstract summary: Field of view (FoV) prediction is critical in 360-degree video multicast.
This paper proposes a spherical convolution-empowered FoV prediction method.
Experimental results show that the proposed method outperforms other prediction methods.
- Score: 16.716422953229088
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Field of view (FoV) prediction is critical in 360-degree video multicast,
which is a key component of the emerging Virtual Reality (VR) and Augmented
Reality (AR) applications. Most current prediction methods that combine
saliency detection and FoV information neither take into account that the
distortion of projected 360-degree videos can invalidate the weight sharing of
traditional convolutional networks, nor do they adequately consider the
difficulty of obtaining complete multi-user FoV information, which degrades the
prediction performance. This paper proposes a spherical convolution-empowered
FoV prediction method, which is a multi-source prediction framework combining
salient features extracted from 360-degree video with limited FoV feedback
information. A spherical convolution neural network (CNN) is used instead of a
traditional two-dimensional CNN to eliminate the problem of weight sharing
failure caused by video projection distortion. Specifically, salient
spatial-temporal features are extracted through a spherical convolution-based
saliency detection model, after which the limited feedback FoV information is
represented as a time-series model based on a spherical convolution-empowered
gated recurrent unit network. Finally, the extracted salient video features are
combined with the feedback-based FoV representation to predict future user FoVs.
Experimental results show that the proposed method outperforms other prediction
methods.
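To make the pipeline concrete, below is a minimal PyTorch sketch (not the authors' implementation) of the fusion step the abstract describes: a convolutional GRU encodes the sequence of limited FoV feedback heat maps, and its final hidden state is combined with saliency features to produce a future-FoV probability map. The planar nn.Conv2d layers stand in for the spherical convolutions the paper uses; ConvGRUCell, FoVPredictor, and all channel sizes and tensor shapes are illustrative assumptions.

import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    """GRU cell with convolutional gates. In the paper, spherical
    convolutions (weights shared on the sphere rather than on the
    distorted equirectangular plane) would replace nn.Conv2d here."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.hid_ch = hid_ch
        self.gates = nn.Conv2d(in_ch + hid_ch, 2 * hid_ch, k, padding=k // 2)
        self.cand = nn.Conv2d(in_ch + hid_ch, hid_ch, k, padding=k // 2)

    def forward(self, x, h):
        # Update gate z and reset gate r, computed jointly then split.
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], 1))).chunk(2, 1)
        h_new = torch.tanh(self.cand(torch.cat([x, r * h], 1)))
        return (1 - z) * h + z * h_new

class FoVPredictor(nn.Module):
    """Fuses saliency features with a recurrent encoding of the
    (possibly incomplete) multi-user FoV feedback maps."""
    def __init__(self, sal_ch=16, fov_ch=1, hid_ch=16):
        super().__init__()
        self.cell = ConvGRUCell(fov_ch, hid_ch)
        self.head = nn.Conv2d(sal_ch + hid_ch, 1, kernel_size=1)

    def forward(self, saliency, fov_seq):
        # saliency: (B, sal_ch, H, W) from the saliency-detection branch
        # fov_seq:  (T, B, fov_ch, H, W) past FoV feedback heat maps
        t_len, b, _, h_dim, w_dim = fov_seq.shape
        h = fov_seq.new_zeros(b, self.cell.hid_ch, h_dim, w_dim)
        for t in range(t_len):
            h = self.cell(fov_seq[t], h)
        # Probability map over the equirectangular grid for the future FoV.
        return torch.sigmoid(self.head(torch.cat([saliency, h], 1)))

# Toy usage on an 8x16 equirectangular grid: batch of 2, 5 past frames.
saliency = torch.randn(2, 16, 8, 16)
fov_hist = torch.rand(5, 2, 1, 8, 16)
fov_pred = FoVPredictor()(saliency, fov_hist)  # -> (2, 1, 8, 16)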
Related papers
- Spatial Visibility and Temporal Dynamics: Revolutionizing Field of View Prediction in Adaptive Point Cloud Video Streaming [19.0599625095738]
Field-of-View adaptive streaming significantly reduces the bandwidth requirements of immersive point cloud video (PCV).
Traditional approaches often focus on trajectory-based 6 degree-of-freedom (6DoF) FoV predictions.
We reformulate the PCV FoV prediction problem from the cell visibility perspective.
arXiv Detail & Related papers (2024-09-26T19:27:11Z) - MADRL-Based Rate Adaptation for 360° Video Streaming with Multi-Viewpoint Prediction [3.8611070161950916]
A key challenge of 360° video playback is ensuring a high quality of experience (QoE) with limited network bandwidth.
Currently, most studies focus on tile-based adaptive bitrate (ABR) streaming based on single-viewport prediction to reduce bandwidth consumption.
This paper first presents a multimodal spatial-temporal attention transformer to generate multiple viewpoint trajectories with their probabilities given a historical trajectory.
After that, a multi-agent deep reinforcement learning (MADRL)-based ABR algorithm utilizing multi-viewpoint prediction for 360° video streaming is proposed.
arXiv Detail & Related papers (2024-05-13T13:59:59Z) - Diffusion-Based Particle-DETR for BEV Perception [94.88305708174796]
Bird's-Eye-View (BEV) is one of the most widely used scene representations for visual perception in Autonomous Vehicles (AVs).
Recent diffusion-based methods offer a promising approach to uncertainty modeling for visual perception but fail to effectively detect small objects in the large coverage of the BEV.
Here, we address this problem by combining the diffusion paradigm with current state-of-the-art 3D object detectors in BEV.
arXiv Detail & Related papers (2023-12-18T09:52:14Z) - Spherical Vision Transformer for 360-degree Video Saliency Prediction [17.948179628551376]
We propose a vision-transformer-based model for omnidirectional videos named SalViT360.
We introduce a spherical geometry-aware self-attention mechanism that is capable of effective omnidirectional video understanding.
Our approach is the first to employ tangent images for omnidirectional saliency prediction, and our experimental results on three omnidirectional video (ODV) saliency datasets demonstrate its effectiveness compared to the state-of-the-art.
arXiv Detail & Related papers (2023-08-24T18:07:37Z) - Deceptive-NeRF/3DGS: Diffusion-Generated Pseudo-Observations for High-Quality Sparse-View Reconstruction [60.52716381465063]
We introduce Deceptive-NeRF/3DGS to enhance sparse-view reconstruction with only a limited set of input images.
Specifically, we propose a deceptive diffusion model turning noisy images rendered from few-view reconstructions into high-quality pseudo-observations.
Our system progressively incorporates diffusion-generated pseudo-observations into the training image sets, ultimately densifying the sparse input observations by 5 to 10 times.
arXiv Detail & Related papers (2023-05-24T14:00:32Z) - FoV-Net: Field-of-View Extrapolation Using Self-Attention and
Uncertainty [95.11806655550315]
We utilize information from a video sequence with a narrow field-of-view to infer the scene at a wider field-of-view.
We propose a temporally consistent field-of-view extrapolation framework, namely FoV-Net.
Experiments show that FoV-Net extrapolates the temporally consistent wide field-of-view scene better than existing alternatives.
arXiv Detail & Related papers (2022-04-04T06:24:03Z) - Learning Cross-Scale Prediction for Efficient Neural Video Compression [30.051859347293856]
We present the first neural video codec that can compete with the latest coding standard H.266/VVC in terms of sRGB PSNR on the UVG dataset for the low-latency mode.
We propose a novel cross-scale prediction module that achieves more effective motion compensation.
arXiv Detail & Related papers (2021-12-26T03:12:17Z) - Novel View Video Prediction Using a Dual Representation [51.58657840049716]
Given a set of input video clips from a single view or multiple views, our network is able to predict the video from a novel view.
The proposed approach does not require any priors and is able to predict the video at wider angular distances, up to 45 degrees.
A comparison with state-of-the-art novel view video prediction methods shows improvements of 26.1% in SSIM, 13.6% in PSNR, and 60% in FVD scores without using explicit priors from target views.
arXiv Detail & Related papers (2021-06-07T20:41:33Z) - Robust Unsupervised Video Anomaly Detection by Multi-Path Frame
Prediction [61.17654438176999]
We propose a novel and robust unsupervised video anomaly detection method based on multi-path frame prediction with a carefully designed architecture.
Our proposed method obtains a frame-level AUROC score of 88.3% on the CUHK Avenue dataset.
arXiv Detail & Related papers (2020-11-05T11:34:12Z) - Deep Learning for Content-based Personalized Viewport Prediction of
360-Degree VR Videos [72.08072170033054]
In this paper, a deep learning network is introduced to leverage position data as well as video frame content to predict future head movement.
To optimize the data input to this network, the data sample rate, reduced data, and long-period prediction length are also explored.
arXiv Detail & Related papers (2020-03-01T07:31:50Z)