Viewport Prediction for Volumetric Video Streaming by Exploring Video Saliency and Trajectory Information
- URL: http://arxiv.org/abs/2311.16462v2
- Date: Fri, 28 Jun 2024 07:04:21 GMT
- Title: Viewport Prediction for Volumetric Video Streaming by Exploring Video Saliency and Trajectory Information
- Authors: Jie Li, Zhixin Li, Zhi Liu, Pengyuan Zhou, Richang Hong, Qiyue Li, Han Hu
- Abstract summary: This paper proposes a novel approach named Saliency and Trajectory Viewport Prediction (STVP).
It aims to improve the precision of viewport prediction in volumetric video streaming.
In particular, we introduce a novel sampling method, Uniform Random Sampling (URS), to reduce computational complexity.
- Score: 45.31198546289057
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Volumetric video, also known as hologram video, is a novel medium that portrays natural content in Virtual Reality (VR), Augmented Reality (AR), and Mixed Reality (MR). It is expected to be the next-gen video technology and a prevalent use case for 5G and beyond wireless communication. Considering that each user typically watches only a section of the volumetric video, known as the viewport, it is essential to have precise viewport prediction for optimal performance. However, research on this topic is still in its infancy. To this end, this paper proposes a novel approach, named Saliency and Trajectory Viewport Prediction (STVP), which aims to improve the precision of viewport prediction in volumetric video streaming. STVP extensively utilizes video saliency information and viewport trajectory. To our knowledge, this is the first comprehensive study of viewport prediction in volumetric video streaming. In particular, we introduce a novel sampling method, Uniform Random Sampling (URS), to reduce computational complexity while still preserving video features in an efficient manner. Then we present a saliency detection technique that incorporates both spatial and temporal information for detecting static, dynamic geometric, and color salient regions. Finally, we intelligently fuse saliency and trajectory information to achieve more accurate viewport prediction. We conduct extensive simulations to evaluate the effectiveness of our proposed viewport prediction methods using state-of-the-art volumetric video sequences. The experimental results show the superiority of the proposed method over existing schemes. The dataset and source code will be publicly accessible after acceptance.
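The abstract names Uniform Random Sampling (URS) as the complexity-reduction step but does not spell out the procedure. Below is a minimal sketch of one plausible reading, assuming the points of each volumetric frame are bucketed into a uniform voxel grid and then sampled at random within each bucket, so sampling is random locally while coverage stays even globally; the function name, `voxel_size`, and `n_per_voxel` are illustrative assumptions, not the authors' API.

```python
import numpy as np

def uniform_random_sample(points, voxel_size=0.05, n_per_voxel=8, seed=0):
    """Sketch of a URS-style sampler for one point-cloud frame.

    points : (N, 3) array of XYZ coordinates.
    Buckets points into a uniform voxel grid, then keeps at most
    `n_per_voxel` randomly chosen points per occupied voxel.
    """
    rng = np.random.default_rng(seed)
    keys = np.floor(points / voxel_size).astype(np.int64)   # voxel index per point
    # Sort points by voxel key so each voxel's points are contiguous.
    order = np.lexsort(keys.T)
    sorted_keys = keys[order]
    # Boundaries where the voxel key changes mark the start of a new voxel.
    changes = np.any(np.diff(sorted_keys, axis=0) != 0, axis=1)
    starts = np.concatenate(([0], np.nonzero(changes)[0] + 1, [len(points)]))
    kept = []
    for s, e in zip(starts[:-1], starts[1:]):
        idx = order[s:e]
        if len(idx) > n_per_voxel:
            idx = rng.choice(idx, n_per_voxel, replace=False)
        kept.append(idx)
    return points[np.concatenate(kept)]

# Example: downsample a synthetic 100k-point frame.
cloud = np.random.rand(100_000, 3)
print(uniform_random_sample(cloud).shape)
```

In a streaming pipeline, a sampler of this kind would run per frame ahead of saliency detection, trading point density for throughput.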
Related papers
- VisionTrap: Vision-Augmented Trajectory Prediction Guided by Textual Descriptions [10.748597086208145]
In this work, we propose a novel method that also incorporates visual input from surround-view cameras.
Our method achieves a latency of 53 ms, making it feasible for real-time processing.
Our experiments show that both the visual inputs and the textual descriptions contribute to improvements in trajectory prediction performance.
arXiv Detail & Related papers (2024-07-17T06:39:52Z) - Predicting Long-horizon Futures by Conditioning on Geometry and Time [49.86180975196375]
We explore the task of generating future sensor observations conditioned on the past.
We leverage the large-scale pretraining of image diffusion models which can handle multi-modality.
We create a benchmark for video prediction on a diverse set of videos spanning indoor and outdoor scenes.
arXiv Detail & Related papers (2024-04-17T16:56:31Z) - PastNet: Introducing Physical Inductive Biases for Spatio-temporal Video Prediction [33.25800277291283]
We investigate the challenge of spatio-temporal video prediction, which involves generating future videos based on historical data streams.
We introduce a novel approach called Spatio-temporal Network (PastNet) for generating high-quality predictions.
We employ a memory bank with the estimated intrinsic dimensionality to discretize local features during the processing of complex spatio-temporal signals.
arXiv Detail & Related papers (2023-05-19T04:16:50Z) - Adaptive Multi-source Predictor for Zero-shot Video Object Segmentation [68.56443382421878]
We propose a novel adaptive multi-source predictor for zero-shot video object segmentation (ZVOS).
In the static object predictor, the RGB source is converted to depth and static saliency sources, simultaneously.
Experiments show that the proposed model outperforms the state-of-the-art methods on three challenging ZVOS benchmarks.
arXiv Detail & Related papers (2023-03-18T10:19:29Z) - Evaluating Foveated Video Quality Using Entropic Differencing [1.5877673959068452]
We propose a full reference (FR) foveated image quality assessment algorithm, which employs the natural scene statistics of bandpass responses.
We evaluate the proposed algorithm by measuring the correlations of the predictions that FED makes against human judgements.
The proposed algorithm yields state-of-the-art performance compared with other existing full reference algorithms.
arXiv Detail & Related papers (2021-06-12T16:29:13Z) - Novel View Video Prediction Using a Dual Representation [51.58657840049716]
Given a set of input video clips from single or multiple views, our network is able to predict the video from a novel view.
The proposed approach does not require any priors and is able to predict the video from wider angular distances, up to 45 degrees.
A comparison with the state-of-the-art novel view video prediction methods shows an improvement of 26.1% in SSIM, 13.6% in PSNR, and 60% in FVD scores without using explicit priors from target views.
arXiv Detail & Related papers (2021-06-07T20:41:33Z) - DeepVideoMVS: Multi-View Stereo on Video with Recurrent Spatio-Temporal Fusion [67.64047158294062]
We propose an online multi-view depth prediction approach on posed video streams.
The scene geometry information computed in the previous time steps is propagated to the current time step.
We outperform the existing state-of-the-art multi-view stereo methods on most of the evaluated metrics.
arXiv Detail & Related papers (2020-12-03T18:54:03Z) - Deep Learning for Content-based Personalized Viewport Prediction of 360-Degree VR Videos [72.08072170033054]
In this paper, a deep learning network is introduced to leverage position data as well as video frame content to predict future head movement.
To optimize the data input to this neural network, the data sample rate, data reduction, and long prediction horizons are also explored for this model.
arXiv Detail & Related papers (2020-03-01T07:31:50Z)
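The STVP abstract above and this last entry both describe fusing viewport or head-position history with per-frame visual content, but neither specifies a network layout here. The sketch below is a hedged illustration, assuming an LSTM over past viewport positions whose final state is concatenated with a pooled saliency feature before a linear head regresses the next viewport center; the class name, layer sizes, and input shapes are assumptions, not the published architectures.

```python
import torch
import torch.nn as nn

class SaliencyTrajectoryPredictor(nn.Module):
    """Sketch: late fusion of a trajectory encoder and a saliency feature.

    Assumed inputs (illustrative, not from the papers):
      traj     : (B, T, 3)     past viewport centers
      saliency : (B, 1, H, W)  saliency map of the current frame
    Output     : (B, 3)        predicted next viewport center
    """
    def __init__(self, traj_dim=3, hidden=64, sal_channels=1):
        super().__init__()
        self.traj_encoder = nn.LSTM(traj_dim, hidden, batch_first=True)
        self.sal_encoder = nn.Sequential(
            nn.Conv2d(sal_channels, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),   # -> (B, 16)
        )
        self.head = nn.Linear(hidden + 16, traj_dim)

    def forward(self, traj, saliency):
        _, (h, _) = self.traj_encoder(traj)          # h: (1, B, hidden)
        fused = torch.cat([h[-1], self.sal_encoder(saliency)], dim=-1)
        return self.head(fused)

model = SaliencyTrajectoryPredictor()
pred = model(torch.randn(2, 30, 3), torch.rand(2, 1, 64, 64))
print(pred.shape)  # torch.Size([2, 3])
```

A late-fusion design like this is one simple way to let a saliency signal correct pure trajectory extrapolation; it is a sketch of the general idea, not the method of either paper.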
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.